
John Smith PRO

John6666

AI & ML interests

None yet

Recent Activity

Organizations

open/ acc, Solving Real World Problems, FashionStash Group meeting

John6666's activity

reacted to florentgbelidji's post with 🔥 about 14 hours ago
๐—ฃ๐—น๐—ฎ๐—ป๐—ป๐—ถ๐—ป๐—ด ๐—ฌ๐—ผ๐˜‚๐—ฟ ๐—ก๐—ฒ๐˜…๐˜ ๐—ฆ๐—ธ๐—ถ ๐—”๐—ฑ๐˜ƒ๐—ฒ๐—ป๐˜๐˜‚๐—ฟ๐—ฒ ๐—๐˜‚๐˜€๐˜ ๐—š๐—ผ๐˜ ๐—ฆ๐—บ๐—ฎ๐—ฟ๐˜๐—ฒ๐—ฟ: ๐—œ๐—ป๐˜๐—ฟ๐—ผ๐—ฑ๐˜‚๐—ฐ๐—ถ๐—ป๐—ด ๐—”๐—น๐—ฝ๐—ถ๐—ป๐—ฒ ๐—”๐—ด๐—ฒ๐—ป๐˜!๐Ÿ”๏ธโ›ท๏ธ

With the big hype around AI agents these days, I couldn't stop thinking about how AI agents could truly enhance real-world activities.
What sort of applications could we build with those AI agents: agentic RAG? self-correcting text-to-SQL? Nah, boring…

Passionate about the outdoors, I've always dreamed of a tool that could simplify planning mountain trips while accounting for all potential risks. That's why I built Alpine Agent, a smart assistant designed to help you plan safe and enjoyable itineraries in the French Alps and Pyrenees.

Built using Hugging Face's smolagents library, Alpine Agent combines the power of AI with trusted resources like Skitour.fr (https://skitour.fr/) and METEO FRANCE. Whether it's suggesting a route with moderate difficulty or analyzing avalanche risks and weather conditions, this agent dynamically integrates data to deliver personalized recommendations.

In my latest blog post, I share how I developed this project, from defining tools and integrating APIs to selecting the best LLMs like Qwen2.5-Coder-32B-Instruct, Llama-3.3-70B-Instruct, or GPT-4.

โ›ท๏ธ Curious how AI can enhance adventure planning?โ€จTry the app and share your thoughts: florentgbelidji/alpine-agent

👉 Want to build your own agents? Whether for cooking, sports training, or other passions, the possibilities are endless. Check out the blog post to learn more: https://huggingface.co/blog/florentgbelidji/alpine-agent
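For a feel of what such an agent looks like in code, here is a minimal, hypothetical smolagents sketch. The tool, its return value, and the model choice are illustrative assumptions, not the actual Alpine Agent implementation described in the blog post:

```python
from smolagents import CodeAgent, HfApiModel, tool

@tool
def get_mountain_weather(location: str) -> str:
    """Return a short weather summary for a mountain area.

    Args:
        location: Name of the massif or resort, e.g. "Chamonix".
    """
    # Placeholder: a real agent would call a weather service such as METEO FRANCE here.
    return f"Forecast for {location}: clear morning, avalanche risk 2/5 above 2400 m."

# Any chat model served on the Hub can back the agent; Qwen2.5-Coder is one option.
model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")
agent = CodeAgent(tools=[get_mountain_weather], model=model)

print(agent.run("Suggest a moderate ski touring itinerary near Chamonix for tomorrow."))
```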

Many thanks to @m-ric for helping build this tool with smolagents!
reacted to Kseniase's post with 👀 about 14 hours ago
Today, we spoke with Snowflake's AI Research Team Leads, Yuxiong He and Samyam Rajbhandari ( @samyam ), who is also one of the researchers behind DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference (2401.08671) and other DeepSpeed papers.

Collaborating with their co-authors to reduce inference costs for enterprise-specific tasks, they observed that inputs are often significantly larger than outputs, because enterprises typically analyze enormous amounts of information to extract insights that are much shorter. To address this, they developed SwiftKV (SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation, 2410.03960), an optimization that reduces LLM inference costs by up to 75% for Meta Llama LLMs, enhancing efficiency and performance in enterprise AI tasks.

Today they are open-sourcing SwiftKV ( Snowflake/Llama-3.1-SwiftKV-8B-Instruct) and the ArcticTraining platform.
In our new episode "15 minutes with a Researcher" they explain how SwiftKV works, its applicability to other architectures, its limitations, and additional methods to further reduce computation costs in inference.
Watch the full 15 min interview here (https://youtu.be/9x1k7eXe-6Q?si=4_HQOyi1CPHgvlrx)
reacted to zamal's post with 🚀 about 14 hours ago
zamal/Multimodal-Chat-PDF

🚀 Introducing Chat PDF Multimodal 💬

Interact with your PDF documents like never before! 🤯
Extract text & images, then ask context-aware questions based on both. Powered by RAG techniques & multimodal LLMs. Perfect for studying, research & more! 📝👀
Try it out now!!!! ✍️

#LlavaNext #MultimodalAI #Transformers
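As a rough illustration of the RAG idea behind a Space like this (not its actual source code), here is a minimal sketch that extracts page text from a PDF, retrieves the most relevant chunks for a question, and assembles the context you would hand to a (multimodal) LLM. The file name and embedding model are assumptions, and image handling is omitted for brevity:

```python
import fitz  # PyMuPDF: used here to pull plain text out of each PDF page
from sentence_transformers import SentenceTransformer, util

doc = fitz.open("paper.pdf")
chunks = [page.get_text() for page in doc if page.get_text().strip()]

# Embed every page chunk once, then embed the user's question.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
chunk_emb = embedder.encode(chunks, convert_to_tensor=True)

question = "What dataset does the paper use?"
q_emb = embedder.encode(question, convert_to_tensor=True)

# Retrieve the top-3 most similar chunks and build the context string.
top_hits = util.semantic_search(q_emb, chunk_emb, top_k=3)[0]
context = "\n\n".join(chunks[hit["corpus_id"]] for hit in top_hits)

# `context` + `question` would then be passed to a multimodal LLM for the answer.
print(context[:500])
```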
reacted to nroggendorff's post with 😎❤️ about 15 hours ago
maybe a page where you can find open orgs to get started in collaboration with hf. i see so many people that don't have a direction.


i don't have ulterior motives, so don't ask
reacted to alibabasglab's post with 🚀 about 15 hours ago
reacted to openfree's post with 🚀 about 15 hours ago
🧪 Chemical Genesis: Advanced Chemical Structure Analysis Tool
Welcome to Chemical Genesis - an intuitive and powerful tool for analyzing chemical structures and generating detailed reports! This application combines state-of-the-art vision-language models with user-friendly features to enhance your chemical analysis workflow.
🌟 Key Features

Dual Model Support: Choose between ChemQwen-1 and ChemQwen-2 for your analysis needs
Interactive Analysis: Upload chemical structure images and ask specific questions
Professional Documentation: Generate custom PDF/DOCX reports with your analysis results
Flexible Formatting: Customize font size, line spacing, text alignment, and image dimensions

💡 How to Use

Select your preferred model (ChemQwen-1 or ChemQwen-2)
Upload a chemical structure image
Ask your question about the structure
Get AI-powered analysis in real-time
Generate a professional document with your results

📄 Document Generation Options

Format: Choose between PDF and DOCX
Styling: Adjust font size (8-24pt), line spacing (0.5-3.0)
Layout: Select text alignment and image size preferences
Export: Download your formatted document with one click

🔬 Technical Details
Built with:

Hugging Face Transformers
Qwen2VL Architecture
Gradio Interface
PyTorch (CUDA-enabled)

🚀 Get Started
Try it now! Simply upload your chemical structure image and start exploring. Perfect for:

Research Documentation
Chemical Analysis Reports
Structure Verification
Educational Materials

๐Ÿ“ Note
For optimal results, please use clear, high-resolution images of chemical structures. The system works best with well-defined molecular diagrams and chemical notations.

VIDraft/ChemGenesis
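For orientation on the stack listed under Technical Details, here is a minimal sketch of querying a Qwen2-VL-style checkpoint about a structure image with Transformers. The base model ID and the prompt are illustrative assumptions, not the ChemQwen weights or code used by the Space:

```python
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

MODEL_ID = "Qwen/Qwen2-VL-7B-Instruct"  # stand-in; the Space uses ChemQwen fine-tunes

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)

image = Image.open("structure.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Identify the functional groups in this structure."},
    ],
}]

# Build the chat prompt, bundle it with the image, and generate an answer.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```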
reacted to burtenshaw's post with 😎🤗🚀 about 15 hours ago
reacted to ZennyKenny's post with 👍 about 15 hours ago
On-demand audio transcription is an often-requested service without many good options on the market.

Using Hugging Face Spaces with Gradio SDK and the OpenAI Whisper model, I've put together a simple interface that supports the transcription and summarisation of audio files up to five minutes in length, completely open source and running on CPU upgrade. The cool thing is that it's built without a dedicated inference endpoint, completely on public infrastructure.

Check it out: ZennyKenny/AudioTranscribe

I wrote a short article about the backend mechanics for those who are interested: https://huggingface.co/blog/ZennyKenny/on-demand-public-transcription
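For readers curious about the general recipe, a minimal sketch of that kind of pipeline might look like the following. The model choices and generation lengths are assumptions, not the Space's actual code:

```python
import gradio as gr
from transformers import pipeline

# Whisper for speech-to-text; chunking lets it handle clips longer than 30 s.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small", chunk_length_s=30)
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def transcribe_and_summarize(audio_path: str):
    transcript = asr(audio_path)["text"]
    # Very long transcripts would need to be chunked before summarization.
    summary = summarizer(transcript, max_length=130, min_length=30, do_sample=False)[0]["summary_text"]
    return transcript, summary

demo = gr.Interface(
    fn=transcribe_and_summarize,
    inputs=gr.Audio(type="filepath"),
    outputs=[gr.Textbox(label="Transcript"), gr.Textbox(label="Summary")],
)

if __name__ == "__main__":
    demo.launch()
```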
reacted to nataliaElv's post with 🤗🔥❤️ about 15 hours ago
New chapter in the Hugging Face NLP course! 🤗 🚀

We've added a new chapter about the very basics of Argilla to the Hugging Face NLP course. Learn how to set up an Argilla instance, load & annotate datasets, and export them to the Hub.

Any feedback for improvements welcome!

https://huggingface.co/learn/nlp-course/chapter10
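As a taste of the workflow the chapter covers, here is a minimal, hypothetical Argilla 2.x sketch: connect to an instance, create a dataset with annotation settings, log a record, and push the annotated dataset to the Hub. The URL, API key, dataset name, and labels are all placeholders:

```python
import argilla as rg

# Connect to a running Argilla instance (e.g. one deployed as a Hugging Face Space).
client = rg.Argilla(api_url="https://my-argilla-space.hf.space", api_key="owner.apikey")

# Define what annotators see (fields) and what they answer (questions).
settings = rg.Settings(
    fields=[rg.TextField(name="text")],
    questions=[rg.LabelQuestion(name="sentiment", labels=["positive", "negative", "neutral"])],
)

dataset = rg.Dataset(name="sentiment-annotation", settings=settings, client=client)
dataset.create()

# Log a few records to annotate in the UI.
dataset.records.log([{"text": "The new NLP course chapter is great!"}])

# Once annotated, export the dataset to the Hugging Face Hub.
dataset.to_hub(repo_id="my-username/sentiment-annotation")
```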
reacted to davidberenstein1957's post with 👀 about 15 hours ago
reacted to not-lain's post with 🔥🤗 about 15 hours ago
We now have more than 2,000 public AI models using ModelHubMixin 🤗
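For context, ModelHubMixin (and its PyTorch flavour, PyTorchModelHubMixin) is what gives a plain custom model save_pretrained / from_pretrained / push_to_hub support. A minimal sketch with a toy PyTorch module:

```python
import torch
from huggingface_hub import PyTorchModelHubMixin

class TinyRegressor(torch.nn.Module, PyTorchModelHubMixin):
    """A toy model; the mixin adds Hub serialization methods."""

    def __init__(self, hidden_size: int = 32):
        super().__init__()
        self.linear = torch.nn.Linear(hidden_size, 1)

    def forward(self, x):
        return self.linear(x)

model = TinyRegressor(hidden_size=64)
model.save_pretrained("tiny-regressor")            # writes config.json + weights locally
reloaded = TinyRegressor.from_pretrained("tiny-regressor")
# model.push_to_hub("my-username/tiny-regressor")  # would publish it as a public model
```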
reacted to merve's post with ❤️ about 16 hours ago
Everything that happened this week in open AI, a recap 🤠 merve/jan-17-releases-678a673a9de4a4675f215bf5

👀 Multimodal
- MiniCPM-o 2.6 is a new SOTA any-to-any model by OpenBMB (vision, speech and text!)
- VideoChat-Flash-Qwen2.5-2B is a new video multimodal model by OpenGVLab; the series comes in 2B & 7B sizes and 224 & 448 resolutions
- ByteDance released a larger SA2VA that comes in at 26B parameters
- Dataset: VRC-Bench is a new diverse benchmark for multimodal LLM reasoning performance

💬 LLMs
- MiniMax-Text-01 is a huge new language model (456B total, 45.9B active params) by MiniMaxAI with a context length of 4M tokens 🤯
- Dataset: Sky-T1-data-17k is a diverse dataset used to train Sky-T1-32B
- kyutai released Helium-1-Preview-2B, a new small multilingual LM
- Wayfarer-12B is a new LLM able to write D&D-style adventures 🧙🏻‍♂️
- ReaderLM-v2 is a new HTML parsing model by Jina AI

- Dria released Dria-Agent-a-3B, a new agentic coding model (Pythonic function calling) based on Qwen2.5 Coder
- Unsloth released Phi-4, plus faster and more memory-efficient Llama 3.3

๐Ÿ–ผ๏ธ Vision
- MatchAnything is a new foundation model for matching
- FitDit is a high-fidelity VTON model based on DiT architecture

๐Ÿ—ฃ๏ธ Audio
- OuteTTS-0.3-1B is a new multilingual text-to-speech model with voice cloning and emotion control capabilities

📖 Retrieval
- lightblue released LB-reranker-0.5B-v1.0, a new reranker based on Qwen2.5 that can handle 95+ languages
- cde-small-v2 is a new SOTA small retrieval model by @jxm
reacted to jxm's post with 🚀 about 16 hours ago
New state-of-the-art BERT-size retrieval model: *cde-small-v2* 🥳🍾

Hi everyone! We at Cornell are releasing a new retrieval model this week. It uses the contextual embeddings framework, is based on the ModernBERT backbone, and gets state-of-the-art results on the MTEB benchmark for its model size (140M parameters). cde-small-v2 gets an average score of 65.6 across the 56 datasets and sees improvements over our previous model in *every* task domain (retrieval, classification, etc.).

We made a lot of changes to make this model work. First of all, ModernBERT has a better tokenizer, which probably helped this work out-of-the-box. We also followed the principles from the CDE paper and used harder clusters and better hard-negative filtering, which showed a small performance improvement. And we made a few small changes that have been shown to work on the larger models: we disabled weight decay, masked out the prefix tokens during pooling, and added a residual connection from the first stage to the second stage for better gradient flow.

We're still looking for a compute sponsor to help us scale CDE to larger models. Since it's now state-of-the-art at the 100M parameter scale, it seems a reasonable bet that we could train a state-of-the-art large model if we had the GPUs. If you're interested in helping with this, please reach out!

Here's a link to the model: jxm/cde-small-v2
And here's a link to the paper: Contextual Document Embeddings (2410.02525)
reacted to MonsterMMORPG's post with 👀 about 16 hours ago
Most Powerful Vision Model CogVLM 2 now works amazingly well on Windows with new Triton pre-compiled wheels - 19 Examples - Locally tested with 4-bit quantization - Second example is really wild - Can be used for image captioning or any image vision task

The APP and the installers : https://www.patreon.com/posts/120193330

Check below screenshots to see how to use it

Currently the APP works amazingly fast with 4-bit quantization

I am looking into lowering VRAM usage even further, e.g. by adding CPU offloading and other techniques if possible

Previously we were lacking Triton, but it now works perfectly

My installer installs into a Python 3.10 VENV completely isolated and clean

You can see entire APP and installer source code

If you get a Triton error, make sure to delete your Triton cache after installing the app, like below:

C:\Users\Furkan\.triton

Hugging Face repo with sample code : THUDM/cogvlm2-llama3-chat-19B

GitHub repo : https://github.com/THUDM/CogVLM2

Triton Windows : https://github.com/woct0rdho/triton-windows/releases
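For reference, a minimal sketch of loading that Hugging Face checkpoint with 4-bit quantization via bitsandbytes; this is generic Transformers usage under stated assumptions, not the Patreon app's code, and the image/prompt chat loop follows the repo's own sample code linked above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "THUDM/cogvlm2-llama3-chat-19B"

# 4-bit quantization keeps VRAM usage low enough for consumer GPUs.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    trust_remote_code=True,   # CogVLM2 ships custom modeling code
    device_map="auto",
)
# Image preprocessing and the chat loop follow the sample code in the model repo.
```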