Why choose between strong LLM reasoning and efficient models?
Use DeepSeek to generate high-quality training data, then distil that knowledge into ModernBERT (answerdotai/ModernBERT-base) for fast, efficient classification.
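The distillation pipeline can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `call_teacher` is a hypothetical placeholder for a real DeepSeek inference call, and the resulting pairs would feed a standard fine-tune of ModernBERT.

```python
# Hedged sketch of LLM-to-classifier distillation: a teacher LLM (e.g. DeepSeek-R1)
# labels raw texts, and the (text, label) pairs become training data for a small
# classifier such as answerdotai/ModernBERT-base.
# `call_teacher` is a stub, NOT a real API -- swap in your own inference call.

def call_teacher(text: str) -> str:
    """Stand-in for a teacher-LLM call that returns a class label."""
    return "positive" if "great" in text.lower() else "negative"

def build_training_set(texts: list[str]) -> list[dict]:
    """Label each text with the teacher to produce distillation data."""
    return [{"text": t, "label": call_teacher(t)} for t in texts]

dataset = build_training_set(["This product is great!", "Terrible support."])
print(dataset[0]["label"])  # -> positive
```

From here, the labeled dataset can be fed to any standard fine-tuning loop (e.g. the Transformers `Trainer`) to train the small classifier.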
Hosting our own inference was not enough: the Hub now has 4 new inference providers: fal, Replicate, SambaNova Systems, & Together AI.
Check model cards on the Hub: you can now use inference from various providers in one click (see the video demo).
Their inference can also be used through our Inference API client. There, you can use either your own provider key or your HF token; with an HF token, billing is handled directly on your HF account, centralizing all your expenses.
Also, PRO users get $2 of inference credits per month!
A simple summary of DeepSeek AI's Janus-Pro: a fresh take on multimodal AI! It builds on its predecessor, Janus, by tweaking the training methodology rather than the model architecture. The result? Improved performance in understanding and generating multimodal data.
Janus-Pro uses a three-stage training strategy, similar to Janus, but with key modifications:
- Stages 1 & 2: separate training for specific objectives, rather than mixing data.
- Stage 3: fine-tuning with a careful balance of multimodal data.
Benchmarks show Janus-Pro holds its own against specialized models like TokenFlow XL and MetaMorph, and other multimodal models like SD3 Medium and DALL-E 3.
The main limitation? Low image resolution (384x384). However, this seems like a strategic choice to focus on establishing a solid "recipe" for multimodal models. Future work will likely leverage this recipe and increased computing power to achieve higher resolutions.
The open source community is unstoppable: 4M total downloads for DeepSeek models on Hugging Face, with 3.2M coming from the 600+ models created by the community.
If you haven't seen it yet, we just released Inference Providers!
> 4 new serverless inference providers on the Hub
> Use your HF API key or personal provider key with all providers
> Chat with DeepSeek R1, V3, and more on the HF Hub
> We support SambaNova, Together AI, Replicate, and fal.ai
Best of all, we don't charge any markup on top of the provider. Have you tried it out yet? HF Pro accounts get $2 of free usage for provider inference.
Yes, DeepSeek R1's release is impressive. But the real story is what happened in the 7 days after:
- Original release: 8 models, 540K downloads. Just the beginning...
- The community turned those open-weight models into 550+ NEW models on Hugging Face. Total downloads? 2.5M, nearly 5X the originals.
The reason? DeepSeek models are open-weight, letting anyone build on top of them. Notably, the community focused on quantized versions for better efficiency and accessibility. They want models that use less memory, run faster, and are more energy-efficient.
When you empower builders, innovation explodes. For everyone.
The most popular community model? @bartowski's DeepSeek-R1-Distill-Qwen-32B-GGUF version, with 1M downloads alone.