Qwen2.5-Omni is soooo good that people build multimodal reasoning models off of it
> KE-Team/Ke-Omni-R-3B is an open-source audio reasoning model, SOTA on the benchmark average, based on Qwen/Qwen2.5-Omni-3B
> Haoz0206/Omni-R1 is a video reasoning model with pixel-level grounding (see below), and it's super competitive, based on Qwen/Qwen2.5-Omni-7B
vision LMs are saturating benchmarks, so we built vibe eval
> compare different models on refreshed in-the-wild examples across different categories
> submit your favorite model for eval
no numbers -- just vibes!
emerging trend: models that can understand image + text and generate image + text
don't miss out
> MMaDA: a single 8B diffusion model aligned with CoT (reasoning!) + UniGRPO: Gen-Verse/MMaDA
> BAGEL: a 7B MoT model based on Qwen2.5, SigLIP-so-400M and the Flux VAE: ByteDance-Seed/BAGEL
both by ByteDance!
multimodal
> new moondream (VLM) is out: a 4-bit quantized (QAT) version of moondream-2b that runs on 2.5GB VRAM at 184 tps with only a 0.6% drop in accuracy (OS) -- see the sketch after this list
> ByteDance released BAGEL-7B, an omni model that understands and generates both image + text; they also released Dolphin, a document parsing VLM (OS)
> Google DeepMind dropped MedGemma at I/O, a VLM that can interpret medical scans, and Gemma 3n, an omni model with competitive LLM performance
> MMaDA is a new 8B diffusion language model that can generate both image and text
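A minimal sketch of prompting moondream from transformers, assuming the current remote-code API (a `.query()` helper returning an `"answer"` key) and using the base vikhyatk/moondream2 repo, since the exact 4-bit QAT checkpoint id isn't named above:

```python
# Hedged sketch: load moondream via trust_remote_code and ask a question.
# "vikhyatk/moondream2" is the base repo; the 4-bit QAT variant id is an
# assumption left out here because the post doesn't name it.
from transformers import AutoModelForCausalLM
from PIL import Image

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    trust_remote_code=True,  # exposes moondream's .query()/.caption() helpers
    device_map="auto",
)

image = Image.open("photo.jpg")
print(model.query(image, "What is in this image?")["answer"])
```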
LLMs
> Mistral released Devstral, a 24B coding assistant (OS)
> Fairy R1-32B is a new reasoning model -- a distilled version of DeepSeek-R1-Distill-Qwen-32B (OS)
> NVIDIA released ACEReason-Nemotron-14B, a new 14B math and code reasoning model
> sarvam-m is a new Indic LM with a hybrid thinking mode, based on Mistral Small (OS)
> samhitika-0.0.1 is a new Sanskrit corpus (BookCorpus translated with Gemma3-27B)
image generation
> MTVCrafter is a new human motion animation generator
- It's still free!
- Video 1 walks you through onboarding to the course
- The first live session is next week!
- You can now get a certificate via the exam app
- We improved the written material with interactive quizzes
If you're studying MCP and want a live, interactive, visual, certified course, then join us on the hub!
> first reasoning model for robotics
> based on Qwen2.5-VL-7B; use it with Hugging Face transformers or vLLM (a loading sketch follows below)
> comes with SFT & alignment datasets and a new benchmark
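Since the model follows the Qwen2.5-VL-7B architecture, loading it should look like any other Qwen2.5-VL checkpoint in transformers. A minimal sketch, assuming a hypothetical repo id (the post doesn't name one) and the stock Qwen2.5-VL classes:

```python
# Hedged sketch: load a Qwen2.5-VL-7B-based robotics reasoner with transformers.
# "some-org/robotics-reasoner-7b" is a placeholder, not the actual model id.
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from PIL import Image

model_id = "some-org/robotics-reasoner-7b"  # hypothetical placeholder
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("scene.jpg")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "What should the robot arm do next? Reason step by step."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])
```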
hey hey @mradermacher - VB from Hugging Face here, we'd love to onboard you over to our optimised xet backend!
as you know, we're in the process of upgrading our storage backend to xet (which helps us scale and offer blazingly fast upload/download speeds too): https://huggingface.co/blog/xet-on-the-hub

now that we're certain the backend can scale even with big models like Llama 4 / Qwen 3, we're moving to the next phase of inviting impactful orgs and users on the hub over. as you're a big part of the open source ML community, we'd love to onboard you next and create some excitement about it in the community too!
in terms of actual steps, it should be as simple as one of the org admins joining hf.co/join/xet -- we'll take care of the rest.
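for what it's worth, nothing should change on the client side: uploads keep going through the usual huggingface_hub calls, and the xet backend kicks in transparently once the account is migrated (that's my reading of the blog post above; the repo id below is a placeholder):

```python
# Hedged sketch: a standard huggingface_hub upload; once an org is on xet,
# the faster chunk-deduplicated backend is used without code changes
# (assumption: installing the extra via pip install "huggingface_hub[hf_xet]").
from huggingface_hub import HfApi

api = HfApi()  # uses your cached token from `huggingface-cli login`
api.upload_file(
    path_or_fileobj="model.safetensors",
    path_in_repo="model.safetensors",
    repo_id="your-org/your-model",  # placeholder repo id
    repo_type="model",
)
```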
LLMs
> Alibaba Qwen released WorldPM-72B, a new World Preference Model trained on 15M preference samples (OS)
> II-Medical-8B is a new 8B LLM for medical reasoning by Intelligent-Internet
> TRAIL is a new dataset by Patronus for trace error reasoning for agents (OS)
Multimodal
> Salesforce Research released BLIP3o, a new any-to-any model with image-text input and image-text output; it's based on an image encoder, a text decoder and a DiT, and comes in 8B
> They also released pre-training and fine-tuning datasets
> MMMG is a multimodal generation benchmark for image, audio and text (interleaved)
Image Generation
> Alibaba Wan-AI released Wan2.1-VACE, a video foundation model for image- and text-to-video, video-to-audio and more tasks; comes in 1.3B and 14B (OS)
> ZuluVision released MoviiGen1.1, a new cinematic video generation model based on Wan 2.1 14B (OS)
> multimodalart released isometric-skeumorphic-3d-bnb, an isometric 3D asset generator (like Airbnb assets) based on Flux
> LTX-Video-0.9.7-distilled is a new real-time video generation model (text- and image-to-video) by Lightricks
> Hidream_t2i_human_preference is a new text-to-image preference dataset by Rapidata with 195k human responses from 38k annotators
Audio
> stabilityai released stable-audio-open-small, a new text-to-audio model
> TEN-framework released ten-vad, a voice activity detection model (OS)
We're thrilled to announce the launch of our comprehensive Model Context Protocol (MCP) Course! This free program is designed to take learners from foundational understanding to practical application of MCP in AI.
In this course, you will:
- Study the Model Context Protocol in theory, design, and practice.
- Learn to use established MCP SDKs and frameworks.
- Share your projects and explore applications created by the community.
- Participate in challenges and evaluate your MCP implementations.
- Earn a certificate of completion.
At the end of this course, you'll understand how MCP works and how to build your own AI applications that leverage external data and tools using the latest MCP standards.
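To make "how MCP works" concrete: an MCP server declares tools (plus resources and prompts) that any MCP client can discover and call over a standard transport. A minimal sketch using the official Python SDK's FastMCP helper (the tool itself is a toy example, not course material):

```python
# Hedged sketch: a tiny MCP server with one tool, using the official
# Python SDK (pip install "mcp[cli]"). Clients discover and call `add`
# through the protocol; no tool-specific client code is needed.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```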