HuggingFaceM4


Recent Activity


merve posted an update 1 day ago
Past week was insanely packed for open AI! 😱
Luckily we picked some highlights for you ❤️ lfg!

💬 LLMs/VLMs
> DeepSeek 🐳 released deepseek-ai/DeepSeek-R1-0528, only 0.2 and 1.4 points behind o3 on AIME 24/25 🤯 they also released an 8B distilled version based on Qwen3 (OS) deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d
> Xiaomi released MiMo-7B-RL (an LLM for code and math) and MiMo-VL-7B-RL (a VLM for visual reasoning, GUI agentic tasks and general use) (OS) 😍 XiaomiMiMo/mimo-vl-68382ccacc7c2875500cd212
> NVIDIA released nvidia/Nemotron-Research-Reasoning-Qwen-1.5B, a new reasoning model
> Dataset: MiniMax released https://huggingface.co/MiniMaxAI/SynLogic, 49k new logical reasoning examples across 35 tasks including cipher solving, sudoku and more!

🖼️ Image/Video Generation
> tencent released tencent/HunyuanPortrait, a new model for consistent portrait generation, under the SVD Research license. They also released tencent/HunyuanVideo-Avatar, audio-driven avatar generation (OS)
> showlab released showlab/OmniConsistency, a consistent stylization model (OS)
> Rapidata/text-2-video-human-preferences-veo3 is a new T2V preference dataset based on videos from Veo3, with 46k examples (OS)

🗣️ Audio
> https://huggingface.co/ResembleAI/Chatterbox is a new 500M text-to-speech model preferred over ElevenLabs (OS) 😍
> PlayHT/PlayDiffusion is a new speech editing model (OS)

Other
> https://huggingface.co/NX-AI/TiReX is a new time series foundation model
> Yandex released a huge (4.79B examples!) video recommendation dataset https://huggingface.co/yandex/yambda

Releases marked (OS) have Apache 2.0 or MIT licenses; find more models and datasets here: merve/releases-30-may-6840097345e0b1e915bff843
merve posted an update 1 day ago
Yesterday was the day of vision language action models (VLAs)!

> SmolVLA: open-source small VLA for robotics by the Hugging Face LeRobot team 🤖
Blog: https://huggingface.co/blog/smolvla
Model: lerobot/smolvla_base

> Holo-1: 3B & 7B web/computer-use agentic VLAs by H Company 💻
Model family: Hcompany/holo1-683dd1eece7eb077b96d0cbd
Demo: https://huggingface.co/spaces/multimodalart/Holo1
Blog: https://huggingface.co/blog/Hcompany/holo1
super exciting times!!
m-ric posted an update 1 day ago
If you haven't yet, you should read the technical report for SmolVLA, published yesterday by the Hugging Face robotics team!
➡️ Among other ideas, it introduces "async inference" to boost robot action throughput.

Robots have a problem: performing actions takes time (unlike agents, whose action executions are near-instant!)
Most often, robots wait until they've finished performing their actions before starting to think about the next steps. This is a huge latency cost!

So the team has the PolicyServer (the "thinking" part) restart early: instead of waiting for all n actions it just sent to finish executing, it gathers observations after k < n steps and starts preparing the next action chunk while the remaining steps are still running, so the next chunk is ready to send immediately.

➡️ This boosted robot throughput by ~30% (nearly 2× tasks per time window)!

gg @cadene and team! 👏

Report here: SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics (2506.01844)
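The scheduling idea is easy to sketch. Below is a toy simulation in simulated time units (not the LeRobot implementation; all function names and numbers are illustrative) comparing a synchronous policy, where the robot idles during every inference, against the async schedule, where the next inference is launched after k < n executed steps:

```python
def total_time_sync(n_chunks, n, step_time, infer_time):
    # Synchronous: the robot waits through a full inference before every chunk.
    return n_chunks * (infer_time + n * step_time)

def total_time_async(n_chunks, n, k, step_time, infer_time):
    # Async: only the first inference is paid up front. Afterwards, the next
    # inference starts after k executed steps and overlaps the remaining
    # n - k steps, so each chunk costs max(execution time, k steps + inference).
    t = infer_time
    for _ in range(n_chunks):
        t += max(n * step_time, k * step_time + infer_time)
    return t

# Illustrative numbers: 10 chunks of 10 steps, 1-unit steps, 5-unit inference.
sync_t = total_time_sync(10, 10, 1, 5)       # 150 time units
async_t = total_time_async(10, 10, 4, 1, 5)  # 105 time units, ~30% less wall-clock time
print(sync_t, async_t)
```

With these made-up numbers the async schedule fully hides inference behind execution (k·step + inference ≤ n·step), which is where the roughly 30% saving comes from.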
abidlabs posted an update 2 days ago
The Gradio x Agents x MCP hackathon keeps growing! We now have more than $1,000,000 in credits for participants and >$16,000 in cash prizes for winners.

We've kept registration open until the end of this week, so join and let's build cool stuff together as a community: ysharma/gradio-hackathon-registration-2025
merve posted an update 4 days ago
New GUI model by Salesforce AI & the University of Hong Kong: Jedi
tianbaoxiexxx/Jedi xlangai/Jedi-7B-1080p 🤗
Based on Qwen2.5-VL, with an Apache 2.0 license

prompt with the screenshot below → select "find more"
merve posted an update 6 days ago
HOT: MiMo-VL, new 7B vision LMs by Xiaomi surpassing gpt-4o (March), competitive in GUI agentic + reasoning tasks ❤️‍🔥 XiaomiMiMo/mimo-vl-68382ccacc7c2875500cd212

not only that, but also MIT license & usable with transformers 🔥
clem posted an update 7 days ago
Today, we're unveiling two new open-source AI robots! HopeJR for $3,000 & Reachy Mini for $300 🤖🤖🤖

Let's go open-source AI robotics!
merve posted an update 7 days ago
introducing: VLM vibe eval 🪭 visionLMsftw/VLMVibeEval

vision LMs are saturated over benchmarks, so we built vibe eval 💬

> compare different models with refreshed in-the-wild examples in different categories 🤠
> submit your favorite model for eval
no numbers -- just vibes!
merve posted an update 9 days ago
emerging trend: models that can understand image + text and generate image + text

don't miss out ⬇️
> MMaDA: single 8B diffusion model aligned with CoT (reasoning!) + UniGRPO Gen-Verse/MMaDA
> BAGEL: 7B MoT model based on Qwen2.5, SigLIP-so-400M, Flux VAE ByteDance-Seed/BAGEL
both by ByteDance! 😱

I keep track of all any-input → any-output models here https://huggingface.co/collections/merve/any-to-any-models-6822042ee8eb7fb5e38f9b62
m-ric posted an update 10 days ago
A new research paper from KAIST builds on smolagents to push the boundaries of distillation 🥳
➡️ "Distilling LLM Agent into Small Models with Retrieval and Code Tools" shows that, when distilling reasoning capability from a strong LLM ("teacher") into a smaller one ("student"), it's much better to use agent traces than CoT traces.

Advantages are:
1. Improved generalization
Intuitively, this is because the agent can encounter more "surprising" results by interacting with its environment: for example, a web search called by the LLM teacher in agent mode can return results that the teacher would never have generated in plain CoT.

2. Reduced hallucinations
The trace won't hallucinate tool call outputs!

Thank you @akseljoonas for mentioning this paper!
merve posted an update 10 days ago
what happened in open AI this past week? so many vision LM & omni releases 🔥 merve/releases-23-may-68343cb970bbc359f9b5fb05

multimodal 💬🖼️
> new moondream (VLM) is out: a 4-bit quantized (QAT) version of moondream-2b that runs in 2.5GB VRAM at 184 tps with only a 0.6% drop in accuracy (OS) 🌚
> ByteDance released BAGEL-7B, an omni model that understands and generates both image + text. they also released Dolphin, a document parsing VLM 🐬 (OS)
> Google DeepMind dropped MedGemma at I/O, a VLM that can interpret medical scans, and Gemma 3n, an omni model with competitive LLM performance
> MMaDA is a new 8B diffusion language model that can generate image and text

LLMs
> Mistral released Devstral, a 24B coding assistant (OS) 👩🏻‍💻
> Fairy R1-32B is a new reasoning model -- distilled version of DeepSeek-R1-Distill-Qwen-32B (OS)
> NVIDIA released ACEReason-Nemotron-14B, new 14B math and code reasoning model
> sarvam-m is a new Indic LM with hybrid thinking mode, based on Mistral Small (OS)
> samhitika-0.0.1 is a new Sanskrit corpus (BookCorpus translated with Gemma3-27B)

image generation 🎨
> MTVCrafter is a new human motion animation generator
clem posted an update 11 days ago
It's just become easier to share your apps on the biggest AI app store (aka HF Spaces): unlimited storage, more visibility, and community interactions.

Just pick a React, Svelte, or Vue template when you create your space, or add app_build_command: npm run build and app_file: build/index.html to your README's YAML block.

Or follow this link: https://huggingface.co/new-space?sdk=static

Let's build!
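For reference, here is a minimal sketch of what the README front matter could look like, assuming an npm build that outputs to build/ (the title is a placeholder):

```yaml
---
title: My App                       # placeholder name
sdk: static
app_build_command: npm run build    # runs at build time
app_file: build/index.html          # entry point served by the Space
---
```

The two app_* keys are exactly the ones mentioned above; everything else follows the standard Spaces front matter.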
merve posted an update 14 days ago
Google released MedGemma at I/O '25 👏 google/medgemma-release-680aade845f90bec6a3f60c4

> 4B and 27B instruction fine-tuned vision LMs and a 4B pre-trained vision LM for medicine
> available with transformers from the get-go 🤗

they also released a cool demo for scan reading ➡️ google/rad_explain

use with transformers ⬇️
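A minimal sketch of what that could look like, assuming the 4B instruction-tuned checkpoint (google/medgemma-4b-it) and transformers' image-text-to-text pipeline; the image URL and prompt are placeholders, so check the model card for the exact usage:

```python
from transformers import pipeline

# Assumes a recent transformers release with the image-text-to-text pipeline
# and enough memory for the 4B checkpoint.
pipe = pipeline("image-text-to-text", model="google/medgemma-4b-it")

messages = [
    {"role": "user", "content": [
        {"type": "image", "url": "https://example.com/chest_xray.png"},  # placeholder image
        {"type": "text", "text": "Describe the findings in this X-ray."},
    ]},
]
out = pipe(text=messages, max_new_tokens=128)
print(out[0]["generated_text"])
```

This is a sketch, not the official snippet from the post; running it downloads the model weights.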