Data Agents

Recent Activity

thomwolf posted an update about 20 hours ago
If you've followed the progress of robotics in the past 18 months, you've likely noticed how robotics is increasingly becoming the next frontier that AI will unlock.

At Hugging Face, in robotics and across all AI fields, we believe in a future where AI and robots are open-source, transparent, and affordable; community-built and safe; hackable and fun. We've had so much mutual understanding and passion working with the Pollen Robotics team over the past year that we decided to join forces!

You can already find our open-source humanoid robot platform Reachy 2 on the Pollen website, and you can find the Pollen community and team here on the Hub at pollen-robotics.

We're so excited to build and share more open-source robots with the world in the coming months!

m-ric posted an update 15 days ago
🚀 DeepSeek R1 moment has come for GUI agents: Rule-based Reinforcement Learning gives better results than SFT with 500x smaller datasets!

Traditionally (by which I mean "in the last few months"), GUI agents have been trained with supervised fine-tuning (SFT). This meant collecting huge datasets of screen captures from people using computers and using these to fine-tune your model. 📚

👉 But last week, a new paper introduced UI-R1, applying DeepSeek's R1-style rule-based reinforcement learning (RL) specifically to GUI action prediction tasks.
This is big news: with RL, maybe we could build good agents without the need for huge datasets.

UI-R1 uses a unified rule-based reward function that scores multiple responses sampled from the model and optimizes the policy with algorithms like Group Relative Policy Optimization (GRPO).

Specifically, the reward function assesses three things (sketched in code below):
🎯 Action type accuracy: Does the predicted action match the ground truth?
📍 Coordinate accuracy (specifically for clicks): Is the predicted click within the correct bounding box?
📑 Output format: Does the model clearly articulate both its reasoning and final action?
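
A minimal sketch of what such a rule-based reward might look like (the tags, regexes, and weights here are illustrative, not the paper's exact implementation):

```python
# Illustrative sketch of a UI-R1-style rule-based reward, not the paper's code.
import re

def gui_action_reward(response: str, gt_action: str, gt_bbox: tuple) -> float:
    """Score one model response on output format, action type, and click coordinates."""
    reward = 0.0

    # Format reward: the response must expose its reasoning and a final action.
    has_format = bool(re.search(r"<think>.*</think>", response, re.S)) and "<action>" in response
    reward += 0.5 if has_format else 0.0

    # Action-type reward: the predicted action (e.g. "click", "scroll") must match the ground truth.
    action_match = re.search(r"<action>\s*(\w+)", response)
    pred_action = action_match.group(1).lower() if action_match else ""
    reward += 1.0 if pred_action == gt_action else 0.0

    # Coordinate reward (clicks only): the predicted point must fall inside the ground-truth box.
    coords = re.search(r"\(([\d.]+),\s*([\d.]+)\)", response)
    if gt_action == "click" and coords:
        x, y = float(coords.group(1)), float(coords.group(2))
        x1, y1, x2, y2 = gt_bbox
        reward += 1.0 if (x1 <= x <= x2 and y1 <= y <= y2) else 0.0

    return reward
```

In a GRPO-style setup, this scalar would be computed for each of the sampled responses and turned into group-relative advantages for the policy update.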

Using just 136 carefully selected mobile tasks (compared to 76,000 tasks for larger models like OS-Atlas), UI-R1 shows significant efficiency gains and improved performance:
📈 Boosted action prediction accuracy from 76% to 89% on AndroidControl.
🌐 Outperformed larger, SFT-trained models (e.g., OS-Atlas-7B), demonstrating superior results with vastly fewer data points (136 tasks vs. 76K).
🔍 Enhanced adaptability and generalization, excelling even in out-of-domain scenarios.

The paper tests this RL-based method only in low-level GUI tasks. Could it generalize to more complex interactions? 🧐

Read the full paper here 👉 UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning (2503.21620)

thomwolf posted an update 16 days ago
The new DeepSite space is really insane for vibe-coders
enzostvs/deepsite

With the wave of vibe-coding-optimized LLMs like the latest open-source DeepSeek model (version V3-0324), you can basically prompt out of the box and create any app or game in one shot.

It feels so powerful to me: no more complex frameworks or under-the-hood prompt engineering needed to get a working text-to-app tool.

AI is eating the world and *open-source* AI is eating AI itself!

PS: even more meta, the DeepSite app and the DeepSeek model are both fully open source => time to start recursively improving?

PPS: you still need some inference hosting unless you're running the 600B param model at home, so check the very nice list of HF Inference Providers for this model: deepseek-ai/DeepSeek-V3-0324

freddyaboulton posted an update 21 days ago
Ever wanted to share your AI creations with friends? ✨

Screenshots are fine, but imagine letting others play with your ACTUAL model!

Introducing Gradio deep links 🔗 - now you can share interactive AI apps, not just images.

Add a gr.DeepLinkButton to any app and get shareable URLs that let ANYONE experiment with your models.
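
A minimal sketch of what that looks like, assuming a recent Gradio release that ships gr.DeepLinkButton (the generate function below is just a placeholder):

```python
import gradio as gr

def generate(prompt: str) -> str:
    # Placeholder for your actual model call.
    return f"You said: {prompt}"

with gr.Blocks() as demo:
    inp = gr.Textbox(label="Prompt")
    out = gr.Textbox(label="Output")
    inp.submit(generate, inp, out)
    # Renders a button that copies a shareable deep link to the app.
    gr.DeepLinkButton()

demo.launch()
```

Deep links point back to the running app, so they're most useful once the demo is deployed somewhere others can reach, e.g. a Space.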

m-ric posted an update about 1 month ago
smolagents now supports vLLM! 🥳

vLLM is one of the most popular local inference solutions, and the community had been asking us to integrate it: after a heavy refactoring of our LLM classes, we've just released smolagents 1.11.0, with a brand new VLLMModel class.

Go try it and tell us what you think!

https://github.com/huggingface/smolagents/blob/45b2c86857b7f7657daaa74e4d17d347e9e2c4a4/src/smolagents/models.py#L497
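
A minimal usage sketch (the model id is just an example; you need a local GPU that vLLM can serve the model on):

```python
from smolagents import CodeAgent, VLLMModel

# Serve a model locally through vLLM (example model id; any chat model vLLM supports works).
model = VLLMModel(model_id="Qwen/Qwen2.5-7B-Instruct")

# Power a CodeAgent with it.
agent = CodeAgent(tools=[], model=model)
agent.run("What is the 20th Fibonacci number?")
```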

thomwolf posted an update about 1 month ago
We've kept pushing our Open-R1 project, an open initiative to replicate and extend the techniques behind DeepSeek-R1.

And even we were mind-blown by the results we got with this latest model we're releasing: ⚡️OlympicCoder (open-r1/OlympicCoder-7B and open-r1/OlympicCoder-32B)

It's beating Claude 3.7 on (competitive) programming, a domain where Anthropic has historically been really strong, and it's getting close to o1-mini/R1 on olympiad-level coding with just 7B parameters!

And the best part is that we're open-sourcing it all: the training dataset, the new IOI benchmark, and more, in our Open-R1 progress report #3: https://huggingface.co/blog/open-r1/update-3

Datasets we are releasing (all loadable with the datasets library, as shown after this list):
- open-r1/codeforces
- open-r1/codeforces-cots
- open-r1/ioi
- open-r1/ioi-test-cases
- open-r1/ioi-sample-solutions
- open-r1/ioi-cots
- open-r1/ioi-2024-model-solutions
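
They load like any other Hub dataset; for example (config and split names may differ per dataset):

```python
from datasets import load_dataset

# Example: pull the Codeforces problems with R1 traces; the other datasets load the same way.
ds = load_dataset("open-r1/codeforces-cots", split="train")
print(ds.column_names)
```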

freddyaboulton posted an update about 1 month ago
Privacy matters when talking to AI! 🔇

We've just added a microphone mute button to FastRTC in our latest update (v0.0.14). Now you control exactly what your LLM hears.

Plus lots more features in this release! Check them out:
https://github.com/freddyaboulton/fastrtc/releases/tag/0.0.14

lewtun posted an update about 1 month ago
Introducing OlympicCoder: a series of open reasoning models that can solve olympiad-level programming problems 🧑‍💻

- 7B open-r1/OlympicCoder-7B
- 32B open-r1/OlympicCoder-32B

We find that OlympicCoder models outperform Claude 3.7 Sonnet, as well as other models over 100x larger 💪

Together with the models, we are releasing:

📊 CodeForces-CoTs: new dataset of code problems from the most popular competitive coding platform, with R1 traces in C++ and Python open-r1/codeforces-cots

πŸ† IOI'2024: a new benchmark of VERY hard programming problems where even frontier models struggle to match human performance open-r1/ioi

For links to the models and datasets, check out our latest progress report from Open R1: https://huggingface.co/blog/open-r1/update-3
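
To try the 7B model locally, here's a rough sketch with transformers (assuming a GPU with enough memory for a 7B model; the prompt and generation settings are illustrative):

```python
import torch
from transformers import pipeline

# Load OlympicCoder-7B for chat-style text generation.
pipe = pipeline(
    "text-generation",
    model="open-r1/OlympicCoder-7B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Write a C++ function that checks whether a number is prime."}]
out = pipe(messages, max_new_tokens=512)
print(out[0]["generated_text"][-1]["content"])
```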

m-ric posted an update about 1 month ago
Our new Agentic leaderboard is now live! 💥

If you've ever asked which LLM is best for powering agents, we've just made a leaderboard that ranks them all! Built with @albertvillanova, this ranks LLMs powering a smolagents CodeAgent on subsets of various benchmarks. ✅

πŸ† GPT-4.5 comes on top, even beating reasoning models like DeepSeek-R1 or o1. And Claude-3.7-Sonnet is a close second!

The leaderboard also allows you to show the scores of vanilla LLMs (without any agentic setup) on the same benchmarks: this shows the huge improvements brought by agentic setups. 💪

(Note that results will be added manually, so the leaderboard might not always have the latest LLMs)
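
If you want to sanity-check a model yourself, the core of the setup is just a smolagents CodeAgent wrapped around the LLM under test; a rough sketch (the model id and question are placeholders, and the real leaderboard harness adds tools and benchmark-specific scoring):

```python
from smolagents import CodeAgent, HfApiModel

# Pick the LLM to evaluate (placeholder model id, served through HF inference providers).
model = HfApiModel(model_id="meta-llama/Llama-3.3-70B-Instruct")

# The leaderboard runs a CodeAgent like this one on subsets of several benchmarks.
agent = CodeAgent(tools=[], model=model)
print(agent.run("If a train travels 120 km in 1.5 hours, what is its average speed in km/h?"))
```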

freddyaboulton posted an update about 2 months ago
Getting WebRTC and WebSockets right in Python is very tricky. If you've tried to wrap an LLM in a real-time audio layer then you know what I'm talking about.

That's where FastRTC comes in! It makes WebRTC and WebSocket streams super easy with minimal code and overhead.

Check out our org: hf.co/fastrtc
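
Here's a minimal sketch in the spirit of the library's echo example (a real app would swap the handler for a speech-to-text + LLM + text-to-speech pipeline):

```python
from fastrtc import ReplyOnPause, Stream

def echo(audio):
    # Receives the user's audio once they pause; here we simply send it back.
    yield audio

# Wrap the handler in a WebRTC audio stream and launch the built-in Gradio UI.
stream = Stream(ReplyOnPause(echo), modality="audio", mode="send-receive")
stream.ui.launch()
```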

m-ric posted an update about 2 months ago
We now have a Deep Research for academia: SurveyX automatically writes academic surveys nearly indistinguishable from human-written ones 🔥

Researchers from Beijing and Shanghai just published the first application of a deep research system to academia: their algorithm, given a question, can give you a survey of all papers on the subject.

To make a research survey, you generally follow two steps: preparation (collect and organize papers) and writing (outline creation, writing, polishing). The researchers followed the same two steps and automated them.

🎯 For the preparation part, a key step is finding all the important references on the given subject.
Researchers first cast a wide net over all relevant papers. But then finding the really important ones is like distilling knowledge from a haystack of information. To solve this challenge, they built an "AttributeTree" object that structures key information from citations. Ablating these AttributeTrees significantly decreased structure and synthesis scores, so they were really useful!
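
Purely as an illustration of the idea (the field names below are hypothetical, not the paper's actual schema), an AttributeTree-style record could look like this:

```python
from dataclasses import dataclass, field

# Hypothetical illustration of an AttributeTree-like record; the paper's actual schema differs.
@dataclass
class AttributeTree:
    title: str
    research_question: str
    method: str
    datasets: list = field(default_factory=list)
    key_findings: list = field(default_factory=list)

paper = AttributeTree(
    title="UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning",
    research_question="Can rule-based RL replace large-scale SFT for GUI agents?",
    method="GRPO with a rule-based reward",
    datasets=["AndroidControl"],
    key_findings=["89% action prediction accuracy with 136 training tasks"],
)
```

Structuring every citation this way makes it easy to filter the haystack down to the references that actually matter for the survey.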

📝 For the writing part, the key was to get a synthesis that's both short and true. This is not easy to get with LLMs! So they used methods like LLM-based deduplication to shorten the overly verbose listings made by LLMs, and RAG to grab original quotes instead of made-up ones.

As a result, their system outperforms previous approaches by far!

As assessed by LLM judges, the quality score of SurveyX even approaches that of human experts, with 4.59/5 vs. 4.75/5 🏆

I advise you to read the paper; it's a great overview of the kind of assistants we'll get in the near future! 👉 SurveyX: Academic Survey Automation via Large Language Models (2502.14776)
Their website shows examples of generated surveys 👉 http://www.surveyx.cn/

m-ric posted an update about 2 months ago
Less is More for Reasoning (LIMO): a 32B model fine-tuned with 817 examples can beat o1-preview on math reasoning! 🤯

Do we really need o1's huge RL procedure to see reasoning emerge? It seems not.
Researchers from Shanghai Jiao Tong University just demonstrated that carefully selected examples can boost math performance in large language models using SFT, with no huge datasets or RL procedures needed.

Their procedure allows Qwen2.5-32B-Instruct to jump from 6.5% to 57% on AIME and from 59% to 95% on MATH, while using only 1% of the data used in previous approaches.

⚡ The Less-is-More Reasoning Hypothesis:
‣ Minimal but precise examples that showcase optimal reasoning patterns matter more than sheer quantity
‣ Pre-trained knowledge plus sufficient computation at inference time levels up math skills

➡️ Core techniques:
‣ High-quality reasoning chains with self-verification steps
‣ 817 handpicked problems that encourage deeper reasoning
‣ Enough inference-time computation to allow extended reasoning

💪 Efficiency gains:
‣ Only 817 examples instead of 100k+
‣ 40.5% absolute improvement across 10 diverse benchmarks, outperforming models trained on 100x more data

This really challenges the notion that SFT leads to memorization rather than generalization! And opens up reasoning to GPU-poor researchers 🚀
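
The recipe itself is plain SFT; here's a rough sketch with TRL, assuming the curated problems are published as a Hub dataset with question/solution-style columns (the dataset id, column names, and settings below are assumptions, and a full 32B fine-tune needs a multi-GPU setup):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumed dataset id and column names; adapt to the authors' actual release.
dataset = load_dataset("GAIR/LIMO", split="train")

def to_text(example):
    # Flatten each curated problem and its long reasoning chain into one training text.
    return {"text": example["question"] + "\n\n" + example["solution"]}

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B-Instruct",  # base model used in the paper
    train_dataset=dataset,
    args=SFTConfig(output_dir="limo-sft", max_seq_length=16384),
)
trainer.train()
```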

Read the full paper here 👉 LIMO: Less is More for Reasoning (2502.03387)