Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't Paper • 2503.16219 • Published 4 days ago • 38
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published 14 days ago • 40
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published 6 days ago • 98
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 17 items • Updated 4 days ago • 111
OpenR1-Math Collection Dataset and SFT model distilled from DeepSeek-R1. Check out our blog post for more details: https://huggingface.co/blog/open-r1/update-2 • 3 items • Updated 13 days ago • 7
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 208
Article • Introducing smolagents: simple agents that write actions in code • Dec 31, 2024 • 915
Article • Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference • Jan 16 • 71