DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published Mar 18 • 128
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks Paper • 2504.05118 • Published Apr 7 • 25
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning Paper • 2504.08600 • Published Apr 11 • 29
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce Paper • 2504.11343 • Published Apr 15 • 16
OTC: Optimal Tool Calls via Reinforcement Learning Paper • 2504.14870 • Published about 1 month ago • 33
DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models Paper • 2504.15716 • Published 30 days ago • 9
WebThinker: Empowering Large Reasoning Models with Deep Research Capability Paper • 2504.21776 • Published 21 days ago • 50
DeepCritic: Deliberate Critique with Large Language Models Paper • 2505.00662 • Published 20 days ago • 50
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining Paper • 2505.07608 • Published 10 days ago • 75
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures Paper • 2505.09343 • Published 8 days ago • 55
CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models Paper • 2505.12504 • Published 3 days ago • 21
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning Paper • 2505.11896 • Published 5 days ago • 50