Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents Paper • 2505.22954 • Published May 29 • 12
DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments Paper • 2504.03160 • Published Apr 4 • 1
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 406
Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published Dec 20, 2024 • 39
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 87