The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published about 1 month ago • 212
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning Paper • 2508.21113 • Published Aug 28 • 108
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning Paper • 2508.16949 • Published Aug 23 • 22
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs Paper • 2508.16153 • Published Aug 22 • 149
NVIDIA Nemotron Collection Open, Production-ready Enterprise Models. Nvidia Open Model license. • 4 items • Updated 6 days ago • 62
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models Paper • 2508.10751 • Published Aug 14 • 27