Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs Paper • 2509.25779 • Published Sep 30, 2025 • 18
HardTests: Synthesizing High-Quality Test Cases for LLM Coding Paper • 2505.24098 • Published May 30, 2025 • 43
Search-R1 Collection Preliminary checkpoints with outcome-only RL. • 15 items • Updated Aug 12, 2025 • 13