Efficient Agents: Building Effective Agents While Reducing Cost Paper • 2508.02694 • Published 29 days ago • 81
Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction Paper • 2508.03613 • Published 17 days ago • 11
CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search Paper • 2508.02091 • Published 19 days ago • 13
Tool-integrated Reinforcement Learning for Repo Deep Search Paper • 2508.03012 • Published 18 days ago • 18
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference Paper • 2508.02193 • Published 19 days ago • 126
MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks Paper • 2507.19634 • Published 28 days ago • 9
EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity Paper • 2507.21848 • Published 24 days ago • 7
Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding Paper • 2507.19427 • Published 28 days ago • 18
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning Paper • 2507.14111 • Published Jul 18 • 22
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge Paper • 2507.21183 • Published 27 days ago • 13
Diversity-Enhanced Reasoning for Subjective Questions Paper • 2507.20187 • Published 26 days ago • 23
Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning Paper • 2507.21049 • Published 25 days ago • 40
Goal Alignment in LLM-Based User Simulators for Conversational AI Paper • 2507.20152 • Published 27 days ago • 4
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty Paper • 2507.16806 • Published Jul 22 • 6
UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities Paper • 2507.19766 • Published 28 days ago • 14