Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published 8 days ago • 109
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization Paper • 2508.14460 • Published 7 days ago • 78
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning Paper • 2508.03501 • Published 22 days ago • 53
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published 20 days ago • 165
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence Paper • 2507.21046 • Published 30 days ago • 79
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning Paper • 2507.16784 • Published Jul 22 • 118
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models Paper • 2506.05176 • Published Jun 5 • 68
view article Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • Jul 8 • 639
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning Paper • 2505.24726 • Published May 30 • 270
Rank1: Test-Time Compute for Reranking in Information Retrieval Paper • 2502.18418 • Published Feb 25 • 28
Generating Physically Stable and Buildable LEGO Designs from Text Paper • 2505.05469 • Published May 8 • 28