SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? Paper • 2507.12415 • Published 2 days ago • 28
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization Paper • 2505.23387 • Published May 29 • 9
Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • 11 days ago • 554
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving Paper • 2507.06229 • Published 10 days ago • 67
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization Paper • 2507.06181 • Published 10 days ago • 39
ZeCO: Zero Communication Overhead Sequence Parallelism for Linear Attention Paper • 2507.01004 • Published 17 days ago • 10
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Paper • 2506.24119 • Published 18 days ago • 44
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling Paper • 2506.20512 • Published 24 days ago • 46
MiniMax-M1 Collection MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. • 6 items • Updated 16 days ago • 108
AceReason Collection Math and Code reasoning model trained through reinforcement learning (RL) • 7 items • Updated 8 days ago • 13
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16 • 254
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 170
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30 • 133
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning Paper • 2505.16400 • Published May 22 • 33