OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling Paper • 2506.20512 • Published 2 days ago • 30
ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering Paper • 2506.09050 • Published 17 days ago • 7
Generative AI Act II: Test Time Scaling Drives Cognition Engineering Paper • 2504.13828 • Published Apr 18 • 17
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Paper • 2504.02587 • Published Apr 3 • 30
Evaluating Mathematical Reasoning Beyond Accuracy Paper • 2404.05692 • Published Apr 8, 2024 • 2
O1 Replication Journey: A Strategic Progress Report -- Part 1 Paper • 2410.18982 • Published Oct 8, 2024 • 3
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World Paper • 2412.17589 • Published Dec 23, 2024 • 13
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models Paper • 2501.03124 • Published Jan 6 • 14
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? Paper • 2411.16489 • Published Nov 25, 2024 • 49