Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute Paper • 2506.15882 • Published Jun 18 • 2
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19 • 126
Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR Paper • 2507.15778 • Published Jul 21 • 19
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning Paper • 2507.19457 • Published Jul 25 • 24