Sherlock: Self-Correcting Reasoning in Vision-Language Models • Paper • 2505.22651 • Published 8 days ago • 50
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing • Paper • 2505.21600 • Published 9 days ago • 68
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models • Paper • 2505.22617 • Published 8 days ago • 114