The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models Paper • 2505.22617 • Published 13 days ago • 120
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models Paper • 2505.14810 • Published 21 days ago • 61
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond Paper • 2503.21614 • Published Mar 27 • 39
LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid Paper • 2502.07563 • Published Feb 11 • 24
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published Jan 22 • 63