REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published Jan 4 • 100
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30 • 135
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks Paper • 2504.15521 • Published Apr 22 • 64