DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published 2 days ago
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs Paper • 2503.11751 • Published 6 days ago
🧠 Reasoning datasets Collection: Datasets with reasoning traces for math and code released by the community • 16 items • Updated about 22 hours ago
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models Paper • 2502.07346 • Published Feb 11