VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks Paper • 2504.05118 • Published Apr 7 • 25
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published Feb 11 • 52
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning Paper • 2502.03275 • Published Feb 5 • 17
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13 • 98
Running 563 Scaling test-time compute 📈 Enhance math problem solving by scaling test-time compute
kenhktsui/open-react-retrieval-multi-neg-result-new-kw Viewer • Updated Aug 7, 2023 • 25.2k • 21 • 3