ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs Paper • 2506.18896 • Published 30 days ago • 28
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Paper • 2507.00432 • Published 22 days ago • 70
AceReason Collection Math and Code reasoning model trained through reinforcement learning (RL) • 7 items • Updated 1 day ago • 13
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy Paper • 2506.13284 • Published Jun 16 • 24
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs Paper • 2506.14245 • Published Jun 17 • 39