Light-R1 Collection Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond • 7 items • Updated 15 days ago • 11
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 270