FastCuRL Overview

We release FastCuRL-1.5B-Preview, a slow-thinking reasoning model that outperforms the previous state-of-the-art DeepScaleR-1.5B-Preview while using only 50% of its training steps. We apply a novel curriculum-guided iterative lengthening reinforcement learning method to DeepSeek-R1-Distill-Qwen-1.5B and observe continuous performance improvements as training progresses. To make our work easier to reproduce and to advance research, we open-source our code, model, and data.
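To illustrate the staged idea behind curriculum-guided iterative lengthening, here is a minimal Python sketch. The stage names, token budgets, dataset splits, and the `train_rl` routine are all hypothetical placeholders, not names from the FastCuRL codebase; the point is only the pattern of carrying the policy through stages with a growing response-length window.

```python
# Minimal sketch of curriculum-guided iterative lengthening RL.
# All names and numbers below are illustrative assumptions, not the
# actual FastCuRL configuration.

from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    max_response_tokens: int  # response-length budget for this stage
    dataset_split: str        # e.g., prompts bucketed by length/difficulty

# Hypothetical schedule: each stage lengthens the allowed response window.
STAGES = [
    Stage("stage1", max_response_tokens=8192, dataset_split="short_prompts"),
    Stage("stage2", max_response_tokens=16384, dataset_split="long_prompts"),
    Stage("stage3", max_response_tokens=24576, dataset_split="mixed_prompts"),
]

def train_curriculum(model, train_rl):
    """Run the RL stages in order, carrying the policy forward each time."""
    for stage in STAGES:
        model = train_rl(
            model,
            dataset=stage.dataset_split,
            max_response_tokens=stage.max_response_tokens,
        )
    return model
```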

Code: https://github.com/nick7nlp/FastCuRL

Paper: https://arxiv.org/abs/2503.17287

This repository contains the checkpoints from the first, second, and third training stages of FastCuRL-1.5B-Preview.
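For reference, a checkpoint can be loaded with the Hugging Face `transformers` library as sketched below. The repository id is an assumption for illustration; substitute the actual id from the model page.

```python
# Hypothetical usage example; the repo id below is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Nickyang/FastCuRL-1.5B-Preview"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "Solve: what is the sum of the first 100 positive integers?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```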
