FastCuRL Overview
We release FastCuRL-1.5B-Preview, a slow-thinking reasoning model that outperforms the previous SoTA, DeepScaleR-1.5B-Preview, using only 50% of the training steps. We apply a novel curriculum-guided iterative-lengthening reinforcement learning approach to DeepSeek-R1-Distill-Qwen-1.5B and observe continuous performance improvement as the number of training steps increases. To support reproduction of our work and advance research progress, we open-source our code, model, and data.
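The sketch below illustrates the stage-wise idea behind curriculum-guided iterative lengthening: each RL stage resumes from the previous stage's checkpoint with a larger response-length budget. It is a minimal, illustrative sketch only; the 8K/16K/24K token budgets and the `run_rl_stage` stub are assumptions, not the released training configuration (see the paper and code for the actual setup).

```python
# Illustrative sketch of curriculum-guided iterative lengthening RL:
# each stage resumes from the previous checkpoint and raises the cap
# on response length. The budgets and `run_rl_stage` stub are
# assumptions, not the released training configuration.
STAGES = [
    {"name": "stage1", "max_response_tokens": 8192},
    {"name": "stage2", "max_response_tokens": 16384},
    {"name": "stage3", "max_response_tokens": 24576},
]

def run_rl_stage(init_checkpoint: str, name: str, max_response_tokens: int) -> str:
    """Stub standing in for one RL training stage (in practice, an RL run
    with generation capped at `max_response_tokens`); returns the path
    to the resulting checkpoint."""
    print(f"{name}: init from {init_checkpoint}, response cap {max_response_tokens} tokens")
    return f"checkpoints/{name}"

checkpoint = "DeepSeek-R1-Distill-Qwen-1.5B"  # base model per the overview
for stage in STAGES:
    checkpoint = run_rl_stage(checkpoint, stage["name"], stage["max_response_tokens"])
```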
Code: https://github.com/nick7nlp/FastCuRL
Paper: https://arxiv.org/abs/2503.17287
These files contain the checkpoints from the first, second, and third training stages of FastCuRL-1.5B-Preview.
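For quick evaluation, a checkpoint can be loaded with the standard `transformers` API. This is a minimal inference sketch, assuming the model is published under a Hugging Face repo id such as `Nickyang/FastCuRL-1.5B-Preview` (substitute the actual path) and that sampling settings like temperature 0.6 are reasonable defaults rather than the authors' recommended configuration.

```python
# Minimal inference sketch using Hugging Face transformers.
# The repo id below is an assumption; replace it with the actual
# FastCuRL-1.5B-Preview checkpoint path if it differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Nickyang/FastCuRL-1.5B-Preview"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Solve step by step: what is the sum of the first 100 positive integers?"
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Slow-thinking models emit long reasoning traces, so allow a generous token budget.
outputs = model.generate(
    inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```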