Qwen2.5-1.5B-GRPO-MATH-1EPOCH
Description:
A GRPO-fine-tuned version of Qwen2.5-1.5B trained on the MATH dataset.
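A minimal usage sketch, assuming the standard Hugging Face `transformers` API; the prompt format, decoding settings, and helper names below are illustrative, not the authors' exact setup.

```python
# Minimal usage sketch for sunblaze-ucb/Qwen2.5-1.5B-GRPO-MATH-1EPOCH.
# Assumes the standard `transformers` AutoModel API; prompt format and
# generation settings are illustrative.

def build_prompt(problem: str) -> str:
    # Illustrative instruction-style prompt for math problems.
    return (
        f"Problem: {problem}\n"
        "Please reason step by step and state the final answer.\n"
    )

def generate_answer(
    problem: str,
    model_id: str = "sunblaze-ucb/Qwen2.5-1.5B-GRPO-MATH-1EPOCH",
) -> str:
    # Heavyweight part: downloads the checkpoint on first call.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Greedy decoding (`do_sample=False`) is shown for reproducibility; sampling parameters can be tuned as with any Qwen2.5 checkpoint.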
Citation
@article{zhao2025learning,
  title   = {Learning to Reason without External Rewards},
  author  = {Zhao, Xuandong and Kang, Zhewei and Feng, Aosong and Levine, Sergey and Song, Dawn},
  journal = {arXiv preprint arXiv:2505.19590},
  year    = {2025}
}
@article{shao2024deepseekmath,
  title   = {DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models},
  author  = {Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Bi, Xiao and … Guo, Daya},
  journal = {arXiv preprint arXiv:2402.03300},
  year    = {2024}
}
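For context, GRPO (introduced in the DeepSeekMath paper cited above) scores each sampled completion relative to the other completions drawn for the same prompt, using the group mean and standard deviation. A minimal sketch of that group-relative advantage computation; the function name and epsilon guard are ours:

```python
def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: z-score each reward within its sampling group.

    `rewards` holds the scalar rewards of all completions sampled for one
    prompt; `eps` guards against division by zero when all rewards are equal.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (var ** 0.5 + eps) for r in rewards]

# Two correct (reward 1.0) and two incorrect (reward 0.0) completions:
# correct answers receive positive advantage, incorrect ones negative.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because advantages are normalized within each group, GRPO needs no learned value function, which is what makes it attractive for fine-tuning small models like this one.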
Model tree for sunblaze-ucb/Qwen2.5-1.5B-GRPO-MATH-1EPOCH
Base model: Qwen/Qwen2.5-1.5B