Qwen2.5-1.5B-GRPO-MATH-1EPOCH

Description:

A GRPO-fine-tuned version of Qwen2.5-1.5B, trained for one epoch on the MATH dataset.
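The card itself provides no usage snippet; a minimal inference sketch using the Hugging Face `transformers` API might look like the following. The plain "Problem/Solution" prompt format is an assumption for illustration, not the documented training format.

```python
# Hypothetical inference sketch for this checkpoint (not an official example).
MODEL_ID = "sunblaze-ucb/Qwen2.5-1.5B-GRPO-MATH-1EPOCH"


def build_prompt(problem: str) -> str:
    # Simple instruction-style prompt; the exact prompt template used
    # during GRPO training is an assumption, not documented on the card.
    return f"Problem: {problem}\nSolution:"


def main() -> None:
    # Import here so the helper above stays usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")

    inputs = tokenizer(build_prompt("What is 7 * 8?"), return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(out[0], skip_special_tokens=True))


if __name__ == "__main__":
    main()
```

Since the tensors are stored in BF16, loading with `torch_dtype="bfloat16"` avoids an upcast to FP32.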


Citation

@article{zhao2025learning,
  title={Learning to Reason without External Rewards},
  author={Zhao, Xuandong and Kang, Zhewei and Feng, Aosong and Levine, Sergey and Song, Dawn},
  journal={arXiv preprint arXiv:2505.19590},
  year={2025}
}

@article{shao2024deepseekmath,
  title={DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models},
  author={Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Bi, Xiao and … and Guo, Daya},
  journal={arXiv preprint arXiv:2402.03300},
  year={2024}
}
Safetensors
Model size: 1.78B params
Tensor type: BF16

Model tree for sunblaze-ucb/Qwen2.5-1.5B-GRPO-MATH-1EPOCH

Base model: Qwen/Qwen2.5-1.5B