---
base_model: Qwen/Qwen2.5-3B
license: apache-2.0
datasets:
- math
metrics:
- accuracy
pipeline_tag: text-generation
language:
- en
---
# Qwen2.5-3B-GRPO-MATH-1EPOCH
**Description:**
Qwen2.5-3B fine-tuned with GRPO (Group Relative Policy Optimization) on the MATH dataset for one epoch.
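
For readers unfamiliar with GRPO, the core idea from the cited DeepSeekMath paper is that each sampled completion is scored relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below illustrates only that normalization step. It is not the training code used for this model: the function name `group_relative_advantages`, the `eps` term, the use of population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for a single prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    `eps` is an assumed stabilizer that avoids division by zero when
    every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one math problem
# (1.0 = final answer correct, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Correct answers get positive advantages, incorrect ones negative.
print(advantages)
```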
---
## Citation
```bibtex
@article{shao2024deepseekmath,
title = {DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models},
author = {Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Bi, Xiao and … Guo, Daya},
journal = {arXiv preprint arXiv:2402.03300},
year = {2024},
}
```