---
base_model: Qwen/Qwen2.5-3B
license: apache-2.0
datasets:
- math
metrics:
- accuracy
pipeline_tag: text-generation
language:
- en
---

# Qwen2.5-3B-GRPO-MATH-1EPOCH

**Description:**

Qwen2.5-3B fine-tuned with Group Relative Policy Optimization (GRPO) on the MATH dataset for one epoch.

---

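As background on the training method: GRPO samples several completions for the same prompt and normalizes each completion's reward against the mean and standard deviation of its group, as described in the DeepSeekMath paper cited in this card. The block below is a minimal sketch of that normalization step only, not the training code used for this model. The function name `group_relative_advantages`, the binary correctness rewards, the `eps` term, and the use of the population standard deviation are all illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group: (r - mean) / (std + eps).

    `eps` is an assumed small constant that avoids division by zero when
    every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std of the group (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one MATH problem:
# 1.0 if the final answer matched the reference, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above their group's average receive a positive advantage and those below receive a negative one. If every completion in a group gets the same reward, all advantages are zero.
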
## Citation

```bibtex
@article{sha2024deepseekmath,
  title = {DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models},
  author = {Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Bi, Xiao and … Guo, Daya},
  journal = {arXiv preprint arXiv:2402.03300},
  year = {2024},
}
```