metadata

base_model: Qwen/Qwen2.5-3B
license: apache-2.0
datasets:
  - math
metrics:
  - accuracy
pipeline_tag: text-generation
language:
  - en

Qwen2.5-3B-GRPO-MATH-1EPOCH

Description

Qwen2.5-3B fine-tuned with Group Relative Policy Optimization (GRPO) on the MATH dataset for one epoch.
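
For readers unfamiliar with GRPO, the core idea from the cited DeepSeekMath paper is that each sampled answer is scored relative to the other answers drawn for the same prompt, so no separate value model is needed. The sketch below illustrates only that normalization step. It is not the training code used for this model: the function name, the `eps` term, the use of the population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the group sampled for one prompt.

    Hypothetical helper: subtracts the group mean and divides by the
    group standard deviation. eps (an assumption) avoids division by
    zero when every sample in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four sampled answers to one MATH problem,
# with reward 1.0 for a correct final answer and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Correct answers receive positive advantages and incorrect ones negative. When every answer in a group gets the same reward, all advantages are zero, so that group contributes no learning signal.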

Citation

@article{shao2024deepseekmath,
  title     = {DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models},
  author    = {Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Bi, Xiao and … Guo, Daya},
  journal   = {arXiv preprint arXiv:2402.03300},
  year      = {2024},
}