back-prop/Qwen2.5-GRPO-3B

	---
	base_model: Qwen/Qwen2.5-3B
	license: apache-2.0
	datasets:
	- math
	metrics:
	- accuracy
	pipeline_tag: text-generation
	language:
	- en
	---

	# Qwen2.5-3B-GRPO-MATH-1EPOCH

	Description:

	A version of Qwen2.5-3B fine-tuned with GRPO (Group Relative Policy Optimization) on the MATH dataset for one epoch.
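
A minimal loading sketch with the `transformers` library may help when trying the checkpoint. The repo id is taken from this page; the prompt template in `build_prompt` is a hypothetical placeholder, since the card does not document the format used during training, and `solve` is an illustrative helper name rather than anything shipped with the model.

```python
MODEL_ID = "back-prop/Qwen2.5-GRPO-3B"  # repo id as shown on this page


def build_prompt(problem: str) -> str:
    """Hypothetical prompt template; the card does not state the training format."""
    return f"Problem: {problem.strip()}\nSolution:"


def solve(problem: str, max_new_tokens: int = 256) -> str:
    """Greedy-decode a solution. Downloads the weights on first call."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )

    # Drop the prompt tokens and decode only the generated continuation.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `solve("What is 12 * 7?")` fetches the checkpoint and returns the generated text. Greedy decoding is chosen here only to keep the sketch deterministic; the card does not recommend any particular sampling settings.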

	---

	## Citation

	```bibtex
	@article{shao2024deepseekmath,
	title = {DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models},
	author = {Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Bi, Xiao and … Guo, Daya},
	journal = {arXiv preprint arXiv:2402.03300},
	year = {2024},
	}
	```
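
The cited paper is where GRPO was introduced. As a rough illustration of its central idea, the sketch below normalizes the rewards of several completions sampled for the same prompt against their group mean and standard deviation. This is not the training code behind this checkpoint: the function name, the `eps` stabilizer, the use of the population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    rewards: scalar rewards for completions sampled from one prompt.
    eps: assumed stabilizer that avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four completions: 1.0 = correct answer, 0.0 = incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group average receive a positive advantage and those below receive a negative one, so no separate value model is needed to form a baseline.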