back-prop/Qwen2.5-GRPO-3B

Text Generation


back-prop committed on Jun 2 · Commit 2972ae6 (verified) · 1 parent: 79477d5

Create README.md

Files changed (1): README.md (added, +52 -0)

---
base_model: Qwen/Qwen2.5-3B
license: apache-2.0
datasets:
  - math
metrics:
  - accuracy
pipeline_tag: text-generation
language:
  - en
---
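# Qwen2.5-3B-GRPO-MATH-1EPOCH

**Description:**
A version of Qwen2.5-3B-Instruct fine-tuned for one epoch on the MATH dataset with GRPO (Group Relative Policy Optimization). The fine-tuning aims to make the model's solutions to contest-style math problems more accurate than those of the base model.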
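
GRPO, introduced in the DeepSeekMath paper cited in this card, drops PPO's learned value model. For each prompt it samples a group of completions, scores them, and uses each completion's reward, normalized by the group mean and standard deviation, as its advantage. The sketch below covers only that advantage step. It assumes a binary reward of 1.0 for a correct final answer and 0.0 otherwise; this card does not state the reward actually used. The clipped policy objective and KL penalty are omitted, and `group_relative_advantages` is a hypothetical helper name.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each completion's reward against its own group (G samples for one prompt).

    Sketch of GRPO's outcome-reward advantage: A_i = (r_i - mean(r)) / (std(r) + eps).
    Uses the population standard deviation; implementations differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards: 1.0 if a sampled solution's final answer is correct, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # → [1.0, -1.0, -1.0, 1.0]
```

When every completion in a group receives the same reward, all advantages are zero, so a prompt the model always solves (or always fails) contributes no reward signal to the update.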

---
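## Usage

Generate a solution with the Transformers `pipeline` API. When the input is a list of chat messages, the pipeline formats it with the model's chat template.

```python
from transformers import pipeline

# Load the model from the Hub; device="cuda" assumes a GPU is available (use "cpu" otherwise).
generator = pipeline(
    "text-generation",
    model="back-prop/Qwen2.5-GRPO-3B",
    device="cuda",
)

prompt = "Evaluate the integral ∫₀¹ x² dx."

# A list of {"role", "content"} dicts is formatted with the model's chat template.
result = generator(
    [{"role": "user", "content": prompt}],
    max_new_tokens=512,      # 50 tokens usually truncates a step-by-step solution
    return_full_text=False,  # return only the newly generated text
)[0]
print(result["generated_text"])
```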
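
The card lists accuracy as its metric. Reference solutions in the MATH dataset put the final answer inside `\boxed{...}`, so a common way to score generations is to extract the last boxed expression and compare it with the reference answer. The sketch below assumes the model follows the same convention (the card does not say so) and uses plain string exact match. Full MATH evaluations usually also normalize equivalent forms such as `0.5` and `\frac{1}{2}`, which this does not. `extract_boxed` and `exact_match_accuracy` are hypothetical helpers, not part of Transformers.

```python
from typing import Optional


def extract_boxed(solution: str) -> Optional[str]:
    r"""Return the contents of the last \boxed{...} in `solution`, or None if there is none.

    Brace depth is tracked so nested answers such as \boxed{\frac{1}{3}} stay intact.
    """
    marker = "\\boxed{"
    start = solution.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1
    for j in range(begin, len(solution)):
        if solution[j] == "{":
            depth += 1
        elif solution[j] == "}":
            depth -= 1
            if depth == 0:
                return solution[begin:j].strip()
    return None  # unbalanced braces: no complete boxed answer


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions whose boxed answer equals the reference string exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        extract_boxed(pred) == ref.strip()
        for pred, ref in zip(predictions, references)
    )
    return hits / len(predictions)


# Hypothetical model outputs and reference answers for illustration.
outputs = [
    "The antiderivative is x^3/3, so the integral is \\boxed{\\frac{1}{3}}.",
    "Evaluating gives \\boxed{0.5}.",
    "I am not sure.",
]
refs = ["\\frac{1}{3}", "\\frac{1}{3}", "\\frac{1}{3}"]
print(exact_match_accuracy(outputs, refs))  # → 0.3333333333333333
```

To score the pipeline output above, collect `result["generated_text"]` for each problem as a prediction and pass the list together with the reference answers.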

---
## Citation

GRPO was introduced in DeepSeekMath:

```bibtex
@article{shao2024deepseekmath,
  title     = {DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models},
  author    = {Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Bi, Xiao and … Guo, Daya},
  journal   = {arXiv preprint arXiv:2402.03300},
  year      = {2024},
}
```