---
library_name: transformers
license: apache-2.0
base_model:
- Qwen/Qwen2.5-Math-7B
---
# Qwen2.5-Math-7B-Oat-Zero
## Links
- 📜 [Paper](https://github.com/sail-sg/understand-r1-zero/blob/main/understand-r1-zero.pdf)
- 💻 [GitHub](https://github.com/sail-sg/understand-r1-zero)
- 🤗 [Oat-Zero Collection](https://huggingface.co/collections/sail/oat-zero-understanding-r1-zero-like-training-67dcdb07b9f3eb05f1501c4a)
## Introduction
This model was trained with the minimalist R1-Zero recipe introduced in our paper:
- **Algorithm**: Dr. GRPO
- **Data**: level 3-5 questions from MATH dataset
- **Base model**: [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B)
- **Template**: Qwen-Math
Evaluation results on widely used math benchmarks are shown below:
<img src="https://raw.githubusercontent.com/sail-sg/understand-r1-zero/refs/heads/main/assets/benchmark_table.png" width=100%/>
## Usage
```python
import vllm


def apply_qwen_math_template(question: str):
    # Qwen-Math chat template: system instruction, user question, assistant turn.
    return (
        "<|im_start|>system\nPlease reason step by step, and put your final answer within \\boxed{}.<|im_end|>\n<|im_start|>user\n"
        + question
        + "<|im_end|>\n<|im_start|>assistant\n"
    )


def apply_r1_template(question: str):
    # R1 template: plain-text conversation with <think> and <answer> tags.
    return (
        "A conversation between User and Assistant. The User asks a question, and the Assistant solves it. The Assistant first thinks about the reasoning process in the mind and then provides the User with the answer. "
        "The reasoning process is enclosed within <think> </think> and answer is enclosed within <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.\nUser: "
        + question
        + "\nAssistant: <think>"
    )


model_name = "sail/Qwen2.5-Math-7B-Oat-Zero"

# Greedy decoding (temperature=0) with a single sample per prompt.
sampling_params = vllm.SamplingParams(
    n=1,
    temperature=0,
    top_p=1,
    max_tokens=3000,
)

model = vllm.LLM(
    model_name,
    max_model_len=4096,
    dtype="bfloat16",
    enable_prefix_caching=True,
)

# Only the Llama-based Oat-Zero model uses the R1 template.
if "Llama-3.2-3B-Oat-Zero" in model_name:
    apply_template = apply_r1_template
else:
    apply_template = apply_qwen_math_template

prompts = [
    "How many positive whole-number divisors does 196 have?"
]
prompts = list(map(apply_template, prompts))
outputs = model.generate(prompts, sampling_params)

print(outputs)
```
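Because the Qwen-Math template asks for the final answer inside `\boxed{}`, it can be handy to pull that answer out of the generated text. The following is a minimal sketch, not part of the released code: `extract_last_boxed` is a hypothetical helper, and `sample` is a hand-written string standing in for a model completion.

```python
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # braces never closed


# Hypothetical completion text, written by hand for illustration only.
sample = "196 = 2^2 * 7^2, so it has (2+1)(2+1) = 9 divisors. The answer is \\boxed{9}."
print(extract_last_boxed(sample))
```

With vLLM, the generated string for the first prompt is available as `outputs[0].outputs[0].text`, which could be passed to this helper in place of `sample`.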
## Citation
```bibtex
@misc{liu2025understanding,
  title={Understanding R1-Zero-Like Training: A Critical Perspective},
  author={Zichen Liu and Changyu Chen and Wenjun Li and Penghui Qi and Tianyu Pang and Chao Du and Wee Sun Lee and Min Lin},
  year={2025},
  howpublished={\url{https://github.com/sail-sg/understand-r1-zero}},
}
```