---
library_name: transformers
license: apache-2.0
base_model:
- Qwen/Qwen2.5-Math-7B
---

# Qwen2.5-Math-7B-Oat-Zero

## Links

- 📄 [Paper](https://github.com/sail-sg/understand-r1-zero/blob/main/understand-r1-zero.pdf)
- 💻 [GitHub](https://github.com/sail-sg/understand-r1-zero)
- 🤗 [Oat-Zero Collection](https://huggingface.co/collections/sail/oat-zero-understanding-r1-zero-like-training-67dcdb07b9f3eb05f1501c4a)

## Introduction

This model is trained with the minimalist R1-Zero recipe introduced in our paper:

- **Algorithm**: Dr. GRPO (a minimal sketch follows this list)
- **Data**: Level 3-5 questions from the MATH dataset
- **Base model**: [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B)
- **Template**: Qwen-Math
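
For reference, the snippet below sketches the Dr. GRPO objective for a single group of sampled responses. Per the paper, Dr. GRPO removes two bias terms from GRPO: the advantage is no longer divided by the group's reward standard deviation, and token log-probabilities are summed rather than length-normalized. Function and tensor names are illustrative, not taken from our training code.

```python
import torch


def dr_grpo_loss(logps, response_mask, rewards):
    """Illustrative Dr. GRPO loss for one group of G sampled responses.

    logps:         (G, T) per-token log-probs under the current policy
    response_mask: (G, T) 1.0 for response tokens, 0.0 for padding
    rewards:       (G,)   scalar reward per response
    """
    # Unbiased advantage: subtract the group mean reward only;
    # Dr. GRPO drops GRPO's division by the group's reward std.
    advantages = rewards - rewards.mean()
    # Sum token log-probs over each response; Dr. GRPO drops GRPO's
    # 1/|response| length normalization.
    per_response = (logps * response_mask).sum(dim=1)
    # Policy-gradient loss, averaged over the group.
    return -(advantages * per_response).mean()
```

Dropping the length normalization means an incorrect response's penalty is not diluted as it grows longer, which the paper identifies as a cause of the ever-increasing response lengths seen in GRPO-based R1-Zero training.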

Evaluation results on widely used math benchmarks are shown below:

<img src="https://raw.githubusercontent.com/sail-sg/understand-r1-zero/refs/heads/main/assets/benchmark_table.png" width=100%/>

## Usage

```python
import vllm


def apply_qwen_math_template(question: str):
    # Qwen-Math template: the system prompt asks for step-by-step reasoning
    # with the final answer in \boxed{}.
    return (
        "<|im_start|>system\nPlease reason step by step, and put your final answer within \\boxed{}.<|im_end|>\n<|im_start|>user\n"
        + question
        + "<|im_end|>\n<|im_start|>assistant\n"
    )


def apply_r1_template(question: str):
    # R1 template: reasoning goes in <think> tags, the answer in <answer> tags.
    return (
        "A conversation between User and Assistant. The User asks a question, and the Assistant solves it. The Assistant first thinks about the reasoning process in the mind and then provides the User with the answer. "
        "The reasoning process is enclosed within <think> </think> and answer is enclosed within <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.\nUser: "
        + question
        + "\nAssistant: <think>"
    )


model_name = "sail/Qwen2.5-Math-7B-Oat-Zero"

# Greedy decoding (temperature=0) for deterministic, benchmark-style generation.
sampling_params = vllm.SamplingParams(
    n=1,
    temperature=0,
    top_p=1,
    max_tokens=3000,
)

model = vllm.LLM(
    model_name,
    max_model_len=4096,
    dtype="bfloat16",
    enable_prefix_caching=True,
)

# The Llama-based Oat-Zero model was trained with the R1 template;
# the Qwen-based models use the Qwen-Math template.
if "Llama-3.2-3B-Oat-Zero" in model_name:
    apply_template = apply_r1_template
else:
    apply_template = apply_qwen_math_template

prompts = [
    "How many positive whole-number divisors does 196 have?"
]
prompts = list(map(apply_template, prompts))
outputs = model.generate(prompts, sampling_params)

# Each RequestOutput holds the sampled completions; print the generated text.
print(outputs[0].outputs[0].text)
```
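
Since the Qwen-Math template asks for the final answer inside `\boxed{}`, a small helper like the one below can pull it out of the generated text. The helper is illustrative (not part of this repo), and its simple regex does not handle nested braces inside `\boxed{}`.

```python
import re


def extract_boxed_answer(text: str):
    # Return the content of the last \boxed{...} in the completion,
    # or None if the model never produced one.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None


answer = extract_boxed_answer(outputs[0].outputs[0].text)
print(answer)  # expect "9" for the divisors-of-196 prompt above
```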

## Citation

```bibtex
@misc{liu2025understanding,
  title={Understanding R1-Zero-Like Training: A Critical Perspective},
  author={Zichen Liu and Changyu Chen and Wenjun Li and Penghui Qi and Tianyu Pang and Chao Du and Wee Sun Lee and Min Lin},
  year={2025},
  howpublished={\url{https://github.com/sail-sg/understand-r1-zero}},
}
```