---
library_name: transformers
license: apache-2.0
base_model:
- Qwen/Qwen2.5-Math-7B
---

# Qwen2.5-Math-7B-Oat-Zero

## Links

- 📜 [Paper](https://github.com/sail-sg/understand-r1-zero/blob/main/understand-r1-zero.pdf)
- 💻 [GitHub](https://github.com/sail-sg/understand-r1-zero)
- 🤗 [Oat-Zero Collection](https://huggingface.co/collections/sail/oat-zero-understanding-r1-zero-like-training-67dcdb07b9f3eb05f1501c4a)

## Introduction

This model was trained with the minimalist R1-Zero recipe introduced in our paper:
- **Algorithm**: Dr. GRPO
- **Data**: Level 3-5 questions from the MATH dataset
- **Base model**: [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B)
- **Template**: Qwen-Math

Evaluation results on widely used math benchmarks are shown below:

<img src="https://raw.githubusercontent.com/sail-sg/understand-r1-zero/refs/heads/main/assets/benchmark_table.png" width=100%/>

## Usage

```python
import vllm


# Chat template used by the Qwen2.5-Math base models (the template this model was trained with).
def apply_qwen_math_template(question: str):
    return (
        "<|im_start|>system\nPlease reason step by step, and put your final answer within \\boxed{}.<|im_end|>\n<|im_start|>user\n"
        + question
        + "<|im_end|>\n<|im_start|>assistant\n"
    )

# R1-style <think>/<answer> template used by the Llama-based Oat-Zero model.
def apply_r1_template(question: str):
    return (
        "A conversation between User and Assistant. The User asks a question, and the Assistant solves it. The Assistant first thinks about the reasoning process in the mind and then provides the User with the answer. "
        "The reasoning process is enclosed within <think> </think> and answer is enclosed within <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.\nUser: "
        + question
        + "\nAssistant: <think>"
    )

model_name = "sail/Qwen2.5-Math-7B-Oat-Zero"

# Greedy decoding (temperature=0), one completion per prompt, up to 3000 new tokens.
sampling_params = vllm.SamplingParams(
    n=1,
    temperature=0,
    top_p=1,
    max_tokens=3000,
)

model = vllm.LLM(
    model_name,
    max_model_len=4096,
    dtype="bfloat16",
    enable_prefix_caching=True,
)

# The Llama-based Oat-Zero model uses the R1 template; the Qwen-based models use the Qwen-Math template.
if "Llama-3.2-3B-Oat-Zero" in model_name:
    apply_template = apply_r1_template
else:
    apply_template = apply_qwen_math_template

prompts = [
    "How many positive whole-number divisors does 196 have?"
]
prompts = list(map(apply_template, prompts))
outputs = model.generate(prompts, sampling_params)

print(outputs)
```
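With the Qwen-Math template, the system prompt instructs the model to put its final answer inside `\boxed{...}`, so the answer can be recovered from the completion text. Below is a minimal sketch of such a parser; the helper name `extract_boxed_answer` is ours, not part of this repo, and a simple brace-matching scan is used because regex alone cannot handle nested braces:

```python
def extract_boxed_answer(text: str):
    """Return the contents of the last \\boxed{...} in `text`, or None if absent."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    while i < len(text) and depth > 0:
        c = text[i]
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
        if depth > 0:  # don't keep the closing brace of \boxed{}
            chars.append(c)
        i += 1
    # depth != 0 means the box was never closed (e.g. truncated generation)
    return "".join(chars) if depth == 0 else None
```

For example, `extract_boxed_answer(outputs[0].outputs[0].text)` would give the model's final answer string for the first prompt above.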

## Citation

```latex
@misc{liu2025understanding,
  title={Understanding R1-Zero-Like Training: A Critical Perspective},
  author={Zichen Liu and Changyu Chen and Wenjun Li and Penghui Qi and Tianyu Pang and Chao Du and Wee Sun Lee and Min Lin},
  year={2025},
  howpublished={\url{https://github.com/sail-sg/understand-r1-zero}},
}
```