
Improve model card: Add pipeline tag, library name, and Github link

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +93 -19
README.md CHANGED
@@ -1,32 +1,35 @@
  ---
- license: mit
- datasets:
- - agentica-org/DeepScaleR-Preview-Dataset
  base_model:
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  tags:
  - LRM
  - hybrid_reasoning
  - efficient_reasoning
  ---

  # AdaptThink: LLM Can Learn When to Think

  <p align="center">
- 🤗 <a href="https://huggingface.co/collections/THU-KEG/adaptthink-682a1059aa9f5102c4fa0470" target="_blank">HF Collections</a> • 💻 <a href="" target="_blank">Github Repo</a> • 📃 <a href="https://arxiv.org/abs/2505.13417" target="_blank">Paper</a>
  </p>

  ## 🔍 Table of Contents
  - [🤖️ AdaptThink](#adapt_think)
  - [⚙️ Released Models](#model)
  - [📊 Evaluation](#evaluation)
  - [📝 Citation](#citation)

  <a name="adapt_think"></a>
  ## 🤖️ AdaptThink
- We present **AdapThink**, a novel reinforcement learning (RL) algorithm that enables reasoning models to adaptively choose between **Thinking** and **NoThinking** modes according to the difficulty of each input problem, thereby achieving automatic hybrid reasoning. Specifically, the model engages in thinking only when the problem is determined to be challenging; for other simple question, it will bypass the thinking process and directly produce a concise final solution. This approach substantially reduces inference costs while further improving overall performance.
-
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/66cdd285c51a915bd5f2d017/JaeJiBwLkcwAuexRAkLX5.png)

@@ -34,7 +37,7 @@ We present **AdapThink**, a novel reinforcement learning (RL) algorithm that ena
  ## ⚙️ Released Models

  ### All Available Datasets and Models
- We apply the AdaptThink algorithm on DeepSeek-R1-Distill-Qwen-1.5B with $\delta$ from 0 to 0.1, and DeepSeek-R1-Distill-Qwen-7B with $\delta=0.05$. A larger $\large$ results in a higher proportion of NoThinking responses, which reduces more inference costs but also diminish the resultant improvement in accuracy.

  All the trained models are available on HuggingFace.

@@ -50,25 +53,97 @@ All the trained models are available on HuggingFace.
  | AdaptThink-7B-delta0.05 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-7B-delta0.05) |

  <a name="training"></a>

- ## 📊 Evaluation Results

  We list our evaluation results as follows:
- ##### 1. Comparison with existing methods for efficient reasoning on mathematics datasets

- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/66cdd285c51a915bd5f2d017/ZLV8ZfEet1dp-4jyzBxiG.png)

- ##### 2. Nothinking responses ratio and accuracy across different difficulty levels on MATH500

- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/66cdd285c51a915bd5f2d017/GUNfW9qO2aaT9_lo1XXPf.png)

- ##### 3. Comparison of different $\delta$ values

- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/66cdd285c51a915bd5f2d017/RXrXwxVSAYlR3-_t0GUwV.png)

- ##### 4. Evaluation results on MMLU

- <img width="1000" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/66cdd285c51a915bd5f2d017/19K2u6PNmYz3gx3JnHgn4.png">

  <a name="citation"></a>
  ## 📝 Citation
@@ -83,5 +158,4 @@ If you find our work useful, please consider citing AdaptThink:
  url={https://arxiv.org/abs/2505.13417},
  year={2025}
  }
- ```
-
  ---
  base_model:
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+ datasets:
+ - agentica-org/DeepScaleR-Preview-Dataset
+ license: mit
  tags:
  - LRM
  - hybrid_reasoning
  - efficient_reasoning
+ pipeline_tag: text-generation
+ library_name: transformers
  ---

  # AdaptThink: LLM Can Learn When to Think

  <p align="center">
+ 🤗 <a href="https://huggingface.co/collections/THU-KEG/adaptthink-682a1059aa9f5102c4fa0470" target="_blank">HF Collections</a> • 💻 <a href="https://github.com/THU-KEG/AdaptThink" target="_blank">GitHub Repo</a> • 📃 <a href="https://arxiv.org/abs/2505.13417" target="_blank">Paper</a>
  </p>

  ## 🔍 Table of Contents
  - [🤖️ AdaptThink](#adapt_think)
  - [⚙️ Released Models](#model)
+ - [🔥 Training](#training)
  - [📊 Evaluation](#evaluation)
+ - [🧐 Cases](#cases)
  - [📝 Citation](#citation)

  <a name="adapt_think"></a>
  ## 🤖️ AdaptThink
+ We present **AdaptThink**, a novel reinforcement learning (RL) algorithm that enables reasoning models to adaptively choose between **Thinking** and **NoThinking** modes according to the difficulty of each input problem, thereby achieving automatic hybrid reasoning. Specifically, the model engages in thinking only when it judges the problem to be challenging; for simpler questions, it bypasses the thinking process and directly produces a concise final solution. This approach substantially reduces inference costs while further improving overall performance.
+ <img width="1327" alt="image" src="https://github.com/user-attachments/assets/35f62f31-3210-4f11-98cb-06d73e0231e8" />

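For a concrete picture of inference-time behavior, here is a minimal usage sketch with `transformers` (the checkpoint name and decoding settings are illustrative assumptions, not the repo's official example): a NoThinking response simply closes the thinking segment immediately with `</think>`, while a Thinking response reasons first.

```python
# Illustrative sketch only; checkpoint name and max_new_tokens are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THU-KEG/AdaptThink-7B-delta0.05"  # see the model table below
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The chat template ends with "<|Assistant|><think>\n" (see Training, step 2),
# so generation starts inside the thinking segment.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is 2 + 2?"}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(inputs, max_new_tokens=4096)
completion = tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=False)

# Easy problems should yield an immediate "</think>" (NoThinking mode);
# hard ones should produce a reasoning chain before the final answer.
mode = "NoThinking" if completion.lstrip().startswith("</think>") else "Thinking"
print(mode)
print(completion)
```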
  ## ⚙️ Released Models

  ### All Available Datasets and Models
+ We apply the AdaptThink algorithm to DeepSeek-R1-Distill-Qwen-1.5B with $\delta$ from 0 to 0.1, and to DeepSeek-R1-Distill-Qwen-7B with $\delta=0.05$. A larger $\delta$ results in a higher proportion of NoThinking responses, which saves more inference cost but also diminishes the resulting improvement in accuracy.

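Schematically (our paraphrase of this trade-off, not the paper's exact training objective), $\delta$ acts as a margin that tilts the policy toward NoThinking whenever skipping the thought process costs little accuracy:

$$\text{prefer NoThinking on problem } x \quad \text{if} \quad \mathrm{Acc}_{\text{NoThink}}(x) + \delta \geq \mathrm{Acc}_{\text{Think}}(x),$$

so a larger $\delta$ lets more problems clear the bar, trading some of the accuracy gain for shorter responses.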
  All the trained models are available on HuggingFace.

  | AdaptThink-7B-delta0.05 | [🤗 HF Repo](https://huggingface.co/THU-KEG/AdaptThink-7B-delta0.05) |

  <a name="training"></a>
+ ## 🔥 Training
+
+ Our training code is based on the [VeRL](https://github.com/volcengine/verl) framework.
+
+ ### 1. Creating Environment
+ We use [vLLM](https://github.com/vllm-project/vllm) 0.8.2, which supports [flash-attention](https://github.com/Dao-AILab/flash-attention).
+ ```
+ conda create -n adapt_think python=3.10
+ conda activate adapt_think
+ pip install -r requirements.txt
+ pip install flash-attn --no-build-isolation
+ ```
+
+ ### 2. Check the chat template in HF models
+ After you download the DeepSeek models, check the `chat_template` field in `tokenizer_config.json` to ensure the template ends with `<|Assistant|><think>\n`; otherwise there will be bugs when running our code.
+
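A quick self-contained check (a minimal sketch; it just asserts the suffix required above):

```python
# Render the chat template and verify it ends with the expected suffix.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
rendered = tok.apply_chat_template(
    [{"role": "user", "content": "hi"}],
    tokenize=False,
    add_generation_prompt=True,
)
assert rendered.endswith("<|Assistant|><think>\n"), \
    "Fix chat_template in tokenizer_config.json before training"
```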
+ ### 3. Pre-sampling from reference models
+ First, we pre-sample multiple responses from the reference model for each training problem to estimate its instance-level accuracy. The sampling process takes several hours. For convenience, we have released our post-processed results in `./data/train/ref_results`, which can be used directly for training.
+ ```
+ # Initialize the vLLM server. Set tensor_parallel_size to 8 for the 7B model.
+ vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --served_model_name DeepSeek-R1-Distill-Qwen-1.5B --tensor_parallel_size 4
+
+ # Sample 16 responses for each training problem.
+ python src/presampling_ref_responses.py --K 16 --dataset_path ./data/train/deepscaler.json --model_name DeepSeek-R1-Distill-Qwen-1.5B --max_tokens 16384
+
+ # Post-process to get instance-level accuracy.
+ python src/postprocess_ref_results.py --input_path ./data/train/ref_presampling/DeepSeek-R1-Distill-Qwen-1.5B_deepscaler_n0_K16_len16384.json --output_path ./data/train/ref_results/DeepSeek-R1-Distill-Qwen-1.5B_deepscaler_K16_len16384.json
+ ```
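For intuition, the instance-level accuracy is simply the mean pass rate over the K pre-sampled responses. A minimal sketch of that post-processing idea (the JSON field names are assumptions, not necessarily the schema the scripts actually emit):

```python
import json

# Score each problem by the fraction of its K sampled responses that are
# correct; with K=16, accuracy is quantized in steps of 1/16.
path = "./data/train/ref_presampling/DeepSeek-R1-Distill-Qwen-1.5B_deepscaler_n0_K16_len16384.json"
with open(path) as f:
    records = json.load(f)

for rec in records:
    graded = [r["is_correct"] for r in rec["responses"]]  # hypothetical fields
    rec["ref_accuracy"] = sum(graded) / len(graded)

print(f"scored {len(records)} problems")
```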
+
+ ### 4. Preprocess training and test datasets
+ ```
+ bash scripts/preprocess_dataset.sh
+ ```
+
+ ### 5. Training
+ The training context size, batch size, and learning rate are set to 16K, 128, and 2e-6, respectively. We train the models for 1 epoch, i.e., 314 steps in total (one pass over the roughly 40K DeepScaleR problems at batch size 128). For the 1.5B model, we use one 8\*H800 node and training takes about 32 hours. For the 7B model, we use four 8\*H800 nodes and training takes about 28 hours. Finally, we select the checkpoints at steps 300 and 150 for the 1.5B and 7B models, respectively, where the models' accuracy and response length reach a good balance.
+
+ To speed up training, you can set a larger learning rate, such as 5e-5; however, this may make training less stable.
+ ```
+ # 1.5B, single-node
+ bash scripts/run_adapt_think_1.5b_deepscaler_16k_delta0.05_btz128_lr2e-6.sh
+
+ # 7B, single-node
+ bash scripts/run_adapt_think_7b_deepscaler_16k_delta0.05_btz128_lr2e-6.sh
+
+ # 7B, multi-node
+ bash submit_mpi.sh scripts/run_adapt_think_7b_deepscaler_16k_delta0.05_btz128_lr2e-6_multinode.sh
+ ```
+
+
+ <a name="evaluation"></a>
+ ## 📊 Evaluation
+ During training, VeRL automatically evaluates on your selected test sets every `trainer.test_freq` steps.
+
+ We also provide additional scripts for evaluation.
+
+ ```
+ # Convert a VeRL checkpoint to an HF model
+ bash scripts/convert_to_hf.sh
+
+ # Evaluate
+ bash scripts/run_eval_verl_hf.sh
+ ```
+
+ You can also evaluate downloaded HF models by running:
+ ```
+ bash scripts/run_eval_hf.sh
+ ```

  We list our evaluation results as follows:
+ #### 1. Comparison with existing methods for efficient reasoning on mathematics datasets
+ <img width="1447" alt="image" src="https://github.com/user-attachments/assets/53592ec3-17d9-4c4b-99ee-1868b5c82238" />

+ #### 2. NoThinking response ratio and accuracy across different difficulty levels on MATH500
+ <img width="1462" alt="image" src="https://github.com/user-attachments/assets/cc2de266-b67a-47ab-835d-9bce922b13fc" />

+ #### 3. Comparison of different $\delta$ values
+ <img width="1444" alt="image" src="https://github.com/user-attachments/assets/41c86f73-68f8-4d71-ac75-2033c43b964b" />

+ #### 4. Evaluation results on MMLU
+ <img width="500" alt="image" src="https://github.com/user-attachments/assets/fdd20adc-b879-4105-8420-0851944c507f" />

+ <a name="cases"></a>
+ ## 🧐 Cases
+ ### Simple problem
+ ![image](https://github.com/user-attachments/assets/1f6aaa1c-a1c8-4d49-92c5-2e1b219a643a)
+ ![image](https://github.com/user-attachments/assets/1c3d2dbc-5a98-4066-a8a8-90afff0fc7a3)

+ ### Difficult problem
+ ![image](https://github.com/user-attachments/assets/500a0377-3be4-48a2-b5a0-a98c7d228a30)

  <a name="citation"></a>
  ## 📝 Citation

  url={https://arxiv.org/abs/2505.13417},
  year={2025}
  }
+ ```