---
license: gemma
datasets:
- GBaker/MedQA-USMLE-4-options-hf
base_model:
- google/gemma-3-12b-it
library_name: transformers
tags:
- biology
- medical
---
# Gemma-3-12B-GRPO trained with GRPO via LoRA

Due to limited computational resources, we randomly sampled 500 data points from MedQA-USMLE and conducted preliminary GRPO experiments with LoRA using the [Unsloth](https://github.com/unslothai/unsloth) framework. We are releasing this as a preview version; more experiments and explorations are underway, and a technical report is in preparation. Thank you for your patience. We conducted the experiments on a single RTX-A6000 Ada (48GB VRAM).
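A seeded random sample is one reproducible way to draw such a subset. The sketch below is hypothetical (the seed of 42 is an assumption, not the authors' setting); 10,178 is the size of the MedQA-USMLE training split.

```python
import random

# Hypothetical sketch: reproducibly draw 500 training indices from the
# MedQA-USMLE training split (10,178 questions). The seed is an
# assumption, not the authors' actual setting.
TRAIN_SIZE = 10178
SAMPLE_SIZE = 500

rng = random.Random(42)
indices = sorted(rng.sample(range(TRAIN_SIZE), SAMPLE_SIZE))
```

The resulting index list can then be used to select the corresponding rows from the `GBaker/MedQA-USMLE-4-options-hf` training split.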

## Evaluation Results

The model is evaluated on five benchmark datasets: MMLU, MMLU-Pro, CMMLU, GSM8K, and GPQA. The experimental results are summarized in Table 1, with comprehensive analyses provided in the Detailed Results section.

<center><strong>Tab.1 Evaluation results.</strong></center>

| Dataset | Gemma-3-12b-it | Gemma3-12b-GRPO |
| :-----: | :------------: | :-------------: |
| MMLU | 65.51 | 70.13 |
| MMLU-Pro | 60.17 | 59.99 |
| CMMLU | 54.81 | 57.07 |
| GSM8K | 91.58 | 91.81 |
| GPQA | 34.98 | 34.23 |

## Requirements

```shell
pip install torch==2.6.0 torchaudio==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
pip install transformers vllm bitsandbytes peft
pip install flash-attn --no-build-isolation
```

## Run with vLLM

You can use the following command to serve the model with vLLM.

```shell
vllm serve qiuxi337/gemma-3-12b-it-grpo \
    --gpu-memory-utilization 0.85 \
    --max-model-len 4096 \
    --served-model-name gemma3-12b-grpo \
    --api-key your_api_key
```
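Once the server is up, it exposes an OpenAI-compatible HTTP API. Below is a minimal client sketch using only the standard library; it assumes the default port 8000, and the model name and API key come from the serve command above. The prompt is purely illustrative.

```python
import json
import urllib.request

# Hypothetical client for the vLLM server started above. Assumes the
# server listens on the default port 8000; "gemma3-12b-grpo" and
# "your_api_key" come from the serve command in this README.
payload = {
    "model": "gemma3-12b-grpo",
    "messages": [
        {
            "role": "user",
            "content": "List the classic triad of symptoms of infectious mononucleosis.",
        }
    ],
    "max_tokens": 512,
    "temperature": 0.7,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer your_api_key",
    },
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```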

## Detailed Results

### MMLU

![mmlu](asserts/mmlu.png)

**Fig.1 The results on the MMLU benchmark.**

![MMLU_Humanities](asserts/mmlu_humanities.png)

**Fig.2 The results on the MMLU-Humanities subset.**

![MMLU_Social_Science](asserts/mmlu_social_science.png)

**Fig.3 The results on the MMLU-Social Science subset.**

![MMLU_STEM](asserts/mmlu_stem.png)

**Fig.4 The results on the MMLU-STEM subset.**

![MMLU_others](asserts/mmlu_other.png)

**Fig.5 The results on the MMLU-Other subset.**

### MMLU-Pro

![MMLU_Pro](asserts/mmlu_pro.png)

**Fig.6 The results on the MMLU-Pro benchmark.**

### CMMLU

![cmmlu](asserts/cmmlu.png)

**Fig.7 The results on the CMMLU benchmark.**

![CMMLU_Humanities](asserts/cmmlu_humanities.png)

**Fig.8 The results on the CMMLU-Humanities subset.**

![CMMLU_Social_Science](asserts/cmmlu_social_science.png)

**Fig.9 The results on the CMMLU-Social Science subset.**

![CMMLU_STEM](asserts/cmmlu_stem.png)

**Fig.10 The results on the CMMLU-STEM subset.**

![CMMLU_others](asserts/cmmlu_other.png)

**Fig.11 The results on the CMMLU-Other subset.**

![CMMLU_China_Specific](asserts/cmmlu_china_specific.png)

**Fig.12 The results on the CMMLU-China Specific subset.**

## Acknowledgements

[Gemma-3-12b-it](https://huggingface.co/google/gemma-3-12b-it)

[Unsloth](https://github.com/unslothai/unsloth)

## Citation

```bibtex
@software{Qiu_Open-Medical-R1,
  author = {Qiu, Zhongxi and Zhang, Zhang and Hu, Yan and Li, Heng and Liu, Jiang},
  license = {MIT},
  title = {{Open-Medical-R1}},
  url = {https://github.com/Qsingle/open-medical-r1},
  version = {0.1}
}
```