qiuxi337
/

gemma-3-12b-it-grpo

@@ -1,112 +1,122 @@
-# Gemma-3-12B-GRPO trained with GRPO via LoRA
-Due to limited available computational resources, we randomly sampled 500 data points from MedQA-USMLE using a methodology and conducted preliminary GRPO experiments with LoRA using the [Unsloth](https://github.com/unslothai/unsloth) framework. We are now releasing this as a preview version. More experiments and explorations are currently underway, and a technical report is in preparation. Thank you for your patience. We conduct the experiments on one RTX-A6000 Ada (48GB VRAM).
-## Evaluation Results
-The model is evaluated on four benchmark datasets: MMLU, MMLU-Pro, CMMU, GSM8K, GPQA. The experimental results are summarized in Table 1, with comprehensive analyses provided in the Detailed Results section.
-<center><strong>Tab.1 Evaluation results.</strong></center>
-| Dataset | Gemma-3-12b-it | Gemma3-12b-GRPO |
-| :-----: | :------------: | :-------------: |
-|  MMLU   |     65.51      |     70.13      |
-|  MMLU-Pro   |     60.17      |     59.99      |
-|  CMMLU   |     54.81     |     57.07     |
-|  GSM8K   |     91.58     |     91.81     |
-|  GPQA   |     34.98     |     34.23     |
-## Requirements
-```shell
-pip install torch==2.6.0 torchaudio==2.6.0 torchvision==0.21.0 -i --index-url https://download.pytorch.org/whl/cu124
-pip install transformer vllm bitsandbytes peft
-pip install flash-attn --no-build-isolation
-```
-## Run with vLLM
-You can use the following script to run with vLLM.
-```shell
-vllm serve qiuxi337/gemma-3-12b-it-grpo \
-     --gpu-memory-utilization 0.85 \
-     --max-model-len 4096 \
-     --served-model-name gemma3-12b-grpo \
-     --api-key your_api_key
-```
-## Detail Results
-### MMLU
-![mmlu](asserts/mmlu.png)
-**Fig.1 The results on the MMLU benchmark.**
-![MMLU_Humanities](asserts/mmlu_humanities.png)
-**Fig.2  The results on the MMLU-Humanities**
-![MMLU_Social_Science](asserts/mmlu_social_science.png)
-**Fig.3  The results on the MMLU-Social Science**
-![MMLU_STEM](asserts/mmlu_stem.png)
-**Fig.4  The results on the MMLU-STEM**
-![MMLU_others](asserts/mmlu_other.png)
-**Fig.5  The results on the MMLU-Other**
-### MMLU-Pro
-![MMLU_Pro](asserts/mmlu_pro.png)
-**Fig.6  The results on the MMLU-Pro**
-### CMMLU
-![cmmlu](asserts/cmmlu.png)
-**Fig.7  The results on the CMMLU benchmark.**
-![CMMLU_Humanities](asserts/cmmlu_humanities.png)
-**Fig.8  The results on the CMMLU-Humanities**
-![CMMLU_Social_Science](asserts/cmmlu_social_science.png)
-**Fig.9  The results on the CMMLU-Social Science**
-![CMMLU_STEM](asserts/cmmlu_stem.png)
-**Fig.10  The results on the CMMLU-STEM**
-![CMMLU_others](asserts/cmmlu_other.png)
-**Fig.11  The results on the CMMLU-Other**
-![CMMLU_China_Specific](asserts/cmmlu_china_specific.png)
-**Fig.12  The results on the CMMLU-China Specific**
-## Acknowledge
-[Gemma-3-12b-it](https://huggingface.co/google/gemma-3-12b-it)
-[Unlsoth](https://github.com/unslothai/unsloth)
-## Citation
-```ini
-@software{Qiu_Open-Medical-R1,
-author = {Qiu, Zhongxi and Zhang, Zhang and Hu, Yan and Li, Heng and Liu, Jiang},
-license = {MIT},
-title = {{Open-Medical-R1}},
-url = {https://github.com/Qsingle/open-medical-r1},
-version = {0.1}
-}
-```

+---
+license: gemma
+datasets:
+- GBaker/MedQA-USMLE-4-options-hf
+base_model:
+- google/gemma-3-12b-it
+library_name: transformers
+tags:
+- biology
+- medical
+---
+# Gemma-3-12B-GRPO trained with GRPO via LoRA
+Due to limited available computational resources, we randomly sampled 500 data points from MedQA-USMLE using a methodology and conducted preliminary GRPO experiments with LoRA using the [Unsloth](https://github.com/unslothai/unsloth) framework. We are now releasing this as a preview version. More experiments and explorations are currently underway, and a technical report is in preparation. Thank you for your patience. We conduct the experiments on one RTX-A6000 Ada (48GB VRAM).
+## Evaluation Results
+The model is evaluated on four benchmark datasets: MMLU, MMLU-Pro, CMMU, GSM8K, GPQA. The experimental results are summarized in Table 1, with comprehensive analyses provided in the Detailed Results section.
+<center><strong>Tab.1 Evaluation results.</strong></center>
+| Dataset | Gemma-3-12b-it | Gemma3-12b-GRPO |
+| :-----: | :------------: | :-------------: |
+|  MMLU   |     65.51      |     70.13      |
+|  MMLU-Pro   |     60.17      |     59.99      |
+|  CMMLU   |     54.81     |     57.07     |
+|  GSM8K   |     91.58     |     91.81     |
+|  GPQA   |     34.98     |     34.23     |
+## Requirements
+```shell
+pip install torch==2.6.0 torchaudio==2.6.0 torchvision==0.21.0 -i --index-url https://download.pytorch.org/whl/cu124
+pip install transformer vllm bitsandbytes peft
+pip install flash-attn --no-build-isolation
+```
+## Run with vLLM
+You can use the following script to run with vLLM.
+```shell
+vllm serve qiuxi337/gemma-3-12b-it-grpo \
+     --gpu-memory-utilization 0.85 \
+     --max-model-len 4096 \
+     --served-model-name gemma3-12b-grpo \
+     --api-key your_api_key
+```
+## Detail Results
+### MMLU
+![mmlu](asserts/mmlu.png)
+**Fig.1 The results on the MMLU benchmark.**
+![MMLU_Humanities](asserts/mmlu_humanities.png)
+**Fig.2  The results on the MMLU-Humanities**
+![MMLU_Social_Science](asserts/mmlu_social_science.png)
+**Fig.3  The results on the MMLU-Social Science**
+![MMLU_STEM](asserts/mmlu_stem.png)
+**Fig.4  The results on the MMLU-STEM**
+![MMLU_others](asserts/mmlu_other.png)
+**Fig.5  The results on the MMLU-Other**
+### MMLU-Pro
+![MMLU_Pro](asserts/mmlu_pro.png)
+**Fig.6  The results on the MMLU-Pro**
+### CMMLU
+![cmmlu](asserts/cmmlu.png)
+**Fig.7  The results on the CMMLU benchmark.**
+![CMMLU_Humanities](asserts/cmmlu_humanities.png)
+**Fig.8  The results on the CMMLU-Humanities**
+![CMMLU_Social_Science](asserts/cmmlu_social_science.png)
+**Fig.9  The results on the CMMLU-Social Science**
+![CMMLU_STEM](asserts/cmmlu_stem.png)
+**Fig.10  The results on the CMMLU-STEM**
+![CMMLU_others](asserts/cmmlu_other.png)
+**Fig.11  The results on the CMMLU-Other**
+![CMMLU_China_Specific](asserts/cmmlu_china_specific.png)
+**Fig.12  The results on the CMMLU-China Specific**
+## Acknowledge
+[Gemma-3-12b-it](https://huggingface.co/google/gemma-3-12b-it)
+[Unlsoth](https://github.com/unslothai/unsloth)
+## Citation
+```ini
+@software{Qiu_Open-Medical-R1,
+author = {Qiu, Zhongxi and Zhang, Zhang and Hu, Yan and Li, Heng and Liu, Jiang},
+license = {MIT},
+title = {{Open-Medical-R1}},
+url = {https://github.com/Qsingle/open-medical-r1},
+version = {0.1}
+}
+```