---
license: gemma
datasets:
- GBaker/MedQA-USMLE-4-options-hf
base_model:
- google/gemma-3-12b-it
library_name: transformers
tags:
- biology
- medical
---

# Gemma-3-12B-GRPO: Gemma-3-12B-it trained with GRPO via LoRA

Due to limited computational resources, we randomly sampled 500 data points from MedQA-USMLE and conducted preliminary GRPO experiments with LoRA using the [Unsloth](https://github.com/unslothai/unsloth) framework. We are releasing this model as a preview version. More experiments and explorations are underway, and a technical report is in preparation. Thank you for your patience. The experiments were conducted on a single RTX-A6000 Ada GPU (48 GB VRAM).

## Evaluation Results

The model is evaluated on five benchmark datasets: MMLU, MMLU-Pro, CMMLU, GSM8K, and GPQA. The experimental results are summarized in Table 1, with comprehensive analyses provided in the Detailed Results section.
**Tab.1 Evaluation results.**
| Dataset  | Gemma-3-12b-it | Gemma3-12b-GRPO |
| :------: | :------------: | :-------------: |
| MMLU     | 65.51          | 70.13           |
| MMLU-Pro | 60.17          | 59.99           |
| CMMLU    | 54.81          | 57.07           |
| GSM8K    | 91.58          | 91.81           |
| GPQA     | 34.98          | 34.23           |

## Requirements

```shell
pip install torch==2.6.0 torchaudio==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
pip install transformers vllm bitsandbytes peft
pip install flash-attn --no-build-isolation
```

## Run with vLLM

You can use the following command to serve the model with vLLM. A minimal Python client example is provided at the end of this card.

```shell
vllm serve qiuxi337/gemma-3-12b-it-grpo \
    --gpu-memory-utilization 0.85 \
    --max-model-len 4096 \
    --served-model-name gemma3-12b-grpo \
    --api-key your_api_key
```

## Detailed Results

### MMLU

![mmlu](asserts/mmlu.png)

**Fig.1 The results on the MMLU benchmark.**

![MMLU_Humanities](asserts/mmlu_humanities.png)

**Fig.2 The results on the MMLU-Humanities subset.**

![MMLU_Social_Science](asserts/mmlu_social_science.png)

**Fig.3 The results on the MMLU-Social Science subset.**

![MMLU_STEM](asserts/mmlu_stem.png)

**Fig.4 The results on the MMLU-STEM subset.**

![MMLU_others](asserts/mmlu_other.png)

**Fig.5 The results on the MMLU-Other subset.**

### MMLU-Pro

![MMLU_Pro](asserts/mmlu_pro.png)

**Fig.6 The results on the MMLU-Pro benchmark.**

### CMMLU

![cmmlu](asserts/cmmlu.png)

**Fig.7 The results on the CMMLU benchmark.**

![CMMLU_Humanities](asserts/cmmlu_humanities.png)

**Fig.8 The results on the CMMLU-Humanities subset.**

![CMMLU_Social_Science](asserts/cmmlu_social_science.png)

**Fig.9 The results on the CMMLU-Social Science subset.**

![CMMLU_STEM](asserts/cmmlu_stem.png)

**Fig.10 The results on the CMMLU-STEM subset.**

![CMMLU_others](asserts/cmmlu_other.png)

**Fig.11 The results on the CMMLU-Other subset.**

![CMMLU_China_Specific](asserts/cmmlu_china_specific.png)

**Fig.12 The results on the CMMLU-China Specific subset.**

## Acknowledgements

- [Gemma-3-12b-it](https://huggingface.co/google/gemma-3-12b-it)
- [Unsloth](https://github.com/unslothai/unsloth)

## Citation

```bibtex
@software{Qiu_Open-Medical-R1,
  author = {Qiu, Zhongxi and Zhang, Zhang and Hu, Yan and Li, Heng and Liu, Jiang},
  license = {MIT},
  title = {{Open-Medical-R1}},
  url = {https://github.com/Qsingle/open-medical-r1},
  version = {0.1}
}
```
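## Query the Served Model

The `vllm serve` command above exposes an OpenAI-compatible API (by default at `http://localhost:8000/v1`). The sketch below shows one way to query it with the `openai` Python client; the port, the `your_api_key` placeholder, and the sample question are illustrative only and should be adapted to your deployment.

```python
# Minimal client sketch for the vLLM OpenAI-compatible server started above.
# Assumes the default vLLM port (8000); the API key must match the value
# passed to `vllm serve --api-key`, and the example question is illustrative.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # default vLLM server address
    api_key="your_api_key",               # must match --api-key from `vllm serve`
)

response = client.chat.completions.create(
    model="gemma3-12b-grpo",  # must match --served-model-name
    messages=[
        {
            "role": "user",
            "content": (
                "A 45-year-old man presents with crushing chest pain radiating "
                "to the left arm. Which marker is the most specific early indicator "
                "of myocardial injury?\n"
                "A. AST  B. Troponin I  C. LDH  D. ALP\n"
                "Answer with the letter of the correct option."
            ),
        }
    ],
    temperature=0.7,
    max_tokens=512,
)

print(response.choices[0].message.content)
```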