Add pipeline tag and GitHub link

#1
by nielsr (HF Staff) · opened
Files changed (1)
  1. README.md +9 -5
README.md CHANGED
@@ -1,18 +1,22 @@
 ---
-license: gemma
-datasets:
-- GBaker/MedQA-USMLE-4-options-hf
 base_model:
 - google/gemma-3-12b-it
+datasets:
+- GBaker/MedQA-USMLE-4-options-hf
 library_name: transformers
+license: gemma
 tags:
 - biology
 - medical
+pipeline_tag: text-generation
 ---
+
 # Gemma-3-12B-GRPO trained with GRPO via LoRA
 
 Due to limited computational resources, we randomly sampled 500 data points from MedQA-USMLE and conducted preliminary GRPO experiments with LoRA using the [Unsloth](https://github.com/unslothai/unsloth) framework. We are releasing this as a preview version; more experiments and explorations are underway, and a technical report is in preparation. Thank you for your patience. The experiments were conducted on a single RTX-A6000 Ada GPU (48GB VRAM).
 
+Code: https://github.com/Qsingle/open-medical-r1
+
 ## Evaluation Results
 
 The model is evaluated on five benchmarks: MMLU, MMLU-Pro, CMMLU, GSM8K, and GPQA. The experimental results are summarized in Table 1, with comprehensive analyses provided in the Detailed Results section.
@@ -24,8 +28,8 @@ The model is evaluated on five benchmarks: MMLU, MMLU-Pro, CMMLU, GSM8K,
 | MMLU | 65.51 | 70.13 |
 | MMLU-Pro | 60.17 | 59.99 |
 | CMMLU | 54.81 | 57.07 |
-| GSM8K | 91.58 | 91.81 |
-| GPQA | 34.98 | 34.23 |
+| GSM8K | 91.58 | 91.81 |
+| GPQA | 34.98 | 34.23 |
 
 ## Requirements
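For reference, the `pipeline_tag: text-generation` added to the metadata is what the Hub widget and `transformers` task inference key off. A minimal usage sketch (the model id below is a placeholder for this repository's actual Hub id):

```python
# Text-generation usage sketch. "your-org/Gemma-3-12B-GRPO" is a placeholder
# model id; substitute the actual Hub id of this repository.
from transformers import pipeline

generator = pipeline("text-generation", model="your-org/Gemma-3-12B-GRPO")
output = generator(
    "A 45-year-old man presents with crushing chest pain. What is the next best step?",
    max_new_tokens=128,
)
print(output[0]["generated_text"])
```

With the tag in place, `pipeline(model=...)` can also infer the task from the Hub metadata without it being passed explicitly.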
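Since the card only summarizes the training setup (GRPO with LoRA via Unsloth on 500 MedQA-USMLE samples), here is a minimal sketch of that style of run using Unsloth and TRL. It is illustrative only: the prompt template, reward function, dataset column names, and hyperparameters are assumptions rather than the authors' recipe, which lives in the linked open-medical-r1 repository.

```python
# Minimal GRPO-with-LoRA sketch using Unsloth + TRL. Illustrative only:
# the prompt template, reward function, and hyperparameters are placeholder
# assumptions, not this model's recipe (see the open-medical-r1 repo).
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="google/gemma-3-12b-it",
    max_seq_length=2048,
    load_in_4bit=True,  # helps a 12B model fit on a single 48GB GPU
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank (placeholder)
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

def to_prompt(ex):
    # Assumes the MedQA-USMLE-4-options-hf schema: sent1 = question stem,
    # ending0..ending3 = answer options, label = index of the gold option.
    options = "\n".join(f"{c}. {ex[f'ending{i}']}" for i, c in enumerate("ABCD"))
    return {
        "prompt": f"{ex['sent1']}\n{options}\nAnswer with A, B, C, or D.",
        "answer": "ABCD"[ex["label"]],
    }

# 500 randomly sampled questions, as described in the card.
dataset = (
    load_dataset("GBaker/MedQA-USMLE-4-options-hf", split="train")
    .shuffle(seed=42)
    .select(range(500))
    .map(to_prompt)
)

def reward_correct(completions, answer, **kwargs):
    # Toy verifiable reward: 1.0 if the gold letter appears in the completion.
    return [1.0 if a in c else 0.0 for c, a in zip(completions, answer)]

trainer = GRPOTrainer(
    model=model,
    reward_funcs=reward_correct,
    args=GRPOConfig(output_dir="gemma3-grpo-preview", num_generations=4),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```

GRPO only needs a scalar reward per sampled completion, which is why a simple verifiable check like the letter match above can stand in for a learned reward model in a sketch like this.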