Add pipeline tag and GitHub link

#1
by nielsr (HF Staff) · opened
Files changed (1)
  1. README.md +9 -5
README.md CHANGED
@@ -1,18 +1,22 @@
 ---
-license: gemma
-datasets:
-- GBaker/MedQA-USMLE-4-options-hf
 base_model:
 - google/gemma-3-12b-it
+datasets:
+- GBaker/MedQA-USMLE-4-options-hf
 library_name: transformers
+license: gemma
 tags:
 - biology
 - medical
+pipeline_tag: text-generation
 ---
+
 # Gemma-3-12B-GRPO trained with GRPO via LoRA
 
 Due to limited computational resources, we randomly sampled 500 data points from MedQA-USMLE and conducted preliminary GRPO experiments with LoRA using the [Unsloth](https://github.com/unslothai/unsloth) framework. We are releasing this as a preview version; more experiments and explorations are underway, and a technical report is in preparation. Thank you for your patience. The experiments were conducted on a single RTX-A6000 Ada GPU (48GB VRAM).
 
+Code: https://github.com/Qsingle/open-medical-r1
+
 ## Evaluation Results
 
 The model is evaluated on five benchmarks: MMLU, MMLU-Pro, CMMLU, GSM8K, and GPQA. The experimental results are summarized in Table 1, with comprehensive analyses provided in the Detailed Results section.
@@ -24,8 +28,8 @@ The model is evaluated on five benchmarks: MMLU, MMLU-Pro, CMMLU, GSM8K,
 | MMLU | 65.51 | 70.13 |
 | MMLU-Pro | 60.17 | 59.99 |
 | CMMLU | 54.81 | 57.07 |
-| GSM8K | 91.58 | 91.81 |
-| GPQA | 34.98 | 34.23 |
+| GSM8K | 91.58 | 91.81 |
+| GPQA | 34.98 | 34.23 |
 
 ## Requirements
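For reference, the `pipeline_tag: text-generation` added to the metadata is what the Hub widget and `transformers` task inference key off. A minimal usage sketch (the model id below is a placeholder for this repository's actual Hub id):

```python
# Text-generation usage sketch. "your-org/Gemma-3-12B-GRPO" is a placeholder
# model id; substitute the actual Hub id of this repository.
from transformers import pipeline

generator = pipeline("text-generation", model="your-org/Gemma-3-12B-GRPO")
output = generator(
    "A 45-year-old man presents with crushing chest pain. What is the next best step?",
    max_new_tokens=128,
)
print(output[0]["generated_text"])
```

With the tag in place, `pipeline(model=...)` can also infer the task from the Hub metadata without it being passed explicitly.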
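Since the card only summarizes the training setup (GRPO with LoRA via Unsloth on 500 MedQA-USMLE samples), here is a minimal sketch of that style of run using Unsloth and TRL. It is illustrative only: the prompt template, reward function, dataset column names, and hyperparameters are assumptions rather than the authors' recipe, which lives in the linked open-medical-r1 repository.

```python
# Minimal GRPO-with-LoRA sketch using Unsloth + TRL. Illustrative only:
# the prompt template, reward function, and hyperparameters are placeholder
# assumptions, not this model's recipe (see the open-medical-r1 repo).
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="google/gemma-3-12b-it",
    max_seq_length=2048,
    load_in_4bit=True,  # helps a 12B model fit on a single 48GB GPU
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank (placeholder)
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

def to_prompt(ex):
    # Assumes the MedQA-USMLE-4-options-hf schema: sent1 = question stem,
    # ending0..ending3 = answer options, label = index of the gold option.
    options = "\n".join(f"{c}. {ex[f'ending{i}']}" for i, c in enumerate("ABCD"))
    return {
        "prompt": f"{ex['sent1']}\n{options}\nAnswer with A, B, C, or D.",
        "answer": "ABCD"[ex["label"]],
    }

# 500 randomly sampled questions, as described in the card.
dataset = (
    load_dataset("GBaker/MedQA-USMLE-4-options-hf", split="train")
    .shuffle(seed=42)
    .select(range(500))
    .map(to_prompt)
)

def reward_correct(completions, answer, **kwargs):
    # Toy verifiable reward: 1.0 if the gold letter appears in the completion.
    return [1.0 if a in c else 0.0 for c, a in zip(completions, answer)]

trainer = GRPOTrainer(
    model=model,
    reward_funcs=reward_correct,
    args=GRPOConfig(output_dir="gemma3-grpo-preview", num_generations=4),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```

GRPO only needs a scalar reward per sampled completion, which is why a simple verifiable check like the letter match above can stand in for a learned reward model in a sketch like this.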