Add pipeline tag and Github link #1
by nielsr (HF Staff) · opened

README.md CHANGED
```diff
@@ -1,18 +1,22 @@
 ---
-license: gemma
-datasets:
-- GBaker/MedQA-USMLE-4-options-hf
 base_model:
 - google/gemma-3-12b-it
+datasets:
+- GBaker/MedQA-USMLE-4-options-hf
 library_name: transformers
+license: gemma
 tags:
 - biology
 - medical
+pipeline_tag: text-generation
 ---
+
 # Gemma-3-12B-GRPO trained with GRPO via LoRA
 
 Due to limited computational resources, we randomly sampled 500 data points from MedQA-USMLE and conducted preliminary GRPO experiments with LoRA using the [Unsloth](https://github.com/unslothai/unsloth) framework. We are releasing this as a preview version; more experiments and explorations are underway, and a technical report is in preparation. Thank you for your patience. The experiments were conducted on a single RTX A6000 Ada (48 GB VRAM).
 
+Code: https://github.com/Qsingle/open-medical-r1
+
 ## Evaluation Results
 
 The model is evaluated on five benchmark datasets: MMLU, MMLU-Pro, CMMLU, GSM8K, and GPQA. The experimental results are summarized in Table 1, with comprehensive analyses provided in the Detailed Results section.
@@ -24,8 +28,8 @@ The model is evaluated on five benchmark datasets: MMLU, MMLU-Pro, CMMLU, GSM8K,
 | MMLU | 65.51 | 70.13 |
 | MMLU-Pro | 60.17 | 59.99 |
 | CMMLU | 54.81 | 57.07 |
-| GSM8K | 91.58 | 91.81
-| GPQA | 34.98 | 34.23
+| GSM8K | 91.58 | 91.81 |
+| GPQA | 34.98 | 34.23 |
 
 ## Requirements
```
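The card names GRPO as the training method but does not include training code (that lives in the linked repository). As a minimal sketch of the idea behind GRPO, each prompt gets a group of sampled completions, and each completion's reward is normalized against its own group. The binary reward here (1 if the chosen MedQA option is correct, else 0) is an illustrative assumption, not the repo's actual reward function:

```python
def group_relative_advantages(rewards, eps=1e-6):
    """Group-relative advantage, the core of GRPO (sketch only).

    For one prompt, several completions are sampled; each completion's
    advantage is its reward standardized against the group:
        A_i = (r_i - mean(r)) / (std(r) + eps)
    This replaces a learned value baseline with a group statistic.
    """
    mu = sum(rewards) / len(rewards)
    var = sum((r - mu) ** 2 for r in rewards) / len(rewards)
    return [(r - mu) / (var ** 0.5 + eps) for r in rewards]


# Example: 4 sampled answers to one MedQA question, rewarded 1.0 when
# the final choice matches the gold option, 0.0 otherwise (assumed scheme).
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions receive positive advantages and incorrect ones negative, so the policy update pushes probability mass toward answers that beat their group's average.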
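The `pipeline_tag: text-generation` added by this PR tells the Hub (and `transformers`) how to load and serve the model. A usage sketch under stated assumptions: `your-org/Gemma-3-12B-GRPO` is a placeholder for the actual repo id, and `build_usmle_prompt` is a hypothetical helper for MedQA-style questions, not part of the release:

```python
def build_usmle_prompt(question, options):
    """Format a MedQA-USMLE-style multiple-choice prompt (illustrative)."""
    lines = [question]
    lines += [f"{key}. {text}" for key, text in sorted(options.items())]
    lines.append("Answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Heavy model download kept behind the main guard.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="your-org/Gemma-3-12B-GRPO",  # placeholder repo id
    )
    prompt = build_usmle_prompt(
        "Which vitamin deficiency causes scurvy?",
        {"A": "Vitamin A", "B": "Vitamin B12",
         "C": "Vitamin C", "D": "Vitamin D"},
    )
    print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```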