base_model:
- google/gemma-3-12b-it
---

# Gemma-3-12B-GRPO trained with GRPO via 4-bit PEFT

Due to limited computational resources, we randomly sampled 500 data points from MedQA-USMLE and conducted preliminary GRPO experiments under 4-bit quantization (QLoRA) using the [Unsloth](https://github.com/unslothai/unsloth) framework. We are releasing this as a preview version. More experiments and explorations are currently underway, and a technical report is in preparation. Thank you for your patience. All experiments were conducted on a single RTX 4090 (24 GB VRAM).
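The 500-example subsampling step described above can be sketched as follows. This is a minimal illustration, not the card's actual preprocessing code: the placeholder records stand in for the MedQA-USMLE training split, and the fixed seed and `sample_subset` helper are assumptions introduced here for reproducibility.

```python
import random

def sample_subset(dataset, n=500, seed=3407):
    """Draw n examples without replacement (hypothetical helper;
    the seed value is an assumption, not stated in the card)."""
    rng = random.Random(seed)  # local RNG so global random state is untouched
    return rng.sample(dataset, n)

# Placeholder records standing in for MedQA-USMLE training questions.
corpus = [{"question": f"Q{i}", "answer": "A"} for i in range(10000)]

subset = sample_subset(corpus)
print(len(subset))  # 500
```

Fixing the seed makes the 500-example subset reproducible across runs, which matters when comparing preliminary GRPO checkpoints trained on the same slice of data.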