qiuxi337 committed on
Commit 66bb0a6 (verified) · 1 Parent(s): 3e72d4a

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -5,7 +5,7 @@ metrics:
  base_model:
  - google/gemma-3-12b-it
  ---
- # Gemma-3-12B trained with GRPO via 4-bit PEFT
+ # Gemma-3-12B-GRPO trained with GRPO via 4-bit PEFT
 
  Due to limited available computational resources, we randomly sampled 500 data points from MedQA-USMLE using a methodology and conducted preliminary GRPO experiments under 4-bit quantization conditions (Q-LoRA) using the [Unsloth](https://github.com/unslothai/unsloth) framework. We are now releasing this as a preview version. More experiments and explorations are currently underway, and a technical report is in preparation. Thank you for your patience. We conduct the experiments on one RTX-4090 (24GB VRAM).
 