base_model:
- google/gemma-3-12b-it
---

# Gemma-3-12B-GRPO trained with GRPO via 4-bit PEFT

Due to limited computational resources, we randomly sampled 500 data points from MedQA-USMLE and conducted preliminary GRPO experiments under 4-bit quantization (QLoRA) using the [Unsloth](https://github.com/unslothai/unsloth) framework. We are releasing this as a preview version. More experiments and explorations are currently underway, and a technical report is in preparation. Thank you for your patience. All experiments were conducted on a single RTX 4090 (24 GB VRAM).
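The 500-example subsampling step described above can be sketched as follows. This is a minimal illustration, not the card's actual preprocessing code: the placeholder records stand in for the MedQA-USMLE training split, and the fixed seed and `sample_subset` helper are assumptions introduced here for reproducibility.

```python
import random

def sample_subset(dataset, n=500, seed=3407):
    """Draw n examples without replacement (hypothetical helper;
    the seed value is an assumption, not stated in the card)."""
    rng = random.Random(seed)  # local RNG so global random state is untouched
    return rng.sample(dataset, n)

# Placeholder records standing in for MedQA-USMLE training questions.
corpus = [{"question": f"Q{i}", "answer": "A"} for i in range(10000)]

subset = sample_subset(corpus)
print(len(subset))  # 500
```

Fixing the seed makes the 500-example subset reproducible across runs, which matters when comparing preliminary GRPO checkpoints trained on the same slice of data.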