Update README.md
README.md CHANGED
@@ -15,7 +15,7 @@ base_model: unsloth/DeepSeek-R1-0528-Qwen3-8B
 
 This repository contains LoRA adapters for the `unsloth/DeepSeek-R1-0528-Qwen3-8B` model, fine-tuned on the `open-r1/DAPO-Math-17k-Processed` dataset for German mathematical reasoning tasks.
 
-This model was trained using the GRPO (
+This model was trained using the GRPO (Group Relative Policy Optimization) algorithm.
 
 ## Model Details
 