jquad/DeepSeek-R1-0528-Qwen3-8B-German-GRPO

jquad committed 25 days ago

Commit c93050e · verified · 1 Parent(s): d9c6e0d

Update README.md

Files changed (1)

README.md +1 -1

@@ -15,7 +15,7 @@ base_model: unsloth/DeepSeek-R1-0528-Qwen3-8B
 This repository contains LoRA adapters for the `unsloth/DeepSeek-R1-0528-Qwen3-8B` model, fine-tuned on the `open-r1/DAPO-Math-17k-Processed` dataset for German mathematical reasoning tasks.
-This model was trained using the GRPO (Grounded Reward-aware Policy Optimization) algorithm.
+This model was trained using the GRPO (Group Relative Policy Optimization) algorithm.
 ## Model Details
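
Note on the corrected name: "group relative" refers to scoring each sampled completion against the other completions drawn for the same prompt, rather than against a learned value function. The following is a minimal sketch of that normalization step only, not the training code used for this repository. The function name, the `eps` constant, the use of the population standard deviation, and the reward values are all assumptions made for illustration; actual implementations may differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per completion sampled for a single
    prompt. `eps` (an assumed small constant) avoids division by zero when
    every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one math prompt
# (1.0 = correct final answer, 0.0 = incorrect).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above the group average receive a positive advantage and those below receive a negative one. When every completion in a group gets the same reward, all advantages are zero and that prompt contributes no learning signal.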