jquad committed · Commit c93050e · verified · 1 Parent(s): d9c6e0d

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -15,7 +15,7 @@ base_model: unsloth/DeepSeek-R1-0528-Qwen3-8B
 
  This repository contains LoRA adapters for the `unsloth/DeepSeek-R1-0528-Qwen3-8B` model, fine-tuned on the `open-r1/DAPO-Math-17k-Processed` dataset for German mathematical reasoning tasks.
 
- This model was trained using the GRPO (Grounded Reward-aware Policy Optimization) algorithm.
+ This model was trained using the GRPO (Group Relative Policy Optimization) algorithm.
 
  ## Model Details
 