---
language: de
license: apache-2.0
tags:
- grpo
- lora
- german
- math-reasoning
- deepseek
- unsloth
base_model: unsloth/DeepSeek-R1-0528-Qwen3-8B
---

# LoRA fine-tune of unsloth/DeepSeek-R1-0528-Qwen3-8B for German Mathematical Reasoning

This repository contains LoRA adapters for the `unsloth/DeepSeek-R1-0528-Qwen3-8B` model, fine-tuned on the `open-r1/DAPO-Math-17k-Processed` dataset for German mathematical reasoning tasks. The adapters were trained with the GRPO (Group Relative Policy Optimization) algorithm.

## Model Details

- **Base Model:** `unsloth/DeepSeek-R1-0528-Qwen3-8B`
- **Fine-tuning Method:** LoRA with GRPO
- **Language:** German

## Training Configuration

- **Dataset:** `open-r1/DAPO-Math-17k-Processed`
- **Learning Rate:** `5e-06`
- **Max Training Steps:** `100`
- **Max Sequence Length:** `1024`
- **GRPO Generations per Prompt:** `4`
- **GRPO Sampling Temperature:** `1.0`

### LoRA Configuration

- **Rank:** `32`
- **Alpha:** `64`
- **Dropout:** `0`

An illustrative sketch of how these settings map onto a GRPO training run is included at the end of this card.

## How to use

Load the adapters with Unsloth by pointing `from_pretrained` at this repository; the base model is downloaded and the LoRA weights are applied on top of it automatically.

```python
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "jquad/DeepSeek-R1-0528-Qwen3-8B-German-GRPO",  # LoRA adapters are loaded automatically
    max_seq_length = 1024,
    dtype = None,          # auto-detect dtype
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference mode

# Example prompt
prompt = "Was ist 2 + 2?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)  # leave room for the reasoning trace
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Intended Use

This model is intended for mathematical reasoning in German. It has been fine-tuned on a specialized dataset and may not be suitable for general-purpose tasks.

**This is a research artifact and should not be used in production without further evaluation.**
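
## Training Sketch

The exact training script is not part of this repository. The snippet below is a minimal sketch of how the configuration listed above could be reproduced with Unsloth and TRL's `GRPOTrainer`. The reward function, the dataset column names (`prompt`, `solution`), the prompt/completion length split, and the LoRA target modules are assumptions for illustration only, not the settings actually used for this model.

```python
# Illustrative sketch only -- not the original training script.
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-0528-Qwen3-8B",
    max_seq_length = 1024,
    load_in_4bit = True,
)

# LoRA settings from this card: rank 32, alpha 64, dropout 0.
# The target modules are an assumption (a common choice for Qwen-style models).
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,
    lora_alpha = 64,
    lora_dropout = 0,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

# Column names ("prompt", "solution") are assumptions about the dataset schema.
dataset = load_dataset("open-r1/DAPO-Math-17k-Processed", split="train")

def exact_answer_reward(completions, solution, **kwargs):
    # Hypothetical reward: 1.0 if the reference answer appears verbatim in the
    # completion, else 0.0. The rewards actually used are not documented here.
    return [1.0 if sol in comp else 0.0 for comp, sol in zip(completions, solution)]

training_args = GRPOConfig(
    learning_rate = 5e-6,
    max_steps = 100,
    num_generations = 4,        # GRPO generations per prompt
    temperature = 1.0,          # GRPO sampling temperature
    max_prompt_length = 256,    # assumed split of the 1024-token budget
    max_completion_length = 768,
    output_dir = "outputs",
)

trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = [exact_answer_reward],
    args = training_args,
    train_dataset = dataset,
)
trainer.train()
```

In a real GRPO run for math reasoning, the reward would typically parse the final answer out of the completion (e.g. from a boxed expression) rather than substring-match, and the prompts would be formatted with the model's chat template in German; both details are omitted here for brevity.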