LoRA Fine-Tune of unsloth/DeepSeek-R1-0528-Qwen3-8B for German Mathematical Reasoning

This repository contains LoRA adapters for the unsloth/DeepSeek-R1-0528-Qwen3-8B model, fine-tuned on the open-r1/DAPO-Math-17k-Processed dataset for German mathematical reasoning tasks.

This model was trained using the GRPO (Group Relative Policy Optimization) algorithm.
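For context, GRPO samples a small group of completions per prompt, scores each with a reward function, and normalizes every reward against the group's own mean and standard deviation, so no separate value (critic) model is needed. A minimal, self-contained sketch of that group-relative advantage step, using made-up rewards rather than values from this training run:

import numpy as np

# Toy illustration: rewards for the 4 generations sampled for one prompt
# (hypothetical values, not from the actual run).
group_rewards = np.array([1.0, 0.0, 1.0, 0.0])

# GRPO's group-relative advantage: normalize each reward against the
# group's own statistics instead of a learned value baseline.
advantages = (group_rewards - group_rewards.mean()) / (group_rewards.std() + 1e-4)
print(advantages)  # completions above the group mean get positive advantage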

Model Details

  • Base Model: unsloth/DeepSeek-R1-0528-Qwen3-8B
  • Fine-tuning Method: LoRA with GRPO
  • Language: German

Training Configuration

  • Dataset: open-r1/DAPO-Math-17k-Processed
  • Learning Rate: 5e-06
  • Max Training Steps: 100
  • Max Sequence Length: 1024
  • GRPO Generations: 4
  • GRPO Temperature: 1
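These hyperparameters map naturally onto a trl GRPOConfig. The sketch below is an assumption for illustration only; the card does not confirm that trl's GRPOTrainer was used, and "Max Sequence Length" is shown being passed to Unsloth when loading the model (see the usage example further down), so it is omitted here:

from trl import GRPOConfig

# Hypothetical mapping of the listed training hyperparameters onto trl's GRPOConfig.
training_args = GRPOConfig(
    learning_rate = 5e-6,   # Learning Rate
    max_steps = 100,        # Max Training Steps
    num_generations = 4,    # GRPO Generations
    temperature = 1.0,      # GRPO Temperature
)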

LoRA Configuration

  • Rank: 32
  • Alpha: 64
  • Dropout: 0
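In Unsloth, these adapter settings are typically applied with get_peft_model. A minimal sketch, assuming the usual attention and MLP projection target modules (the card does not state which modules were adapted):

from unsloth import FastLanguageModel

# Hypothetical reconstruction of the adapter setup; target_modules are an assumption.
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,            # Rank
    lora_alpha = 64,   # Alpha
    lora_dropout = 0,  # Dropout
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)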

How to use

To use these LoRA adapters, load the model directly from this repository with Unsloth; the adapters are applied on top of the base model automatically.

from unsloth import FastLanguageModel
import torch

# Load the base model together with the LoRA adapters from this repository
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "jquad/DeepSeek-R1-0528-Qwen3-8B-German-GRPO",  # LoRA adapters are loaded automatically
    max_seq_length = 1024,
    dtype = None,          # auto-detect the best dtype for the GPU
    load_in_4bit = True,   # 4-bit quantization to reduce VRAM usage
)
FastLanguageModel.for_inference(model)  # enable Unsloth's optimized inference mode

# Example prompt ("What is 2 + 2?")
prompt = "Was ist 2 + 2?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)  # leave room for the reasoning chain
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
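Because DeepSeek-R1 models are reasoning models, prompting through the tokenizer's chat template usually produces better-formed output, including the model's reasoning block. A hedged variant of the example above, assuming the base model's chat template is available:

# Optional: prompt via the chat template (assumes the tokenizer ships one).
messages = [{"role": "user", "content": "Was ist 2 + 2?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))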

Intended Use

This model is intended for mathematical reasoning in German. It has been fine-tuned on a specialized dataset and may not be suitable for general-purpose tasks.

This is a research artifact and should not be used in production without further evaluation.
