LoRA Fine-Tune of unsloth/DeepSeek-R1-0528-Qwen3-8B for German Mathematical Reasoning

This repository contains LoRA adapters for the unsloth/DeepSeek-R1-0528-Qwen3-8B model, fine-tuned on the open-r1/DAPO-Math-17k-Processed dataset for German mathematical reasoning tasks.

This model was trained using the GRPO (Group Relative Policy Optimization) algorithm.
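For context, GRPO samples a small group of completions per prompt, scores each with a reward function, and normalizes every reward against the group's own mean and standard deviation, so no separate value (critic) model is needed. A minimal, self-contained sketch of that group-relative advantage step, using made-up rewards rather than values from this training run:

import numpy as np

# Toy illustration: rewards for the 4 generations sampled for one prompt
# (hypothetical values, not from the actual run).
group_rewards = np.array([1.0, 0.0, 1.0, 0.0])

# GRPO's group-relative advantage: normalize each reward against the
# group's own statistics instead of a learned value baseline.
advantages = (group_rewards - group_rewards.mean()) / (group_rewards.std() + 1e-4)
print(advantages)  # completions above the group mean get positive advantage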

Model Details

  • Base Model: unsloth/DeepSeek-R1-0528-Qwen3-8B
  • Fine-tuning Method: LoRA with GRPO
  • Language: German

Training Configuration

  • Dataset: open-r1/DAPO-Math-17k-Processed
  • Learning Rate: 5e-06
  • Max Training Steps: 100
  • Max Sequence Length: 1024
  • GRPO Generations: 4
  • GRPO Temperature: 1
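These hyperparameters map naturally onto a trl GRPOConfig. The sketch below is an assumption for illustration only; the card does not confirm that trl's GRPOTrainer was used, and "Max Sequence Length" is shown being passed to Unsloth when loading the model (see the usage example further down), so it is omitted here:

from trl import GRPOConfig

# Hypothetical mapping of the listed training hyperparameters onto trl's GRPOConfig.
training_args = GRPOConfig(
    learning_rate = 5e-6,   # Learning Rate
    max_steps = 100,        # Max Training Steps
    num_generations = 4,    # GRPO Generations
    temperature = 1.0,      # GRPO Temperature
)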

LoRA Configuration

  • Rank: 32
  • Alpha: 64
  • Dropout: 0
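In Unsloth, these adapter settings are typically applied with get_peft_model. A minimal sketch, assuming the usual attention and MLP projection target modules (the card does not state which modules were adapted):

from unsloth import FastLanguageModel

# Hypothetical reconstruction of the adapter setup; target_modules are an assumption.
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,            # Rank
    lora_alpha = 64,   # Alpha
    lora_dropout = 0,  # Dropout
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)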

How to use

To use these LoRA adapters, load the model directly from this repository with Unsloth; the adapters are applied on top of the base model automatically.

from unsloth import FastLanguageModel
import torch

# Load the base model together with the LoRA adapters from this repository
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "jquad/DeepSeek-R1-0528-Qwen3-8B-German-GRPO",  # LoRA adapters are loaded automatically
    max_seq_length = 1024,
    dtype = None,          # auto-detect the best dtype for the GPU
    load_in_4bit = True,   # 4-bit quantization to reduce VRAM usage
)
FastLanguageModel.for_inference(model)  # enable Unsloth's optimized inference mode

# Example prompt ("What is 2 + 2?")
prompt = "Was ist 2 + 2?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)  # leave room for the reasoning chain
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
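Because DeepSeek-R1 models are reasoning models, prompting through the tokenizer's chat template usually produces better-formed output, including the model's reasoning block. A hedged variant of the example above, assuming the base model's chat template is available:

# Optional: prompt via the chat template (assumes the tokenizer ships one).
messages = [{"role": "user", "content": "Was ist 2 + 2?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))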

Intended Use

This model is intended for mathematical reasoning in German. It has been fine-tuned on a specialized dataset and may not be suitable for general-purpose tasks.

This is a research artifact and should not be used in production without further evaluation.
