# LoRA Fine-Tune of unsloth/DeepSeek-R1-0528-Qwen3-8B for German Mathematical Reasoning
This repository contains LoRA adapters for the unsloth/DeepSeek-R1-0528-Qwen3-8B model, fine-tuned on the open-r1/DAPO-Math-17k-Processed dataset for German mathematical reasoning tasks. The model was trained with GRPO (Group Relative Policy Optimization), a reinforcement-learning algorithm that samples several completions per prompt and scores each completion relative to the others in its group.
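The core of GRPO is that advantages are computed within each group of completions rather than from a learned value function. A minimal sketch of that normalization step (illustrative only, not taken from this repository's training code):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each reward against the mean and
    standard deviation of its own group of sampled completions."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against uniform-reward groups
    return [(r - mu) / sigma for r in rewards]

# 4 rewards for 4 completions of one prompt (matching the
# "GRPO Generations" setting in this card)
adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print(adv)  # completions above the group mean get positive advantage
```

Completions that beat their group's average are reinforced; those below it are penalized, with no separate critic model required.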
## Model Details
- Base Model: unsloth/DeepSeek-R1-0528-Qwen3-8B
- Fine-tuning Method: LoRA with GRPO
- Language: German
## Training Configuration
- Dataset: open-r1/DAPO-Math-17k-Processed
- Learning Rate: 5e-6
- Max Training Steps: 100
- Max Sequence Length: 1024
- GRPO Generations per Prompt: 4
- GRPO Sampling Temperature: 1.0
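The training script itself is not included in this card. Assuming a TRL-style GRPO setup (an assumption; Unsloth's GRPO examples build on TRL's `GRPOTrainer`), the hyperparameters above would map roughly to a config like the following sketch. The exact argument names and the split of the 1024-token budget between prompt and completion are assumptions, not confirmed by the card:

```python
from trl import GRPOConfig  # assumed trainer library

config = GRPOConfig(
    learning_rate=5e-6,
    max_steps=100,
    max_completion_length=1024,  # assumption: card only states a total "Max Sequence Length"
    num_generations=4,           # completions sampled per prompt
    temperature=1.0,             # sampling temperature during generation
)
```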
## LoRA Configuration
- Rank (r): 32
- Alpha: 64
- Dropout: 0
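With alpha = 64 and rank = 32, the adapter update is applied at an effective scale of alpha/rank = 2.0. A toy numeric sketch of the standard LoRA update rule, W' = W + (alpha/r)·BA, using plain nested lists (illustrative only, not this repository's code):

```python
def lora_delta(A, B, rank, alpha):
    """Compute the LoRA weight update (alpha/rank) * B @ A.
    A is (rank x k), B is (d x rank); returns a (d x k) matrix."""
    scale = alpha / rank  # 64 / 32 = 2.0 for this model's configuration
    rows, inner, cols = len(B), len(A), len(A[0])
    return [[scale * sum(B[i][k] * A[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

# Toy shapes: d = 2, k = 2, effective adapter rank 1
A = [[1.0, 0.0]]      # rank x k
B = [[0.5], [0.0]]    # d x rank
print(lora_delta(A, B, rank=32, alpha=64))
```

Only the small matrices A and B are trained, so the adapters stay a tiny fraction of the 8B base model's size.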
## How to use
Load the model through Unsloth's `FastLanguageModel`; passing this repository's name resolves the base model and applies the LoRA adapters automatically.
```python
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "jquad/DeepSeek-R1-0528-Qwen3-8B-German-GRPO",  # LoRA adapters are loaded automatically
    max_seq_length = 1024,
    dtype = None,        # auto-detect (e.g. bfloat16 on recent GPUs)
    load_in_4bit = True,
)

# Example prompt
prompt = "Was ist 2 + 2?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
# Reasoning models emit a <think> trace first, so allow enough new tokens
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Intended Use
This model is intended for mathematical reasoning in German. It was fine-tuned on a specialized dataset and may not be suitable for general-purpose tasks. It is a research artifact and should not be used in production without further evaluation.