---
language: de
license: apache-2.0
tags:
- grpo
- lora
- german
- math-reasoning
- deepseek
- unsloth
base_model: unsloth/DeepSeek-R1-0528-Qwen3-8B
---

# LoRA fine-tune of unsloth/DeepSeek-R1-0528-Qwen3-8B for German Mathematical Reasoning

This repository contains LoRA adapters for the `unsloth/DeepSeek-R1-0528-Qwen3-8B` model, fine-tuned on the `open-r1/DAPO-Math-17k-Processed` dataset for German mathematical reasoning tasks.

This model was trained using the GRPO (Group Relative Policy Optimization) algorithm.
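
GRPO samples a group of completions for each prompt, scores them with a reward function, and normalizes the rewards within the group to obtain advantages, which removes the need for a separate value model. A minimal sketch of the group-relative advantage computation (illustrative only; this is not the training code used for these adapters):

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Normalize rewards within each group of sampled completions.

    rewards: (num_prompts, num_generations), one scalar reward per completion.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Four generations per prompt, matching the training configuration below.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0]])
print(group_relative_advantages(rewards))
# Completions scoring above the group mean get positive advantages.
```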

## Model Details

- **Base Model:** `unsloth/DeepSeek-R1-0528-Qwen3-8B`
- **Fine-tuning Method:** LoRA with GRPO
- **Language:** German

## Training Configuration

- **Dataset:** `open-r1/DAPO-Math-17k-Processed`
- **Learning Rate:** `5e-06`
- **Max Training Steps:** `100`
- **Max Sequence Length:** `1024`
- **GRPO Generations:** `4`
- **GRPO Temperature:** `1`
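
For reference, here is a minimal sketch of how these hyperparameters would map onto `trl`'s `GRPOConfig`/`GRPOTrainer`. The reward function is a hypothetical placeholder, and the dataset loading and column layout are assumptions; this is not the exact training script:

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Assumes the dataset exposes a "prompt" column as GRPOTrainer expects;
# config/split names may differ.
dataset = load_dataset("open-r1/DAPO-Math-17k-Processed", split="train")

def reward_func(completions, **kwargs):
    # Hypothetical placeholder: the actual reward used for training
    # (e.g. answer correctness) is not documented in this card.
    return [0.0 for _ in completions]

training_args = GRPOConfig(
    output_dir = "grpo-german-math",
    learning_rate = 5e-6,
    max_steps = 100,
    num_generations = 4,
    temperature = 1.0,
    max_completion_length = 1024,
)

trainer = GRPOTrainer(
    model = model,              # a LoRA-wrapped model; see the sketch below
    reward_funcs = reward_func,
    args = training_args,
    train_dataset = dataset,
)
trainer.train()
```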

### LoRA Configuration

- **Rank:** `32`
- **Alpha:** `64`
- **Dropout:** `0`
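
With Unsloth, this corresponds to wrapping the base model via `FastLanguageModel.get_peft_model`. A sketch under the listed settings (the target modules shown are a common default for Qwen-style architectures, not confirmed by this card):

```python
from unsloth import FastLanguageModel

base_model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-0528-Qwen3-8B",
    max_seq_length = 1024,
    load_in_4bit = True,
)

model = FastLanguageModel.get_peft_model(
    base_model,
    r = 32,            # LoRA rank
    lora_alpha = 64,   # scaling factor (alpha / rank = 2)
    lora_dropout = 0,
    # Assumed target modules -- the usual attention and MLP projections:
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    bias = "none",
)
```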

## How to use

Load the model from this repository with Unsloth; the LoRA adapters are applied on top of the base model automatically.

```python
from unsloth import FastLanguageModel

# Load the model; the LoRA adapters from this repo are applied automatically.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "jquad/DeepSeek-R1-0528-Qwen3-8B-German-GRPO",
    max_seq_length = 1024,
    dtype = None,          # auto-detect (bfloat16 where supported)
    load_in_4bit = True,   # 4-bit quantization to reduce VRAM usage
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path

# Example prompt
prompt = "Was ist 2 + 2?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# The model emits a reasoning trace before its final answer, so allow
# considerably more than a handful of new tokens.
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
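
The base model is a chat-tuned reasoning model, so results are usually better when the prompt is wrapped with the tokenizer's chat template rather than passed as raw text. A sketch, assuming the bundled tokenizer ships a chat template:

```python
messages = [{"role": "user", "content": "Was ist 2 + 2?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,  # append the assistant-turn marker
    return_tensors = "pt",
).to("cuda")
outputs = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The output will typically contain the model's thinking trace followed by the final answer.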

## Intended Use

This model is intended for mathematical reasoning in German. It has been fine-tuned on a specialized dataset and may not be suitable for general-purpose tasks.

**This is a research artifact and should not be used in production without further evaluation.**