---
language: de
license: apache-2.0
tags:
- grpo
- lora
- german
- math-reasoning
- deepseek
- unsloth
base_model: unsloth/DeepSeek-R1-0528-Qwen3-8B
---

# LoRA fine-tune of unsloth/DeepSeek-R1-0528-Qwen3-8B for German Mathematical Reasoning

This repository contains LoRA adapters for the `unsloth/DeepSeek-R1-0528-Qwen3-8B` model, fine-tuned on the `open-r1/DAPO-Math-17k-Processed` dataset for German mathematical reasoning tasks. The adapters were trained with the GRPO (Group Relative Policy Optimization) algorithm.

## Model Details

- **Base Model:** `unsloth/DeepSeek-R1-0528-Qwen3-8B`
- **Fine-tuning Method:** LoRA with GRPO
- **Language:** German

## Training Configuration

- **Dataset:** `open-r1/DAPO-Math-17k-Processed`
- **Learning Rate:** `5e-06`
- **Max Training Steps:** `100`
- **Max Sequence Length:** `1024`
- **GRPO Generations per Prompt:** `4`
- **GRPO Sampling Temperature:** `1.0`

### LoRA Configuration

- **Rank:** `32`
- **Alpha:** `64`
- **Dropout:** `0`

An illustrative sketch of how these settings map onto a GRPO training run is included at the end of this card.

## How to use

Load the adapters with Unsloth by pointing `from_pretrained` at this repository; the base model is downloaded and the LoRA weights are applied on top of it automatically.

```python
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "jquad/DeepSeek-R1-0528-Qwen3-8B-German-GRPO",  # LoRA adapters are loaded automatically
    max_seq_length = 1024,
    dtype = None,          # auto-detect dtype
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference mode

# Example prompt
prompt = "Was ist 2 + 2?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)  # leave room for the reasoning trace
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Intended Use

This model is intended for mathematical reasoning in German. It has been fine-tuned on a specialized dataset and may not be suitable for general-purpose tasks.

**This is a research artifact and should not be used in production without further evaluation.**
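
## Training Sketch

The exact training script is not part of this repository. The snippet below is a minimal sketch of how the configuration listed above could be reproduced with Unsloth and TRL's `GRPOTrainer`. The reward function, the dataset column names (`prompt`, `solution`), the prompt/completion length split, and the LoRA target modules are assumptions for illustration only, not the settings actually used for this model.

```python
# Illustrative sketch only -- not the original training script.
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-0528-Qwen3-8B",
    max_seq_length = 1024,
    load_in_4bit = True,
)

# LoRA settings from this card: rank 32, alpha 64, dropout 0.
# The target modules are an assumption (a common choice for Qwen-style models).
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,
    lora_alpha = 64,
    lora_dropout = 0,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

# Column names ("prompt", "solution") are assumptions about the dataset schema.
dataset = load_dataset("open-r1/DAPO-Math-17k-Processed", split="train")

def exact_answer_reward(completions, solution, **kwargs):
    # Hypothetical reward: 1.0 if the reference answer appears verbatim in the
    # completion, else 0.0. The rewards actually used are not documented here.
    return [1.0 if sol in comp else 0.0 for comp, sol in zip(completions, solution)]

training_args = GRPOConfig(
    learning_rate = 5e-6,
    max_steps = 100,
    num_generations = 4,        # GRPO generations per prompt
    temperature = 1.0,          # GRPO sampling temperature
    max_prompt_length = 256,    # assumed split of the 1024-token budget
    max_completion_length = 768,
    output_dir = "outputs",
)

trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = [exact_answer_reward],
    args = training_args,
    train_dataset = dataset,
)
trainer.train()
```

In a real GRPO run for math reasoning, the reward would typically parse the final answer out of the completion (e.g. from a boxed expression) rather than substring-match, and the prompts would be formatted with the model's chat template in German; both details are omitted here for brevity.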