---
language: de
license: apache-2.0
tags:
- grpo
- lora
- german
- math-reasoning
- deepseek
- unsloth
base_model: unsloth/DeepSeek-R1-0528-Qwen3-8B
---

# LoRA fine-tune of unsloth/DeepSeek-R1-0528-Qwen3-8B for German Mathematical Reasoning

This repository contains LoRA adapters for the `unsloth/DeepSeek-R1-0528-Qwen3-8B` model, fine-tuned on the `open-r1/DAPO-Math-17k-Processed` dataset for German mathematical reasoning tasks.

This model was trained using the GRPO (Group Relative Policy Optimization) algorithm.
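
GRPO samples a group of completions for each prompt, scores them with a reward function, and normalizes the rewards within the group to obtain advantages, which removes the need for a separate value model. A minimal sketch of the group-relative advantage computation (illustrative only; this is not the training code used for these adapters):

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Normalize rewards within each group of sampled completions.

    rewards: (num_prompts, num_generations), one scalar reward per completion.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Four generations per prompt, matching the training configuration below.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0]])
print(group_relative_advantages(rewards))
# Completions scoring above the group mean get positive advantages.
```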

## Model Details

- **Base Model:** `unsloth/DeepSeek-R1-0528-Qwen3-8B`
- **Fine-tuning Method:** LoRA with GRPO
- **Language:** German

## Training Configuration

- **Dataset:** `open-r1/DAPO-Math-17k-Processed`
- **Learning Rate:** `5e-06`
- **Max Training Steps:** `100`
- **Max Sequence Length:** `1024`
- **GRPO Generations:** `4`
- **GRPO Temperature:** `1`
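
For reference, here is a minimal sketch of how these hyperparameters would map onto `trl`'s `GRPOConfig`/`GRPOTrainer`. The reward function is a hypothetical placeholder, and the dataset loading and column layout are assumptions; this is not the exact training script:

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Assumes the dataset exposes a "prompt" column as GRPOTrainer expects;
# config/split names may differ.
dataset = load_dataset("open-r1/DAPO-Math-17k-Processed", split="train")

def reward_func(completions, **kwargs):
    # Hypothetical placeholder: the actual reward used for training
    # (e.g. answer correctness) is not documented in this card.
    return [0.0 for _ in completions]

training_args = GRPOConfig(
    output_dir = "grpo-german-math",
    learning_rate = 5e-6,
    max_steps = 100,
    num_generations = 4,
    temperature = 1.0,
    max_completion_length = 1024,
)

trainer = GRPOTrainer(
    model = model,              # a LoRA-wrapped model; see the sketch below
    reward_funcs = reward_func,
    args = training_args,
    train_dataset = dataset,
)
trainer.train()
```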

### LoRA Configuration

- **Rank:** `32`
- **Alpha:** `64`
- **Dropout:** `0`
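
With Unsloth, this corresponds to wrapping the base model via `FastLanguageModel.get_peft_model`. A sketch under the listed settings (the target modules shown are a common default for Qwen-style architectures, not confirmed by this card):

```python
from unsloth import FastLanguageModel

base_model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-0528-Qwen3-8B",
    max_seq_length = 1024,
    load_in_4bit = True,
)

model = FastLanguageModel.get_peft_model(
    base_model,
    r = 32,            # LoRA rank
    lora_alpha = 64,   # scaling factor (alpha / rank = 2)
    lora_dropout = 0,
    # Assumed target modules -- the usual attention and MLP projections:
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    bias = "none",
)
```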

## How to use

Load the model from this repository with Unsloth; the LoRA adapters are applied on top of the base model automatically.

```python
from unsloth import FastLanguageModel

# Load the model; the LoRA adapters from this repo are applied automatically.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "jquad/DeepSeek-R1-0528-Qwen3-8B-German-GRPO",
    max_seq_length = 1024,
    dtype = None,          # auto-detect (bfloat16 where supported)
    load_in_4bit = True,   # 4-bit quantization to reduce VRAM usage
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path

# Example prompt
prompt = "Was ist 2 + 2?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# The model emits a reasoning trace before its final answer, so allow
# considerably more than a handful of new tokens.
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
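
The base model is a chat-tuned reasoning model, so results are usually better when the prompt is wrapped with the tokenizer's chat template rather than passed as raw text. A sketch, assuming the bundled tokenizer ships a chat template:

```python
messages = [{"role": "user", "content": "Was ist 2 + 2?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,  # append the assistant-turn marker
    return_tensors = "pt",
).to("cuda")
outputs = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The output will typically contain the model's thinking trace followed by the final answer.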

## Intended Use

This model is intended for mathematical reasoning in German. It has been fine-tuned on a specialized dataset and may not be suitable for general-purpose tasks.

**This is a research artifact and should not be used in production without further evaluation.**