Distilled Qwen Model - QLoRA

This model was created by distilling knowledge from Qwen/Qwen3-8B-Base (teacher) into Qwen/Qwen3-0.6B-Base (student) using QLoRA (Quantized Low-Rank Adaptation).

Model Details

  • Base Model: Qwen/Qwen3-0.6B-Base
  • Teacher Model: Qwen/Qwen3-8B-Base
  • Method: Knowledge Distillation with QLoRA
  • Dataset: MMLU (Massive Multitask Language Understanding)
  • Distillation Alpha: 0.7 (weight on the soft teacher-matching loss; see the loss sketch below)
  • Temperature: 4.0
  • Trainable Parameters: ~10M (1.66% of total parameters)
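
The distillation loss implied by these settings combines a soft (teacher) term and a hard (label) term. A minimal sketch, assuming alpha weights the temperature-scaled KL term and (1 - alpha) the standard cross-entropy term (the exact convention used in training is not stated on this card):

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.7, temperature=4.0):
    # Soft targets: KL divergence between temperature-scaled distributions,
    # scaled by T^2 as in Hinton et al. (2015).
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard cross-entropy against the ground-truth tokens.
    hard = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)), labels.view(-1))
    return alpha * soft + (1 - alpha) * hard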

Training Details

  • Training Samples: 285
  • Epochs: 3
  • Batch Size: 4
  • Learning Rate: 0.0002
  • LoRA Rank: 16
  • LoRA Alpha: 32 (see the configuration sketch below)
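
A hedged sketch of a QLoRA setup matching these hyperparameters (the 4-bit NF4 quantization, dropout value, and target modules are assumptions; rank and alpha come from this card):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the student in 4-bit (assumed NF4) so only the LoRA adapters are trained.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
student = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B-Base", quantization_config=bnb_config
)

lora_config = LoraConfig(
    r=16,                                                     # LoRA Rank (from this card)
    lora_alpha=32,                                            # LoRA Alpha (from this card)
    lora_dropout=0.05,                                        # assumption: not reported
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption: attention projections
    task_type="CAUSAL_LM",
)
student = get_peft_model(student, lora_config)
student.print_trainable_parameters()  # ~10M trainable (~1.66% of total)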

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load tokenizer and base model
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B-Base")
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B-Base")

# Load the distilled model
model = PeftModel.from_pretrained(base_model, "CarlOwOs/distilled-qwen3-0.6b-qlora-mmlu")

# For inference, merge and unload
model = model.merge_and_unload()

# Generate text
inputs = tokenizer("Question: What is the capital of France?\nA. London\nB. Berlin\nC. Paris\nD. Madrid\nAnswer:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)  # a short completion is enough for the answer letter
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Evaluation

This model should be evaluated on MCQA (multiple-choice question answering) tasks using log-likelihood comparison over the answer options, as implemented in the evaluation framework; a sketch of this scoring is shown below.
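
A minimal sketch of log-likelihood MCQA scoring, assuming model and tokenizer are loaded (and merged) as in the Usage section above; the prompt format and answer strings are illustrative, and the snippet assumes the prompt tokenization is a prefix of the full tokenization:

import torch
import torch.nn.functional as F

def option_log_likelihood(model, tokenizer, prompt, option):
    # Score the option by the summed log-probability of its tokens given the prompt.
    full = tokenizer(prompt + option, return_tensors="pt")
    prompt_len = tokenizer(prompt, return_tensors="pt")["input_ids"].shape[1]
    with torch.no_grad():
        logits = model(**full).logits
    log_probs = F.log_softmax(logits[0, :-1], dim=-1)  # position i predicts token i+1
    target_ids = full["input_ids"][0, 1:]
    idx = torch.arange(prompt_len - 1, target_ids.shape[0])
    return log_probs[idx, target_ids[idx]].sum().item()

prompt = "Question: What is the capital of France?\nA. London\nB. Berlin\nC. Paris\nD. Madrid\nAnswer:"
options = [" A", " B", " C", " D"]
scores = [option_log_likelihood(model, tokenizer, prompt, o) for o in options]
print(options[scores.index(max(scores))].strip())  # highest-likelihood option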
