Distilled Qwen Model - QLoRA

This model was created by distilling knowledge from Qwen/Qwen3-8B-Base (teacher) into Qwen/Qwen3-0.6B-Base (student) using QLoRA (Quantized Low-Rank Adaptation).

Model Details

  • Base Model: Qwen/Qwen3-0.6B-Base
  • Teacher Model: Qwen/Qwen3-8B-Base
  • Method: Knowledge Distillation with QLoRA
  • Dataset: MMLU (Massive Multitask Language Understanding)
  • Distillation Alpha: 0.7 (weight on the soft teacher-matching loss; see the loss sketch below)
  • Temperature: 4.0
  • Trainable Parameters: ~10M (1.66% of total parameters)
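
The distillation loss implied by these settings combines a soft (teacher) term and a hard (label) term. A minimal sketch, assuming alpha weights the temperature-scaled KL term and (1 - alpha) the standard cross-entropy term (the exact convention used in training is not stated on this card):

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.7, temperature=4.0):
    # Soft targets: KL divergence between temperature-scaled distributions,
    # scaled by T^2 as in Hinton et al. (2015).
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard cross-entropy against the ground-truth tokens.
    hard = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)), labels.view(-1))
    return alpha * soft + (1 - alpha) * hard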

Training Details

  • Training Samples: 285
  • Epochs: 3
  • Batch Size: 4
  • Learning Rate: 0.0002
  • LoRA Rank: 16
  • LoRA Alpha: 32 (see the configuration sketch below)
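
A hedged sketch of a QLoRA setup matching these hyperparameters (the 4-bit NF4 quantization, dropout value, and target modules are assumptions; rank and alpha come from this card):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the student in 4-bit (assumed NF4) so only the LoRA adapters are trained.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
student = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B-Base", quantization_config=bnb_config
)

lora_config = LoraConfig(
    r=16,                                                     # LoRA Rank (from this card)
    lora_alpha=32,                                            # LoRA Alpha (from this card)
    lora_dropout=0.05,                                        # assumption: not reported
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption: attention projections
    task_type="CAUSAL_LM",
)
student = get_peft_model(student, lora_config)
student.print_trainable_parameters()  # ~10M trainable (~1.66% of total)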

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load tokenizer and base model
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B-Base")
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B-Base")

# Load the distilled model
model = PeftModel.from_pretrained(base_model, "CarlOwOs/distilled-qwen3-0.6b-qlora-mmlu")

# For inference, merge and unload
model = model.merge_and_unload()

# Generate text
inputs = tokenizer("Question: What is the capital of France?\nA. London\nB. Berlin\nC. Paris\nD. Madrid\nAnswer:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)  # a short completion is enough for the answer letter
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Evaluation

This model should be evaluated on MCQA (multiple-choice question answering) tasks using log-likelihood comparison over the answer options, as implemented in the evaluation framework; a sketch of this scoring is shown below.
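
A minimal sketch of log-likelihood MCQA scoring, assuming model and tokenizer are loaded (and merged) as in the Usage section above; the prompt format and answer strings are illustrative, and the snippet assumes the prompt tokenization is a prefix of the full tokenization:

import torch
import torch.nn.functional as F

def option_log_likelihood(model, tokenizer, prompt, option):
    # Score the option by the summed log-probability of its tokens given the prompt.
    full = tokenizer(prompt + option, return_tensors="pt")
    prompt_len = tokenizer(prompt, return_tensors="pt")["input_ids"].shape[1]
    with torch.no_grad():
        logits = model(**full).logits
    log_probs = F.log_softmax(logits[0, :-1], dim=-1)  # position i predicts token i+1
    target_ids = full["input_ids"][0, 1:]
    idx = torch.arange(prompt_len - 1, target_ids.shape[0])
    return log_probs[idx, target_ids[idx]].sum().item()

prompt = "Question: What is the capital of France?\nA. London\nB. Berlin\nC. Paris\nD. Madrid\nAnswer:"
options = [" A", " B", " C", " D"]
scores = [option_log_likelihood(model, tokenizer, prompt, o) for o in options]
print(options[scores.index(max(scores))].strip())  # highest-likelihood option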
