Distilled Qwen Model - QLoRA
This model was created through knowledge distillation from the teacher Qwen/Qwen3-8B-Base into the student Qwen/Qwen3-0.6B-Base, using QLoRA (Quantized Low-Rank Adaptation).
Model Details
- Base Model: Qwen/Qwen3-0.6B-Base
- Teacher Model: Qwen/Qwen3-8B-Base
- Method: Knowledge Distillation with QLoRA
- Dataset: MMLU (Massive Multitask Language Understanding)
- Distillation Alpha: 0.7
- Temperature: 4.0 (see the loss sketch after this list)
- Trainable Parameters: ~10M (1.66% of total parameters)
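The alpha and temperature above correspond to a standard Hinton-style distillation objective. The training code is not included in this card, so the following is only a minimal sketch of that loss, with the reported values plugged in as defaults:

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.7, temperature=4.0):
    # Soft-target term: KL divergence between the temperature-softened
    # teacher and student distributions, scaled by T^2 as usual.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    # Hard-target term: ordinary next-token cross-entropy on the labels.
    hard = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)), labels.view(-1))
    # alpha = 0.7 weights the soft (distillation) term against the hard term.
    return alpha * soft + (1.0 - alpha) * hard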
Training Details
- Training Samples: 285
- Epochs: 3
- Batch Size: 4
- Learning Rate: 0.0002
- LoRA Rank: 16
- LoRA Alpha: 32 (see the configuration sketch after this list)
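The training script itself is not part of this card; below is a minimal sketch of the QLoRA setup the numbers above imply. The 4-bit quantization settings, LoRA dropout, and target modules are assumptions, not values reported here.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the student with its frozen base weights quantized to 4-bit (assumed NF4 settings).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
student = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B-Base", quantization_config=bnb_config
)
student = prepare_model_for_kbit_training(student)

# Attach LoRA adapters with the rank/alpha reported above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,  # assumed; not reported in this card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
student = get_peft_model(student, lora_config)
student.print_trainable_parameters()  # ~10M trainable, ~1.66% of total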
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
# Load tokenizer and base model
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B-Base")
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B-Base")
# Load the distilled model
model = PeftModel.from_pretrained(base_model, "CarlOwOs/distilled-qwen3-0.6b-qlora-mmlu")
# For inference, merge and unload
model = model.merge_and_unload()
# Generate text
inputs = tokenizer("Question: What is the capital of France?\nA. London\nB. Berlin\nC. Paris\nD. Madrid\nAnswer:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Evaluation
This model should be evaluated on MCQA tasks by comparing the log-likelihood the model assigns to each answer option, rather than by free-form generation, as implemented in the accompanying evaluation framework.
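A minimal sketch of that log-likelihood comparison, reusing the merged model and tokenizer from the Usage section above (summing the option's token log-probabilities is an assumption; a real framework may normalize by option length):

import torch
import torch.nn.functional as F

@torch.no_grad()
def option_log_likelihood(model, tokenizer, prompt, option):
    # Log-likelihood of the option tokens conditioned on the prompt
    # (assumes tokenizing prompt + option keeps the prompt tokens as a prefix).
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + option, return_tensors="pt").input_ids
    logits = model(full_ids).logits
    # Position i predicts token i + 1, so shift and keep only the option tokens.
    log_probs = F.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, prompt_len:]
    option_log_probs = log_probs[:, prompt_len - 1:].gather(-1, targets.unsqueeze(-1))
    return option_log_probs.sum().item()

prompt = "Question: What is the capital of France?\nAnswer:"
options = [" London", " Berlin", " Paris", " Madrid"]
scores = [option_log_likelihood(model, tokenizer, prompt, o) for o in options]
print("Prediction:", options[scores.index(max(scores))])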