Tuned for clean output; roughly 165 tokens/sec.
Performance evaluation:

| Task          | acc   | acc_norm | stderr |
|---------------|-------|----------|--------|
| arc_challenge | 0.263 | 0.294    | 0.013  |
| arc_easy      | 0.398 | 0.362    | 0.0098 |
| boolq         | 0.378 |          | 0.0084 |
| hellaswag     | 0.363 | 0.423    | 0.0049 |
| openbookqa    | 0.168 | 0.340    | 0.0212 |
| piqa          | 0.655 | 0.643    | 0.011  |
| winogrande    | 0.531 |          | 0.014  |
This model, `Qwen3-0.6B-dwq6b-mlx`, was converted to MLX format from [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) using `mlx-lm` version 0.26.0.
```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("Qwen3-0.6B-dwq6b-mlx")

prompt = "hello"

# Wrap the prompt with the model's chat template when one is defined.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
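For quick one-off generations, `mlx-lm` also installs a command-line entry point, so the Python snippet above is not required; a minimal sketch, assuming the same model identifier resolves locally or on the Hub:

```shell
# One-shot generation via mlx-lm's bundled CLI (applies the chat template automatically).
mlx_lm.generate --model Qwen3-0.6B-dwq6b-mlx --prompt "hello"
```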