Panacea-MegaScience-Qwen3-1.7B-q4-hi-mlx

Top Quantizations

✅ Recommended High-Performers

🥇 q5 Quantization

Why: Essentially tied for the top winogrande score (0.694 vs. a 0.574 average), with near-best ARC-Easy (~0.398).
Strength: Best balance of accuracy and robustness across tasks (especially winogrande & ARC-Easy).
Ideal for: Production deployments needing top end-to-end accuracy.

🥈 q6-hi Quantization

Why: Best ARC-Easy (0.398) plus the top winogrande score (0.696, essentially a tie with q5).
Strength: Strong precision on ARC tasks with no loss on boolq (0.622, tied for best).
Ideal for: ARC-focused QA tasks or mixed-training pipelines.

🥉 q4-hi Quantization

Why: Best boolq (0.622, tied with q5 and q6-hi), competitive hellaswag.
Strength: Lightweight quantization for fast inference on boolq-style, data-centric tasks.
Ideal for: Latency-sensitive deployments (see the Pro Tip below).

Performance Insights

Winogrande Leaders: q5 (0.694) and q6-hi (0.696) → optimal for commonsense-reasoning tasks.
Consistency Kings: q5 and q6-hi (both stay above 90% of the top score across tasks).
Surprise: bf16 underperforms on winogrande (0.550) despite strong ARC-Easy → best reserved for baseline testing.
Cost-Saver: q4-hi delivers best-in-class boolq (0.622) with minimal overhead.

Recommendation Summary

| Use Case                    | Top Quant | Key Advantage                  |
|-----------------------------|-----------|--------------------------------|
| Top winogrande accuracy     | q5        | +26% vs. bf16 (0.550 → 0.694)  |
| ARC-Easy focus              | q6-hi     | Highest ARC-Easy (0.398)       |
| BoolQ-centric workflows     | q4-hi     | Best boolq (0.622)             |
| Balanced end-to-end         | q5        | Best holistic median score     |

💡 Pro Tip: Deploy q5 when accuracy is the priority; if latency is critical, fall back to q4-hi, which gives up nothing on boolq (0.622) while q5 retains the winogrande edge. A load-time lookup for this policy is sketched below.
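
The recommendation table maps naturally onto a small lookup at load time. A minimal sketch, assuming the sibling quantizations are published under analogous repo names (only the q4-hi repo is confirmed on this page; the q5 and q6-hi names are hypothetical):

```python
from mlx_lm import load

# Hypothetical repo names for the sibling quantizations; only the
# q4-hi repository is confirmed on this model card.
QUANT_FOR_TASK = {
    "winogrande": "nightmedia/Panacea-MegaScience-Qwen3-1.7B-q5-mlx",
    "arc_easy": "nightmedia/Panacea-MegaScience-Qwen3-1.7B-q6-hi-mlx",
    "boolq": "nightmedia/Panacea-MegaScience-Qwen3-1.7B-q4-hi-mlx",
}

def load_for_task(task: str):
    # q5 is the balanced default per the summary above.
    repo = QUANT_FOR_TASK.get(task, QUANT_FOR_TASK["winogrande"])
    return load(repo)

model, tokenizer = load_for_task("boolq")
```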

Visual Summary

Winogrande (↑) → q5 🥇 = 0.694  
ARC-Easy (↑)   → q6-hi 🥈 = 0.398  
BoolQ (↑)      → q4-hi 🥉 = 0.622  
Consistency     → q5/q6-hi (★★★★☆)

This model, Panacea-MegaScience-Qwen3-1.7B-q4-hi-mlx, was converted to MLX format from prithivMLmods/Panacea-MegaScience-Qwen3-1.7B using mlx-lm version 0.26.3.
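
A conversion along these lines can be reproduced with mlx-lm's convert API. A sketch, assuming the "-hi" suffix denotes a smaller quantization group size than the default 64 (the exact settings used for this repo are not stated):

```python
from mlx_lm import convert

convert(
    "prithivMLmods/Panacea-MegaScience-Qwen3-1.7B",
    mlx_path="Panacea-MegaScience-Qwen3-1.7B-q4-hi-mlx",
    quantize=True,
    q_bits=4,
    q_group_size=32,  # assumption: "hi" = group size 32; the default is 64
)
```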

Use with mlx

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Load the quantized weights and tokenizer from the Hub.
model, tokenizer = load("nightmedia/Panacea-MegaScience-Qwen3-1.7B-q4-hi-mlx")

prompt = "hello"

# Wrap the prompt in the model's chat template when one is available.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
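
Note that generate() stops at a modest default token budget (256 in recent mlx-lm releases, though this may vary by version); raise it explicitly for longer answers:

```python
# Allow longer completions than the default cap.
response = generate(
    model, tokenizer, prompt=prompt, max_tokens=512, verbose=True
)
```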
Model size: 323M params (Safetensors, tensor types F16 · U32)

Model tree for nightmedia/Panacea-MegaScience-Qwen3-1.7B-q4-hi-mlx

Base model: Qwen/Qwen3-1.7B (finetuned as prithivMLmods/Panacea-MegaScience-Qwen3-1.7B)
Quantized: this model is one of 6 quantized variants of that finetune.
