---
license: apache-2.0
language:
  - en
  - zh
pipeline_tag: text-generation
tags:
  - mlx==0.26.2
  - q5
  - qwen3
  - m3-ultra
base_model: Qwen/Qwen3-14B
---

Qwen3-14B MLX Q5 Quantization

This is a Q5 (5-bit) quantized version of the Qwen3-14B model, optimized for MLX on Apple Silicon. Q5 strikes an excellent balance between model quality and size, making it practical to run an advanced model on consumer Apple Silicon devices.

Model Details

  • Base Model: Qwen/Qwen3-14B
  • Quantization: Q5 (5-bit) with group size 64
  • Format: MLX (Apple Silicon optimized)
  • Size: 9.5GB (from original 28GB bfloat16)
  • Compression: 66% size reduction
  • Architecture: Qwen3 with enhanced multilingual capabilities

Why Q5?

Q5 quantization provides:

  • Superior quality compared to Q4 while being smaller than Q6/Q8
  • Perfect for consumer Macs - runs smoothly on M1/M2/M3 with 16GB+ RAM
  • Minimal quality loss - retains ~98% of original model capabilities
  • Fast inference with MLX's unified memory architecture

Requirements

  • Apple Silicon Mac (M1/M2/M3/M4)
  • macOS 13.0+
  • Python 3.11+
  • MLX 0.26.0+
  • mlx-lm 0.22.5+
  • 16GB+ RAM recommended
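
A quick way to sanity-check your environment before loading the model (a minimal sketch; assumes MLX is already installed):

import mlx.core as mx

# Confirm the MLX version and that the Metal GPU backend is available
print(mx.__version__)            # should be >= 0.26.0
print(mx.metal.is_available())   # True on supported Apple Silicon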

Installation

# Using uv (recommended)
uv add "mlx>=0.26.0" mlx-lm transformers

# Or with pip (untested by us; uv is preferred)
pip install "mlx>=0.26.0" mlx-lm transformers

Usage

Direct Generation

uv run mlx_lm.generate \
  --model LibraxisAI/Qwen3-14b-q5-mlx \
  --prompt "Explain the advantages of multilingual language models" \
  --max-tokens 500

Python API

from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Load model
model, tokenizer = load("LibraxisAI/Qwen3-14b-q5-mlx")

# Generate text
prompt = "写一个关于量子计算的简短介绍"  # Chinese: "Write a short introduction to quantum computing"
response = generate(
    model=model,
    tokenizer=tokenizer,
    prompt=prompt,
    max_tokens=500,
    # Recent mlx-lm versions take a sampler instead of a bare temp kwarg
    sampler=make_sampler(temp=0.7),
)
print(response)
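
Qwen3 is a chat-tuned model, so wrapping the prompt in the tokenizer's chat template usually improves responses. A minimal sketch (the example question is ours):

messages = [{"role": "user", "content": "Explain quantum computing in three sentences."}]

# apply_chat_template returns token IDs by default; mlx-lm's generate
# accepts them directly in place of a plain string prompt
chat_prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=chat_prompt, max_tokens=500)
print(response)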

HTTP Server

uv run mlx_lm.server \
  --model LibraxisAI/Qwen3-14b-q5-mlx \
  --host 0.0.0.0 \
  --port 8080
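
The server exposes an OpenAI-compatible API. A minimal sketch of querying the chat endpoint with the Python standard library (host and port match the launch command above):

import json
import urllib.request

payload = {
    "model": "LibraxisAI/Qwen3-14b-q5-mlx",
    "messages": [{"role": "user", "content": "Hello! What can you do?"}],
    "max_tokens": 200,
    "temperature": 0.7,
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])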

Performance Benchmarks

Tested on Mac Studio M3 Ultra (512GB):

Metric               Value
Model Size           9.5 GB
Peak Memory Usage    ~12 GB
Prompt Processing    ~150 tokens/sec
Generation Speed     ~25-30 tokens/sec
Max Context Length   8,192 tokens
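
To reproduce a rough speed measurement on your own machine, verbose mode prints prompt-processing and generation tokens/sec (a minimal sketch):

from mlx_lm import load, generate

model, tokenizer = load("LibraxisAI/Qwen3-14b-q5-mlx")
# verbose=True reports prompt and generation speed alongside the output
generate(model, tokenizer, prompt="Benchmark prompt", max_tokens=256, verbose=True)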

Special Features

Qwen3-14B excels at:

  • Multilingual support - strong performance in Chinese and English
  • Code generation with multiple programming languages
  • Mathematical reasoning and problem solving
  • Balanced performance - ideal size for daily use

Limitations

โš ๏ธ Important: This Q5 model as for the release date, of this quant is NOT compatible with LM Studio (yet), which only supports 2, 3, 4, 6, and 8-bit quantizations & we didn't test it with Ollama or any other inference client. Use MLX directly or via the MLX server - we've created a comfortable, command generation script to run the server properly.

Conversion Details

This model was quantized using:

uv run mlx_lm.convert \
  --hf-path Qwen/Qwen3-14B \
  --mlx-path Qwen3-14b-q5-mlx \
  --dtype bfloat16 \
  -q --q-bits 5 --q-group-size 64
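
The same conversion can be driven from Python. A minimal sketch, assuming mlx-lm's convert API (parameter names mirror the CLI flags):

from mlx_lm import convert

convert(
    hf_path="Qwen/Qwen3-14B",
    mlx_path="Qwen3-14b-q5-mlx",
    dtype="bfloat16",
    quantize=True,      # equivalent of -q
    q_bits=5,
    q_group_size=64,
)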

Frontier M3 Ultra Optimization

This model runs well on all Apple Silicon; on an M3 Ultra you can additionally raise MLX's memory and cache limits:

import mlx.core as mx

# Set memory limits for optimal performance
# (recent MLX exposes these at the top level; the old mx.metal.* aliases are deprecated)
mx.set_memory_limit(50 * 1024**3)  # 50 GB
mx.set_cache_limit(10 * 1024**3)   # 10 GB cache
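
You can check the effect after a generation pass; a minimal sketch using MLX's top-level memory introspection (assumes a recent MLX release):

# Inspect memory use after running a generation
print(f"active: {mx.get_active_memory() / 1024**3:.1f} GB")
print(f"peak:   {mx.get_peak_memory() / 1024**3:.1f} GB")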

Tools Included

We provide utility scripts for easy model management:

  1. convert-to-mlx.sh - command-generation tool for converting any model to MLX format, with extensive customization options and Q5 quantization support on mlx>=0.26.0
  2. mlx-serve.sh - Launch MLX server with custom parameters

Historical Note

The LibraxisAI Q5 models were among the first Q5 quantized MLX models available on Hugging Face, pioneering the use of 5-bit quantization for Apple Silicon optimization.

Citation

If you use this model, please cite:

@misc{qwen3-14b-q5-mlx,
  author = {LibraxisAI},
  title = {Qwen3-14B Q5 MLX - Multilingual Model for Apple Silicon},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/LibraxisAI/Qwen3-14b-q5-mlx}
}

License

This model follows the original Qwen license (Apache-2.0). See the base model card for full details.

Authors of the repository

  • Monika Szymanska
  • Maciej Gad, DVM

Acknowledgments

  • Apple MLX team and community for the amazing 0.26.0+ framework
  • Qwen team at Alibaba for the excellent multilingual model
  • Klaudiusz-AI 🐉