---
license: cc-by-nc-4.0
language:
  - en
pipeline_tag: text-generation
tags:
  - mlx==0.26.2
  - q5
  - command-r
  - m3-ultra
base_model: CohereLabs/c4ai-command-a-03-2025
---

Command-R 03-2025 MLX Q5 Quantization

This is a Q5 (5-bit) quantized version of the Command-R model, optimized for MLX on Apple Silicon. This quantization offers an excellent balance between model quality and size, specifically designed for high-memory Apple Silicon systems like the M3 Ultra.

Model Details

  • Base Model: CohereLabs/c4ai-command-a-03-2025
  • Quantization: Q5 (5-bit) with group size 64
  • Format: MLX (Apple Silicon optimized)
  • Size: 71GB (from original 207GB bfloat16)
  • Compression: 66% size reduction
  • Performance: 8.6 tokens/sec on M3 Ultra
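
As a rough sanity check of the size and compression figures above, the arithmetic below reproduces the ~71GB footprint. It assumes MLX's affine quantization layout of one fp16 scale and one fp16 bias per 64-weight group (an assumption about the storage format, not a measured value):

# Back-of-the-envelope size estimate for Q5 with group size 64.
# Assumption: each 64-weight group stores an fp16 scale and an fp16 bias,
# giving ~5.5 effective bits per weight.
params = 207e9 / 2                 # ~103.5B weights implied by 207GB of bfloat16
bits_per_weight = 5 + 2 * 16 / 64  # 5 quantized bits + amortized scale/bias
size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.0f} GB")        # ~71 GB, matching the shipped model size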

Why Q5?

Q5 quantization provides:

  • Superior quality compared to Q4 while being smaller than Q6/Q8
  • Optimal size for 128GB+ Apple Silicon systems
  • Minimal quality loss - retains ~98% of original model capabilities
  • Fast inference with MLX's unified memory architecture

Requirements

  • Apple Silicon Mac (M1/M2/M3/M4)
  • macOS 13.0+
  • Python 3.11+
  • MLX 0.26.0+
  • mlx-lm 0.22.5+
  • 80GB+ RAM recommended (128GB+ for full 128k context)
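
A quick way to verify your environment against these requirements (a minimal sketch; it only reads installed package versions and the default MLX device):

import platform
from importlib.metadata import version

import mlx.core as mx

print(platform.machine())   # expect 'arm64' on Apple Silicon
print(version("mlx"))       # expect >= 0.26.0
print(version("mlx-lm"))    # expect >= 0.22.5
print(mx.default_device())  # should report the Metal GPU device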

Installation

# Using uv (recommended)
uv add "mlx>=0.26.0" mlx-lm transformers

# Or with pip (not tested by us; uv is recommended)
pip install "mlx>=0.26.0" mlx-lm transformers

Usage

Direct Generation

uv run mlx_lm.generate \
  --model LibraxisAI/c4ai-command-a-03-2025-q5-mlx \
  --prompt "Explain quantum computing" \
  --max-tokens 500

Python API

from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Load model
model, tokenizer = load("LibraxisAI/c4ai-command-a-03-2025-q5-mlx")

# Generate text (recent mlx-lm releases configure sampling via a sampler
# object rather than a bare `temp` argument)
prompt = "What are the benefits of Q5 quantization?"
response = generate(
    model=model,
    tokenizer=tokenizer,
    prompt=prompt,
    max_tokens=200,
    sampler=make_sampler(temp=0.7),
)
print(response)
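
For long generations you can also stream tokens as they are produced. This is a minimal sketch assuming a recent mlx-lm (0.20+), where stream_generate yields response objects with a .text field:

from mlx_lm import load, stream_generate

model, tokenizer = load("LibraxisAI/c4ai-command-a-03-2025-q5-mlx")

# Print tokens as they arrive instead of waiting for the full response
for chunk in stream_generate(
    model,
    tokenizer,
    prompt="Summarize MLX in two sentences.",
    max_tokens=100,
):
    print(chunk.text, end="", flush=True)
print()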

HTTP Server

uv run mlx_lm.server \
  --model LibraxisAI/c4ai-command-a-03-2025-q5-mlx \
  --host 0.0.0.0 \
  --port 8080
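
The server exposes an OpenAI-compatible REST API. A minimal sketch of a chat-completions request against the server started above (host, port, and prompt simply mirror the flags used; adjust as needed):

import json
import urllib.request

payload = {
    "model": "LibraxisAI/c4ai-command-a-03-2025-q5-mlx",
    "messages": [{"role": "user", "content": "Explain quantum computing briefly."}],
    "max_tokens": 200,
    "temperature": 0.7,
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])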

Performance Benchmarks

Tested on Mac Studio M3 Ultra (512GB):

Metric               Value
Model Size           71GB
Peak Memory Usage    77.166 GB
Prompt Processing    89.634 tokens/sec
Generation Speed     8.631 tokens/sec
Max Context Length   131,072 tokens (128k)

Limitations

โš ๏ธ Important: This Q5 model as for the release date, of this quant is NOT compatible with LM Studio (yet), which only supports 2, 3, 4, 6, and 8-bit quantizations & we didn't test ot with Ollama or any other inference client. Use MLX directly or via the MLX server - we've created a comfortable, command generation script to run the server properly.

Conversion Details

This model was quantized using:

uv run mlx_lm.convert \
  --hf-path CohereLabs/c4ai-command-a-03-2025 \
  --mlx-path c4ai-command-a-03-2025-q5-mlx \
  --dtype bfloat16 \
  -q --q-bits 5 --q-group-size 64
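
The same conversion can also be driven from Python. A minimal sketch, assuming the keyword arguments of mlx_lm's convert() mirror the CLI flags above:

from mlx_lm import convert

# Quantize the bfloat16 weights to 5 bits with group size 64,
# mirroring the CLI invocation above.
convert(
    hf_path="CohereLabs/c4ai-command-a-03-2025",
    mlx_path="c4ai-command-a-03-2025-q5-mlx",
    dtype="bfloat16",
    quantize=True,
    q_bits=5,
    q_group_size=64,
)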

Frontier M3 Ultra Optimization

This model is specifically optimized for the Mac Studio M3 Ultra setup with 512GB unified memory. For best performance:

import mlx.core as mx

# Set memory limits for large models
mx.metal.set_memory_limit(300 * 1024**3)  # 300GB
mx.metal.set_cache_limit(50 * 1024**3)    # 50GB cache

Peak memory usage during generation can be significantly higher than for a model that is loaded but idle, so set these limits with headroom to spare.

Tools Included

We provide utility scripts for easy model management:

  1. convert-to-mlx.sh - command-generation tool that converts any model to MLX format, with many customization options and Q5 quantization support on mlx>=0.26.0
  2. mlx-serve.sh - Launch MLX server with custom parameters

Citation

If you use this model, please cite:

@misc{command-r-q5-mlx,
  author = {LibraxisAI},
  title = {Command-R Q5 MLX - Optimized for Apple Silicon},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/LibraxisAI/c4ai-command-a-03-2025-q5-mlx}
}

License

This model follows the original Command-R license (CC-BY-NC-4.0). See the base model card for full details.

Authors of the repository

Monika Szymanska
Maciej Gad, DVM

Acknowledgments

  • Apple MLX team and community for the amazing 0.26.0+ framework
  • Cohere for the original Command-R model
  • Klaudiusz-AI