---
license: apache-2.0
base_model: Qwen/Qwen3-32B
tags:
- mlx
- 3bit
- quantized
---
# Qwen3-32B 3bit MLX
This model is a 3-bit quantized version of [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) using MLX.
## Model Details
- **Quantization**: 3-bit
- **Framework**: MLX
- **Base Model**: Qwen/Qwen3-32B
- **Model Size**: ~12GB (3-bit quantized)
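The ~12GB figure can be sanity-checked with a back-of-envelope estimate. The sketch below is only illustrative: the 32.8B parameter count is taken as an assumption from the base model's card, and the group size of 64 with 32 bits of per-group scale and bias data is an assumed quantization layout, not something this card states. The helper name `estimate_weight_size_gb` is hypothetical.

```python
def estimate_weight_size_gb(num_params, bits_per_weight,
                            group_size=None, overhead_bits_per_group=0):
    """Rough size of quantized weights in GB (1 GB = 1e9 bytes)."""
    effective_bits = bits_per_weight
    if group_size:
        # Spread per-group metadata (e.g. scale and bias) across the group.
        effective_bits += overhead_bits_per_group / group_size
    return num_params * effective_bits / 8 / 1e9

# Assumption: parameter count as listed for the base model.
NUM_PARAMS = 32.8e9

# Packed 3-bit weights only, no per-group metadata.
packed_only = estimate_weight_size_gb(NUM_PARAMS, 3)

# Assumed layout: groups of 64 weights, each with 32 bits of scale/bias data.
with_overhead = estimate_weight_size_gb(
    NUM_PARAMS, 3, group_size=64, overhead_bits_per_group=32
)

print(f"packed weights only: {packed_only:.2f} GB")
print(f"with group overhead: {with_overhead:.2f} GB")
```

The first number is close to the size quoted above; the files on disk may be somewhat larger if per-group metadata is stored or if some layers are kept at higher precision.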
## Usage
```python
from mlx_lm import load, generate

# Download (on first use) and load the quantized weights and tokenizer.
model, tokenizer = load("KCh3dRi4n/Qwen3-32B-3bit")

prompt = "Hello, how are you?"
messages = [{"role": "user", "content": prompt}]

# Wrap the message in the model's chat template before generating.
formatted_prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=formatted_prompt, max_tokens=100)
print(response)
```
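Qwen3 models may emit their reasoning inside `<think>...</think>` tags before the final answer, and a small `max_tokens` budget can cut the output off mid-reasoning. The helper below is a minimal sketch for separating the two parts of a generated string. It assumes this quantized model keeps the base model's tag format; `split_thinking` is a hypothetical name, not part of mlx-lm.

```python
import re

# Matches one reasoning block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(text):
    """Return (reasoning, answer) extracted from generated text."""
    match = THINK_BLOCK.search(text)
    if match is None:
        if "<think>" in text:
            # Generation stopped before the closing tag: no answer yet.
            return text.split("<think>", 1)[1].strip(), ""
        # No reasoning block at all: everything is the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

# Hand-written sample string, not real model output.
sample = "<think>\nThe user is greeting me.\n</think>\n\nI'm doing well, thanks!"
reasoning, answer = split_thinking(sample)
print(answer)
```

If the answer comes back empty, raising `max_tokens` in the `generate` call is a reasonable first thing to try.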
## Requirements
- Apple Silicon Mac (M1 or later)
- macOS 13.0+
- Python 3.8+
- MLX and mlx-lm packages
## Installation
```bash
pip install mlx mlx-lm
```
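After installing, a quick check can confirm that the environment matches the requirements listed above. This is a sketch using only the Python standard library; `check_environment` is a hypothetical helper, and it does not verify the macOS version.

```python
import importlib.util
import platform
import sys

def check_environment(min_python=(3, 8)):
    """Report whether the basic requirements appear to be met."""
    return {
        # A native Apple Silicon Python reports Darwin / arm64.
        "apple_silicon": platform.system() == "Darwin"
        and platform.machine() == "arm64",
        "python_ok": sys.version_info >= min_python,
        # find_spec returns None when a top-level package is not installed.
        "mlx_installed": importlib.util.find_spec("mlx") is not None,
        "mlx_lm_installed": importlib.util.find_spec("mlx_lm") is not None,
    }

status = check_environment()
for name, ok in status.items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```

A Python interpreter running under Rosetta reports `x86_64` rather than `arm64`, so the first check also flags a non-native interpreter.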