Post 93
Wrote a quick one-liner to run Qwen3-Next-80B-A3B-Instruct-8bit with mlx-lm and uv on macOS:
curl -sL https://gist.githubusercontent.com/do-me/34516f7f4d8cc701da823089b09a3359/raw/5f3b7e92d3e5199fd1d4f21f817a7de4a8af0aec/prompt.py | uv run --with git+https://github.com/ml-explore/mlx-lm.git python - --prompt "What is the meaning of life?"
... or, if you prefer the more secure 2-liner version (so you can inspect the script before executing it):
curl -sL https://gist.githubusercontent.com/do-me/34516f7f4d8cc701da823089b09a3359/raw/5f3b7e92d3e5199fd1d4f21f817a7de4a8af0aec/prompt.py -o prompt.py
uv run --with git+https://github.com/ml-explore/mlx-lm.git python prompt.py --prompt "What is the meaning of life?"
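
For context, here is a minimal sketch of what a prompt.py like the one in the gist might contain, following the load/generate pattern from the mlx-lm README. The gist itself is the source of truth; the model ID below (the mlx-community 8-bit conversion) and the exact flags are my assumptions:

# prompt.py -- hypothetical sketch, not the actual gist contents
import argparse

from mlx_lm import load, generate

parser = argparse.ArgumentParser()
parser.add_argument("--prompt", required=True)
parser.add_argument("--max-tokens", type=int, default=256)
args = parser.parse_args()

# Downloads the weights from Hugging Face on first run (assumed repo name)
model, tokenizer = load("mlx-community/Qwen3-Next-80B-A3B-Instruct-8bit")

# Wrap the raw prompt in the model's chat template
messages = [{"role": "user", "content": args.prompt}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# verbose=True prints the prompt/generation tokens-per-sec and peak-memory
# stats like the ones quoted below
generate(model, tokenizer, prompt=prompt, max_tokens=args.max_tokens, verbose=True)

The uv run --with git+https://github.com/ml-explore/mlx-lm.git part installs mlx-lm from source into a throwaway environment, so nothing is left behind on your system.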
I get around 45-50 tokens per second on an M3 Max; pretty happy with the generation speed!
Stats from the video:
Prompt: 15 tokens, 80.972 tokens-per-sec
Generation: 256 tokens, 45.061 tokens-per-sec
Peak memory: 84.834 GB