do-me posted an update 1 day ago

Wrote a quick one-liner to run Qwen3-Next-80B-A3B-Instruct-8bit with mlx-lm and uv on macOS:

curl -sL https://gist.githubusercontent.com/do-me/34516f7f4d8cc701da823089b09a3359/raw/5f3b7e92d3e5199fd1d4f21f817a7de4a8af0aec/prompt.py | uv run --with git+https://github.com/ml-explore/mlx-lm.git python - --prompt "What is the meaning of life?"


... or, if you prefer, the more secure two-liner version (so you can inspect the script before executing it):

curl -sL https://gist.githubusercontent.com/do-me/34516f7f4d8cc701da823089b09a3359/raw/5f3b7e92d3e5199fd1d4f21f817a7de4a8af0aec/prompt.py -o prompt.py
uv run --with git+https://github.com/ml-explore/mlx-lm.git python prompt.py --prompt "What is the meaning of life?"
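
For context, here's a minimal sketch of what a prompt.py like this might look like, built on mlx-lm's load/generate API. The actual gist isn't reproduced here; the mlx-community repo name and the max_tokens value are assumptions on my part, not taken from the gist:

import argparse
from mlx_lm import load, generate

parser = argparse.ArgumentParser()
parser.add_argument("--prompt", required=True)
args = parser.parse_args()

# First run downloads the quantized weights from the Hub, then loads them.
model, tokenizer = load("mlx-community/Qwen3-Next-80B-A3B-Instruct-8bit")

# Wrap the raw prompt in the model's chat template before generating.
messages = [{"role": "user", "content": args.prompt}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# verbose=True prints the token counts and tokens-per-sec stats shown below.
generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)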


I get around 45-50 tokens per second on an M3 Max; pretty happy with the generation speed!

Stats from the video:
Prompt: 15 tokens, 80.972 tokens-per-sec
Generation: 256 tokens, 45.061 tokens-per-sec
Peak memory: 84.834 GB
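
As a rough sanity check, that peak memory is in line with the model size: 80B parameters at 8 bits each is about 80 GB for the weights alone, with the remaining ~5 GB going to the KV cache and runtime overhead.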