do-me posted an update 1 day ago

Wrote a quick one-liner to run Qwen3-Next-80B-A3B-Instruct-8bit with mlx-lm and uv on macOS:

curl -sL https://gist.githubusercontent.com/do-me/34516f7f4d8cc701da823089b09a3359/raw/5f3b7e92d3e5199fd1d4f21f817a7de4a8af0aec/prompt.py | uv run --with git+https://github.com/ml-explore/mlx-lm.git python - --prompt "What is the meaning of life?"


... or, if you prefer, the more secure two-liner version (so you can inspect the script before executing it):

curl -sL https://gist.githubusercontent.com/do-me/34516f7f4d8cc701da823089b09a3359/raw/5f3b7e92d3e5199fd1d4f21f817a7de4a8af0aec/prompt.py -o prompt.py
uv run --with git+https://github.com/ml-explore/mlx-lm.git python prompt.py --prompt "What is the meaning of life?"
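
For context, here's a minimal sketch of what a prompt.py like this might look like, built on mlx-lm's load/generate API. The actual gist isn't reproduced here; the mlx-community repo name and the max_tokens value are assumptions on my part, not taken from the gist:

import argparse
from mlx_lm import load, generate

parser = argparse.ArgumentParser()
parser.add_argument("--prompt", required=True)
args = parser.parse_args()

# First run downloads the quantized weights from the Hub, then loads them.
model, tokenizer = load("mlx-community/Qwen3-Next-80B-A3B-Instruct-8bit")

# Wrap the raw prompt in the model's chat template before generating.
messages = [{"role": "user", "content": args.prompt}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# verbose=True prints the token counts and tokens-per-sec stats shown below.
generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)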


I get around 45-50 tokens per second on an M3 Max; pretty happy with the generation speed!

Stats from the video:
Prompt: 15 tokens, 80.972 tokens-per-sec
Generation: 256 tokens, 45.061 tokens-per-sec
Peak memory: 84.834 GB
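
As a rough sanity check, that peak memory is in line with the model size: 80B parameters at 8 bits each is about 80 GB for the weights alone, with the remaining ~5 GB going to the KV cache and runtime overhead.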