Ivan Fioravanti

ivanfioravanti

AI & ML interests

None yet

Recent Activity

updated a model 8 days ago
mlx-community/DeepSeek-R1-0528-3bit
published a model 8 days ago
mlx-community/DeepSeek-R1-0528-3bit
published a model 9 days ago
mlx-community/DeepSeek-R1-0528-4bit

Organizations

CoreView, MLX Vision, MLX Community, Social Post Explorers, Cognitive Computations, Hugging Face Discord Community

ivanfioravanti's activity

upvoted an article 20 days ago
reacted to wolfram's post with 🔥 30 days ago:
Finally finished my extensive **Qwen 3 evaluations** across a range of formats and quantisations, focusing on **MMLU-Pro** (Computer Science).

A few take-aways stood out - especially for those interested in local deployment and performance trade-offs:

1️⃣ **Qwen3-235B-A22B** (via Fireworks API) tops the table at **83.66%** with ~55 tok/s.
2️⃣ But the **30B-A3B Unsloth** quant delivered **82.20%** while running locally at ~45 tok/s and with zero API spend.
3️⃣ The same Unsloth build is ~5x faster than Qwen's **Qwen3-32B**, which scores **82.20%** as well yet crawls at <10 tok/s.
4️⃣ On Apple silicon, the **30B MLX** port hits **79.51%** while sustaining ~64 tok/s - arguably today's best speed/quality trade-off for Mac setups (see the sketch after this list).
5️⃣ The **0.6B** micro-model races above 180 tok/s but tops out at **37.56%** - that's why it's not even on the graph (50% performance cut-off).
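
For anyone who wants to reproduce the Apple-silicon numbers, here's a minimal sketch using mlx-lm. The repo id `mlx-community/Qwen3-30B-A3B-4bit` is my assumption for illustration - substitute whichever mlx-community quant you actually want to test:

```python
# Minimal sketch: running a quantised Qwen3 MoE port with mlx-lm on Apple silicon.
# Assumption: the repo id below is illustrative, not necessarily the exact build
# benchmarked in this post.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3-30B-A3B-4bit")

# Build a chat-formatted prompt via the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain MMLU-Pro in one paragraph."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# verbose=True prints tokens/sec, handy for comparing against the speeds above.
text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
print(text)
```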

All local runs were done with LM Studio on an M4 MacBook Pro, using Qwen's official recommended settings.
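
For reference, a run like this can also be scripted against LM Studio's OpenAI-compatible local server (default `http://localhost:1234/v1`). A hedged sketch - the model identifier is hypothetical (use the name LM Studio shows for your loaded model), and the sampling values reflect my reading of Qwen3's recommended thinking-mode settings:

```python
# Sketch: querying a local LM Studio server via its OpenAI-compatible API.
# Assumptions: LM Studio's local server is running on the default port, and
# the model name matches the one loaded in the app.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="qwen3-30b-a3b",  # hypothetical identifier
    messages=[{"role": "user", "content": "An MMLU-Pro CS question goes here"}],
    temperature=0.6,  # Qwen's recommended thinking-mode settings, as I recall;
    top_p=0.95,       # top_k/min_p are set in LM Studio's model config instead
    max_tokens=2048,
)
print(response.choices[0].message.content)
```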

**Conclusion:** Quantised 30B models now get you ~98% of frontier-class accuracy - at a fraction of the latency, cost, and energy. For most local RAG or agent workloads, they're not just good enough - they're the new default.

Well done, Qwen - you really whipped the llama's ass! And to OpenAI: for your upcoming open model, please make it MoE, with toggleable reasoning, and release it in many sizes. *This* is the future!