
Bertrand Chevrier

kramp

AI & ML interests

Text-to-speech, AI for music writing

Organizations

Hugging Face · Team 7 · huggingPartyParis · Social Post Explorers · private beta for deeplinks · Fine Video · Hugging Face FineVideo · Changelog

kramp's activity

upvoted a changelog 12 days ago

AI-generated Abstract summaries on Hugging Face Papers

upvoted a changelog 13 days ago

Filter by MCP compatibility available in HF Spaces

upvoted 2 articles 14 days ago

The Transformers Library: standardizing model definitions

By lysandre and 3 others
reacted to AdinaY's post with 🔥 14 days ago
Dolphin 🔥 A multimodal document image parsing model from ByteDance, built on an analyze-then-parse paradigm.

ByteDance/Dolphin

✨ MIT licensed
✨ Handles text, tables, figures & formulas via:
- Reading-order layout analysis
- Parallel parsing with smart prompts

upvoted an article 26 days ago

AI Personas: The Impact of Design Choices

By giadap and 1 other
reacted to wolfram's post with 👍 26 days ago
Finally finished my extensive **Qwen 3 evaluations** across a range of formats and quantisations, focusing on **MMLU-Pro** (Computer Science).

A few take-aways stood out - especially for those interested in local deployment and performance trade-offs:

1️⃣ **Qwen3-235B-A22B** (via Fireworks API) tops the table at **83.66%** with ~55 tok/s.
2️⃣ But the **30B-A3B Unsloth** quant delivered **82.20%** while running locally at ~45 tok/s and with zero API spend.
3️⃣ The same Unsloth build is ~5x faster than Qwen's **Qwen3-32B**, which scores **82.20%** as well yet crawls at <10 tok/s.
4️⃣ On Apple silicon, the **30B MLX** port hits **79.51%** while sustaining ~64 tok/s - arguably today's best speed/quality trade-off for Mac setups.
5️⃣ The **0.6B** micro-model races above 180 tok/s but tops out at **37.56%** - that's why it's not even on the graph (50% performance cut-off).

All local runs were done with LM Studio on an M4 MacBook Pro, using Qwen's official recommended settings.

**Conclusion:** Quantised 30B models now get you ~98% of frontier-class accuracy - at a fraction of the latency, cost, and energy. For most local RAG or agent workloads, they're not just good enough - they're the new default.
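The ~98% figure follows directly from the scores quoted above; a minimal sketch of the arithmetic (model labels abbreviated from the post, the helper itself is purely illustrative):

```python
# MMLU-Pro (Computer Science) scores as quoted in the post.
scores = {
    "Qwen3-235B-A22B (Fireworks API)": 83.66,
    "Qwen3-30B-A3B (Unsloth quant)": 82.20,
    "Qwen3-32B": 82.20,
    "Qwen3-30B (MLX)": 79.51,
    "Qwen3-0.6B": 37.56,
}

# Express each score relative to the best (frontier) result.
frontier = scores["Qwen3-235B-A22B (Fireworks API)"]
relative = {name: score / frontier for name, score in scores.items()}

for name, ratio in relative.items():
    print(f"{name}: {ratio:.1%} of frontier accuracy")
# The 30B-A3B quant lands at ~98.3% of the 235B model's score.
```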

Well done, Qwen - you really whipped the llama's ass! And to OpenAI: for your upcoming open model, please make it MoE, with toggleable reasoning, and release it in many sizes. *This* is the future!