Squeezing Tensor Bits: the quest for smaller LLMs
An area of personal interest is finding ways to optimize the inference performance of LLMs deployed in resource-constrained environments on commodity hardware: desktops, laptops, mobiles, edge devices, etc.
The method I'm using to produce these experimental versions (for example, eaddario/DeepSeek-R1-Distill-Llama-8B-GGUF) is explained in https://medium.com/@eaddario/squeezing-tensor-bits-the-quest-for-smaller-llms-86b23bd052ca
At a high level, it involves using a custom version of the llama-quantize tool to selectively quantize different tensors at different precision levels. On average, a 10% or greater reduction in model size is possible with little loss of quality.
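As a rough sketch of the idea (the exact per-tensor flag and its syntax here are an assumption about the modified tool; --token-embedding-type and --output-tensor-type are existing llama-quantize options), a selective quantization run might look something like this:

```bash
# Sketch only: quantize the bulk of the model at Q4_K_M while keeping
# quality-sensitive tensors (embeddings, output head, attention V) at a
# higher precision. The --tensor-type override is assumed from the fork;
# model filenames are illustrative.
./llama-quantize \
  --token-embedding-type q6_k \
  --output-tensor-type q6_k \
  --tensor-type attn_v=q6_k \
  DeepSeek-R1-Distill-Llama-8B-F16.gguf \
  DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf \
  q4_k_m
```

The trade-off is picked per tensor: tensors that contribute most to output quality stay at higher bit-widths, while the rest are squeezed harder than a uniform quantization scheme would allow.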
There are two open PRs to merge these changes back into the core project, but until then the modified version is available on GitHub: https://github.com/EAddario/llama.cpp/tree/quantize
Would love to hear if you can achieve smaller sizes at higher quality!