Ed Addario PRO

eaddario

AI & ML interests

None yet

Recent Activity

updated a model 3 days ago
eaddario/Qwen3-30B-A3B-pruned-GGUF
published a model 5 days ago
eaddario/Qwen3-30B-A3B-pruned-GGUF
updated a model 5 days ago
eaddario/Qwen3-30B-A3B-GGUF

Organizations

Hugging Face Discord Community

eaddario's activity

New activity in eaddario/Qwen3-30B-A3B-GGUF 6 days ago
replied to their post 6 days ago
reacted to AdinaY's post with 👍 7 days ago
reacted to danieldk's post with 🤗 7 days ago
We have been working on a project called kernels. kernels makes it possible to load compute kernels directly from the Hub! 🚀

We plan to give kernels a proper introduction soon, but for those who have been following along, we are happy to announce a new release:

- New layer API with torch.compile support.
- Experimental support for loading Apple Silicon Metal 🤘 Kernels.
- Generate wheels from Hub kernels for legacy deployments.

Full release notes here: https://github.com/huggingface/kernels/releases/tag/v0.6.0
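
A minimal sketch of what loading a Hub kernel looks like, assuming the get_kernel entry point and the kernels-community/activation example kernel shown in the project's README (the repo id and the gelu_fast function name are taken from that README and may change between releases):

```python
# Sketch: load a compute kernel straight from the Hugging Face Hub.
# Assumes a CUDA device and the example kernel from the kernels README.
import torch
from kernels import get_kernel

# Download the pre-built kernel from the Hub (repo id is an assumption)
activation = get_kernel("kernels-community/activation")

x = torch.randn((10, 10), dtype=torch.float16, device="cuda")
y = torch.empty_like(x)

# Run the Hub-provided kernel in place of a local implementation
# (function name gelu_fast follows the README example)
activation.gelu_fast(y, x)
```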
reacted to MJannik's post with 🤝 7 days ago
Hi everyone, we’ve got big news! Starting today, all Langfuse product features are available as free OSS (MIT license).

You can now upgrade your self-hosted Langfuse to access features like:
- Managed LLM-as-a-Judge evaluations
- Annotation queues
- Prompt experiments
- LLM playground

We’re incredibly grateful for the support of this amazing community and can’t wait to hear your feedback on the new features!

More on this change here: https://langfuse.com/blog/2025-06-04-open-sourcing-langfuse-product
posted an update 8 days ago
Layer-wise and Pruned versions of google/gemma-3-12b-it

After enhancing llama.cpp to handle user-defined quantization levels for arbitrary tensors (https://github.com/ggml-org/llama.cpp/pull/12511), I have added an option to prune whole layers (https://github.com/ggml-org/llama.cpp/pull/13037), and have published two versions of google/gemma-3-12b-it for demo and testing purposes:

* Tensor-wise: eaddario/gemma-3-12b-it-GGUF
* Pruned: eaddario/gemma-3-12b-it-pruned-GGUF

Even though the perplexity scores of the pruned version are 3 times higher, the ARC, HellaSwag, MMLU, TruthfulQA and WinoGrande scores are holding up remarkably well, considering two layers (26 and 29) were removed. This seems to support Xin Men et al.'s conclusions in ShortGPT: Layers in Large Language Models are More Redundant Than You Expect (2403.03853)

A results summary is in each model's card, and the full test results are in the ./scores directory. Questions and feedback are always welcome.
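
A rough sketch of how such variants could be produced with llama-quantize, assuming the --tensor-type and --prune-layers flags introduced in the two linked PRs (#12511 and #13037); the exact argument syntax and the file names below are assumptions for illustration:

```python
# Sketch: drive llama-quantize to build a tensor-wise and a pruned GGUF.
# Flag names come from the linked PRs; their exact syntax is an assumption.
import subprocess

BASE = "gemma-3-12b-it-F16.gguf"  # hypothetical full-precision input GGUF

# Tensor-wise: override the quantization level of selected tensors
subprocess.run([
    "llama-quantize",
    "--tensor-type", "attn_v=q6_k",   # example per-tensor override (assumption)
    BASE, "gemma-3-12b-it-Q4_K_M.gguf", "Q4_K_M",
], check=True)

# Pruned: drop whole layers (26 and 29, as in the post) while quantizing
subprocess.run([
    "llama-quantize",
    "--prune-layers", "26,29",        # layer indices to remove (assumption)
    BASE, "gemma-3-12b-it-pruned-Q4_K_M.gguf", "Q4_K_M",
], check=True)
```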