Aurélien-Morgan CLAUDON

Aurelien-Morgan

AI & ML interests

None yet

Recent Activity

Organizations

ONNXConfig for all, Gradio-Blocks-Party, Keras Dreambooth Event, Blog-explorers, OpenLLM France, huggingPartyParis, ZeroGPU Explorers, LocalLLaMA, Cohere Labs Community, Open RL Leaderboard, Chinese LLMs on Hugging Face, Paris AI Running Club, cvmistralparis, Hugging Face Discord Community, Hugging Face Party @ PyTorch Conference, Nerdy Face, retrain-pipelines

Aurelien-Morgan's activity

published an article 7 days ago
posted an update 8 days ago
Post
3094
The Almighty function-caller

How would you like to build smart GenAI infrastructure?
Give your edge agentic system an extensive tools memory,
And optimize the resources it takes to run a still high-performing set of agents?

We came up with a novel approach to function-calling at scale for smart companies and corporate-grade use-cases.

Read our full-fledged blog article on this here on Hugging Face:
https://huggingface.co/blog/Aurelien-Morgan/the-almighty-function-caller
reacted to danielhanchen's post with 🔥 9 days ago
Post
5684
🦥 Introducing Unsloth Dynamic v2.0 GGUFs!
Our v2.0 quants set new benchmarks on 5-shot MMLU and KL Divergence, meaning you can now run & fine-tune quantized LLMs while preserving as much accuracy as possible.

Llama 4: unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF
DeepSeek-R1: unsloth/DeepSeek-R1-GGUF-UD
Gemma 3: unsloth/gemma-3-27b-it-GGUF

We made selective layer quantization much smarter. Instead of modifying only a subset of layers, we now dynamically quantize all layers, so each layer can use a different bit width. Our dynamic method can now be applied to all LLM architectures, not just MoEs.

Blog with Details: https://docs.unsloth.ai/basics/dynamic-v2.0

All our future GGUF uploads will leverage Dynamic 2.0 and our hand-curated 300K–1.5M token calibration dataset to improve conversational chat performance.

For accurate benchmarking, we built an evaluation framework to match the reported 5-shot MMLU scores of Llama 4 and Gemma 3. This allowed apples-to-apples comparisons between full-precision, Dynamic v2.0, QAT, and standard iMatrix quants.

Dynamic v2.0 aims to minimize the performance gap between full-precision models and their quantized counterparts.
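
As a rough illustration of the KL Divergence metric mentioned in the post, here is a minimal sketch that compares the next-token distribution of a full-precision model with that of a quantized one. It is not Unsloth's evaluation framework: the function names and the toy logits are made up for the example, and a real benchmark would average this over many tokens of real model output.

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) in nats; p is the reference (full-precision) distribution."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits for one token position (assumed values, not real model output).
full_precision_logits = [2.0, 1.0, 0.1]
quantized_logits = [1.9, 1.1, 0.1]

p = softmax(full_precision_logits)
q = softmax(quantized_logits)
print(f"KL(full || quantized) = {kl_divergence(p, q):.6f}")
```

A value closer to zero means the quantized model's predictions stay closer to the full-precision ones.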
posted an update 9 days ago
Post
645
retrain-pipelines 0.1.2 finally dropped. It comes with a hot Hugging Face Hub integration. Go check it out. We have 2 articles about it coming up. One is already fully written, so be on the lookout!
@retrain-pipelines

Also, I'll be volunteering at GOSIM AI Paris 2025. If you're interested in chatting, hmu.
New activity in vlmbook/images 15 days ago

God speed

1
#1 opened 15 days ago by
Aurelien-Morgan
upvoted an article 19 days ago
Article

Cohere on Hugging Face Inference Providers 🔥

124
upvoted an article 20 days ago
Article

Hugging Face to sell open-source robots thanks to Pollen Robotics acquisition 🤖

42
reacted to jsulz's post with 🧠 27 days ago
Post
3106
What does it mean when models share the same bytes?

We've investigated some quants and have seen that a considerable portion of quantizations of the same model share the same bytes and can be deduplicated, saving significant upload time for quantizers on the Hub.

This space, where we crack open a repo from @bartowski, shows we can get significant dedupe: xet-team/quantization-dedup

You can get a sense of why by reading this write-up: https://github.com/bartowski1182/llm-knowledge/blob/main/quantization/quantization.md

But what about finetuned models?

Since going into production, the xet-team has migrated hundreds of repositories on the Hub to our storage layer, including classic "pre-Hub" open-source models like FacebookAI/xlm-roberta-large (XLM-R) from FacebookAI.

XLM-R, introduced in 2019, set new benchmarks for multilingual NLP by learning shared representations across 100 languages. It was then fine-tuned on English, Spanish, Dutch, and German, generating language-specific derivations for each. Check out the paper here: Unsupervised Cross-lingual Representation Learning at Scale (1911.02116)

These finetunes share much of the same architecture and layout as XLM-R with similar training methods and goals. It makes sense that they would share bytes, but it's still fascinating to see.

We put together a similar space to explore these models and see where they overlap. Check it out for yourself: xet-team/finetune-dedupe

The darker each block in the heatmap, the more bytes are shared. Clicking on a repo's blocks shows all other repos that share blocks.
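
To make the idea of "sharing bytes" concrete, here is a minimal sketch of chunk-level deduplication. It is a simplification and not the xet-team's storage layer: it assumes fixed-size chunks and SHA-256 hashes, and the function names and toy byte strings are invented for the example.

```python
import hashlib

def chunk_hashes(data: bytes, chunk_size: int = 4) -> list[str]:
    """Split data into fixed-size chunks and hash each one."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

def shared_fraction(a: bytes, b: bytes, chunk_size: int = 4) -> float:
    """Fraction of b's chunks that already exist among a's chunks."""
    seen = set(chunk_hashes(a, chunk_size))
    hashes_b = chunk_hashes(b, chunk_size)
    if not hashes_b:
        return 0.0
    return sum(h in seen for h in hashes_b) / len(hashes_b)

# Toy stand-ins for two model files that differ in a single chunk.
base = b"AAAABBBBCCCCDDDD"
variant = b"AAAABBBBXXXXDDDD"
print(shared_fraction(base, variant))  # 0.75
```

Chunks that are already stored do not need to be uploaded again, which is where the time savings come from.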
upvoted an article about 1 month ago
Article

The New and Fresh analytics in Inference Endpoints

19
New activity in huggingface/HuggingDiscussions about 1 month ago

[FEEDBACK] Follow

12
25
#14 opened over 1 year ago by
victor