Régis Pierrard

regisss

AI & ML interests

None yet

Organizations

Hugging Face, Habana AI, Hugging Face Optimum, group2, Hugging Face H4, Hugging Face OSS Metrics, HuggingFace Doc Builds, Blog-explorers, AI Energy Score, Social Post Explorers, Hugging Face Machine Learning Optimization, Optimum Internal Testing, SLLHF, Privacy Preserving AI Hackathon (Zama, Hugging Face, Entrepreneur First)

regisss's activity

posted an update 7 days ago
Nice paper comparing the FP8 inference efficiency of the Nvidia H100 and Intel Gaudi 2: An Investigation of FP8 Across Accelerators for LLM Inference (2502.01070)

The conclusion is interesting: "Our findings highlight that the Gaudi 2, by leveraging FP8, achieves higher throughput-to-power efficiency during LLM inference"

One often-overlooked aspect of AI hardware accelerators is that they can consume less energy than GPUs. It's nice to see researchers starting to carry out experiments to measure this!

Gaudi3 results soon...
reacted to fdaudens's post with 🔥❤️ 8 days ago
⭐️ The AI Energy Score project just launched - this is a game-changer for making informed decisions about AI deployment.

You can now see exactly how much energy your chosen model will consume, with a simple 5-star rating system. Think appliance energy labels, but for AI.

Looking at transcription models on the leaderboard is fascinating: choosing between whisper-tiny or whisper-large-v3 can make a 7x difference. Real-time data on these tradeoffs changes everything.

166 models already evaluated across 10 different tasks, from text generation to image classification. The whole thing is public and you can submit your own models to test.

Why this matters:
- Teams can pick efficient models that still get the job done
- Developers can optimize for energy use from day one
- Organizations can finally predict their AI environmental impact

If you're building with AI at any scale, definitely worth checking out.

👉 leaderboard: https://lnkd.in/esrSxetj
👉 blog post: https://lnkd.in/eFJvzHi8

Huge work led by @sasha with @bgamazay @yjernite @sarahooker @regisss @meg
New activity in Habana/mamba 11 days ago

Upload 2 files (#3, opened 14 days ago by zzhang37)
New activity in regisss/bridgetower-newyorker-a100-8x about 1 month ago
upvoted an article about 1 month ago

Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference

posted an update 2 months ago
New activity in AIEnergyScore/Leaderboard 2 months ago
New activity in Habana/mamba 3 months ago

Upload 2 files (#2, opened 3 months ago by zzhang37)
New activity in Habana/mamba 3 months ago

Upload 2 files (#1, opened 3 months ago by zzhang37)
reacted to onekq's post with 🔥 4 months ago
I'm now working on finetuning coding models. If you are GPU-hungry like me, you will find quantized models very helpful. But the quantization formats used for finetuning and for inference are different and incompatible, so I made two collections here.

Inference (GGUF, via Ollama, CPU is enough)
onekq-ai/ollama-ready-coding-models-67118c3cfa1af2cf04a926d6

Finetuning (Bitsandbytes, QLoRA, GPU is needed)
onekq-ai/qlora-ready-coding-models-67118771ce001b8f4cf946b2

When it comes to quantization, inference-ready models are far more popular on HF than finetuning-ready ones. I use https://huggingface.co/QuantFactory to generate inference models (GGUF), and there are a few other choices.

But there hasn't been such a service for finetuning models. DIY isn't too hard though. I made a few myself and you can find the script in the model cards. If the original model is small enough, you can even do it on a free T4 (available via Google Colab).

If you know a (small) coding model worthy of quantization, please let me know and I'd love to add it to the collections.
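
A minimal sketch of the DIY route mentioned above, assuming bitsandbytes 4-bit quantization plus a QLoRA adapter via peft (the model ID and LoRA hyperparameters below are illustrative, not taken from the post):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Illustrative choice: any small coding model works the same way
model_id = "Qwen/Qwen2.5-Coder-1.5B"

# Load the base model in 4-bit NF4, the usual QLoRA setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Freeze the quantized weights and attach trainable LoRA adapters
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trained
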
posted an update 4 months ago
Interested in performing inference with an ONNX model?⚡️

The Optimum docs on model inference with ONNX Runtime are now much clearer and simpler!

Want to deploy your favorite model from the Hub but don't know how to export it to the ONNX format? You can do it in one line of code:
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load the model from the hub and export it to the ONNX format
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
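
Once exported, the model can be used as a drop-in replacement in a transformers pipeline. A minimal sketch (the example sentence is made up):
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The ONNX model plugs directly into a regular transformers pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Optimum makes ONNX Runtime inference easy!"))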

Check out the whole guide 👉 https://huggingface.co/docs/optimum/onnxruntime/usage_guides/models
upvoted an article 4 months ago

Organizing a Privacy-preserving Hackathon

By binoua and 1 other
published an article 4 months ago

Organizing a Privacy-preserving Hackathon

By binoua and 1 other