Nicolas Patry

Narsil

AI & ML interests

None yet

Recent Activity

updated a model 14 days ago
Narsil/silero
published a model 14 days ago
Narsil/silero
reacted to fdaudens's post with ❤️ 16 days ago
Tried something new: an AI-generated podcast that breaks down the top research paper each day. Fully automated, now live on Spotify.

I built this prototype to help keep up with the rapid pace of AI developments and, hopefully, make cutting-edge research more accessible. I don’t know about you, but just listening to a conversation about a paper really helps the content sink in for me.

This build taught me a lot about full automation. If you’re into the technical weeds: Qwen3 runs on Inference to handle the script, Kokoro does the voice, and the whole thing gets published automatically thanks to the Hugging Face Jobs API and Gradio deployment.

It’s not perfect yet; I’ll be monitoring for hallucinations and incoherence. The voice model still needs polish, but it’s a promising start. Would love to build this with the community: submit a PR or send feedback. It’s just a beta of an experimental idea!

Big kudos to @m-ric, whose Open NotebookLM this is based on, and to @nielsr for his terrific work making research papers more accessible.

- Podcast on Spotify: https://open.spotify.com/show/3PTucIW1w1GIkqTYm32ka7?si=c7a851f83e6d4331 (Apple Podcasts soon)
- Code: https://huggingface.co/spaces/fdaudens/podcast-jobs
- Open NotebookLM: https://huggingface.co/spaces/m-ric/open-notebooklm
- Also super helpful, @qgallouedec's tutorial on the HF Jobs API: https://huggingface.co/spaces/qgallouedec/run-hello-world/blob/main/README.md

Organizations

Hugging Face, Safetensors, BigScience Workshop, Hugging Face Internal Testing Organization, superb, Deepmind, Text Generation Inference, BigScience Catalogue Data Dev, HuggingFaceM4, Hugging Face H4, Hugging Face Extreme-Scale, H4 Red Team, Code Llama, gg-hf, On-device Squad, hsramall, Tinkering, gg-tt, Hugging Face Discord Community, Meta Llama, nltpt, s0409, kernels-community, Kernels Tests, kozistr grant org, yofo

Posts 3

Performance leap: TGI v3 is out. Processes 3x more tokens, 13x faster than vLLM on long prompts. Zero config!



3x more tokens.

By reducing our memory footprint, we’re able to ingest many more tokens, and more dynamically, than before. A single L4 (24GB) can handle 30k tokens on Llama 3.1-8B, while vLLM barely reaches 10k. A lot of work went into reducing the runtime’s footprint, and its effects are most visible in smaller, constrained environments.
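
For intuition, here is a rough back-of-the-envelope estimate of why 30k tokens fit on a single L4. The architecture numbers come from the public Llama 3.1-8B config; the headroom split is my own assumption, not TGI’s actual memory accounting:

```python
# Back-of-the-envelope KV-cache budget for Llama 3.1-8B on a 24 GB L4.
# Illustrative only; real capacity depends on the runtime's overhead,
# quantization, and allocator behavior.

GB = 1e9

num_layers = 32      # decoder layers
num_kv_heads = 8     # grouped-query attention
head_dim = 128
bytes_per_value = 2  # fp16 / bf16

# One key and one value vector per layer, per token.
kv_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
print(f"KV cache per token: {kv_per_token / 1024:.0f} KiB")  # 128 KiB

weights = 8.03e9 * bytes_per_value   # ~16 GB of fp16 weights
headroom = 24 * GB - weights         # ~8 GB left on an L4

print(f"Theoretical KV capacity: {headroom / kv_per_token:,.0f} tokens")
# ~60k tokens in theory; activations, CUDA graphs, and runtime overhead
# eat into this, which is why ~30k tokens is a realistic figure.
```
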
13x faster

On long prompts (200k+ tokens), a conversation reply takes 27.5s in vLLM, while it takes only 2s in TGI. How so? We keep the initial conversation around, so when a new reply comes in, we can answer almost instantly. The overhead of the lookup is ~5µs. Thanks @Daniël de Kok for the beast of a data structure.
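
The idea is prefix caching: the KV cache of earlier turns is kept around, and the longest matching prefix of an incoming request is reused, so only the new tokens need a prefill. Below is a minimal, illustrative Python sketch of the lookup concept; TGI’s production data structure is a far more optimized Rust implementation, and the `kv_block` handles here are hypothetical:

```python
# Toy prefix cache: a per-token trie mapping cached token sequences
# to (hypothetical) KV-cache block handles, with longest-prefix lookup.

class TrieNode:
    __slots__ = ("children", "kv_block")

    def __init__(self):
        self.children = {}   # token id -> TrieNode
        self.kv_block = None # handle to the KV blocks ending at this node

class PrefixCache:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, tokens, kv_block):
        """Register the KV cache computed for `tokens`."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, TrieNode())
        node.kv_block = kv_block

    def longest_prefix(self, tokens):
        """Return (matched_len, kv_block) for the longest cached prefix."""
        node, best_len, best_kv = self.root, 0, None
        for i, t in enumerate(tokens):
            node = node.children.get(t)
            if node is None:
                break
            if node.kv_block is not None:
                best_len, best_kv = i + 1, node.kv_block
        return best_len, best_kv

# A follow-up turn reuses the previous conversation's KV cache and only
# prefills the delta, which is why replies come back almost instantly.
cache = PrefixCache()
cache.insert([1, 5, 9, 2], kv_block="blocks:0-3")       # first turn
matched, kv = cache.longest_prefix([1, 5, 9, 2, 7, 8])  # follow-up turn
print(matched, kv)  # 4 blocks:0-3 -> only tokens [7, 8] need prefill
```
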
Zero config

That’s it. Remove all the flags you are using and you’re likely to get the best performance. By evaluating the hardware and model, TGI automatically selects the values that give the best performance. In production, we no longer use any flags in our deployments. We kept all existing flags around, though; they may come in handy in niche scenarios.
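
Querying such a zero-config deployment from Python could look like the sketch below. It assumes a TGI server is already running (launched with only a model id, no tuning flags) and reachable at localhost:8080, which is a placeholder for your own endpoint:

```python
# Minimal client sketch against a zero-config TGI server.
# The URL is a placeholder; adjust it for your deployment.
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")

# The first request pays the full prefill; follow-ups sharing the same
# conversation prefix are answered almost instantly thanks to the cache.
reply = client.text_generation(
    "Explain chunked prefill in one sentence.",
    max_new_tokens=64,
)
print(reply)
```
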

Read more: https://huggingface.co/docs/text-generation-inference/conceptual/chunking

Articles 4


Hugging Face partners with Wiz Research to Improve AI Security