Florent Daudens's picture

Florent Daudens

fdaudens

AI & ML interests

AI & Journalism

Recent Activity

Organizations

Hugging Face's profile picture Hugging Face OSS Metrics's profile picture Hugging Face Smol Models Research's profile picture ZeroGPU Explorers's profile picture LeRobot's profile picture Journalists on Hugging Face's profile picture Major TOM's profile picture MLX Community's profile picture Social Post Explorers's profile picture Projet Spinoza's profile picture Dev Mode Explorers's profile picture Hugging Face for Legal's profile picture Hugging Face Discord Community's profile picture Big Science Social Impact Evaluation for Bias and Stereotypes's profile picture Dataset Tools's profile picture Hugging Face Science's profile picture Coordination Nationale pour l'IA's profile picture Data Is Better Together Contributor's profile picture Sandbox's profile picture Open R1's profile picture

Posts 144

view post
Post
2593
Forget everything you know about transcription models - NVIDIA's parakeet-tdt-0.6b-v2 changed the game for me!

Just tested it with Steve Jobs' Stanford speech and was speechless (pun intended). The video isn’t sped up.

3 things that floored me:
- Transcription took just 10 seconds for a 15-min file
- Got a CSV with perfect timestamps, punctuation & capitalization
- Stunning accuracy (correctly captured "Reed College" and other specifics)

NVIDIA also released a demo where you can click any transcribed segment to play it instantly.

The improvement is significant: number 1 on the ASR Leaderboard, 6% error rate (best in class) with complete commercial freedom (cc-by-4.0 license).

Time to update those Whisper pipelines! H/t @Steveeeeeeen for the finding!

Model: nvidia/parakeet-tdt-0.6b-v2
Demo: nvidia/parakeet-tdt-0.6b-v2
ASR Leaderboard: hf-audio/open_asr_leaderboard

Articles 3

Article
89

The NLP Course is becoming the LLM Course!