Post
2593
Forget everything you know about transcription models - NVIDIA's parakeet-tdt-0.6b-v2 changed the game for me!
Just tested it with Steve Jobs' Stanford speech and was speechless (pun intended). The video isn’t sped up.
3 things that floored me:
- Transcription took just 10 seconds for a 15-min file
- Got a CSV with perfect timestamps, punctuation & capitalization
- Stunning accuracy (correctly captured "Reed College" and other specifics)
NVIDIA also released a demo where you can click any transcribed segment to play it instantly.
The improvement is significant: number 1 on the ASR Leaderboard, 6% error rate (best in class) with complete commercial freedom (cc-by-4.0 license).
Time to update those Whisper pipelines! H/t @Steveeeeeeen for the finding!
Model: nvidia/parakeet-tdt-0.6b-v2
Demo: nvidia/parakeet-tdt-0.6b-v2
ASR Leaderboard: hf-audio/open_asr_leaderboard
Just tested it with Steve Jobs' Stanford speech and was speechless (pun intended). The video isn’t sped up.
3 things that floored me:
- Transcription took just 10 seconds for a 15-min file
- Got a CSV with perfect timestamps, punctuation & capitalization
- Stunning accuracy (correctly captured "Reed College" and other specifics)
NVIDIA also released a demo where you can click any transcribed segment to play it instantly.
The improvement is significant: number 1 on the ASR Leaderboard, 6% error rate (best in class) with complete commercial freedom (cc-by-4.0 license).
Time to update those Whisper pipelines! H/t @Steveeeeeeen for the finding!
Model: nvidia/parakeet-tdt-0.6b-v2
Demo: nvidia/parakeet-tdt-0.6b-v2
ASR Leaderboard: hf-audio/open_asr_leaderboard