Japanese?
I know you can do whatever you want, but please understand that French and German are popular yet not spoken by that many people worldwide. Even people who learn German and French watch more Japanese content, such as anime, which is more popular around the world. Transcription is therefore needed more for Japanese content than for German and French: people transcribe anime and translate it into their own languages. Korean content such as K-dramas and Korean TV shows, along with Japanese TV shows, is watched by non-native speakers in far larger numbers than content in those two languages. That's the reality. So please: I've been waiting two years for a Japanese speech-to-text model with accurate word-level timestamps, better than any Whisper timestamps out there. Just make us happy by doing that. Thanks for your effort.
Hi @riken12, please check out the NIM for canary-1b: https://build.nvidia.com/nvidia/canary-1b-asr.
This model supports automatic speech-to-text recognition and automatic speech-to-text translation in the following languages: Arabic (ar-AR), English (en-US, en-GB), Spanish (es-US, es-ES), German (de-DE), French (fr-FR), Hindi (hi-IN), Italian (it-IT), Portuguese (pt-BR), Japanese (ja-JP), Korean (ko-KR), and Russian (ru-RU), with punctuation and capitalization (PnC). Mandarin (zh-CN) is supported as a target language in the translation task.
Does it give more accurate word-level timestamps than WhisperX? And can I install it locally for my personal use?