For usage instructions, follow openai/whisper-large-v3-turbo.
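A minimal usage sketch in the style of the openai/whisper-large-v3-turbo transformers example; the repo id below is a placeholder, substitute this model's actual id.

```python
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

asr = pipeline(
    "automatic-speech-recognition",
    model="your-username/whisper-turbo-ja",  # placeholder id for this finetune
    torch_dtype=torch_dtype,
    device=device,
)

# Short-form transcription; pass the language explicitly for Japanese audio.
result = asr("sample_ja.wav", generate_kwargs={"language": "japanese"})
print(result["text"])
```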
Turbo finetune with a Japanese tokenizer. Trained on ~60M sequences, with the model progressively unfrozen: embeddings first, then the decoder, then the full model. The smaller vocabulary (~1.6x bytes/token) allows faster inference with 4 decoder layers (a ~10% larger decoder) than the 2-layer distil variants.
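A hedged sketch of what that staged unfreezing can look like in transformers; the base checkpoint, helper function, and stage boundaries are illustrative, not the exact training code used here.

```python
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3-turbo")
# With a replaced Japanese tokenizer, the embeddings would first be resized to the
# new vocab, e.g. model.resize_token_embeddings(new_vocab_size).

def set_trainable(module, trainable: bool):
    for p in module.parameters():
        p.requires_grad = trainable

# Stage 1: train only the decoder token embeddings for the new vocabulary.
set_trainable(model, False)
set_trainable(model.model.decoder.embed_tokens, True)

# Stage 2: unfreeze the full decoder.
set_trainable(model.model.decoder, True)

# Stage 3: unfreeze everything, encoder included.
set_trainable(model, True)
```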
Quality is poor overall: SOTA on short-form general Japanese, but long-form is too degraded and has hallucination problems. I rescued it somewhat from a much worse state, but it has probably gone too far to fully fix. (Reazon needs filtering.)
Note: the vocab changes make faster-whisper's model.is_multilingual and suppress_tokens wrong. You shouldn't be using this with faster-whisper since long-form is bad, but if you do, adjust the code as required (see the sketch below).
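If you use faster-whisper anyway, a minimal sketch of the kind of adjustment meant above, assuming a CTranslate2 conversion of this model; the model path and the suppress list are placeholders, since the default non-speech token ids no longer match the replaced vocab.

```python
from faster_whisper import WhisperModel

model = WhisperModel("path/to/ct2-model", device="cuda", compute_type="float16")

segments, info = model.transcribe(
    "sample_ja.wav",
    language="ja",
    # Default is [-1] (built-in non-speech ids), which are wrong for this vocab;
    # pass an explicit list rebuilt for the new tokenizer, or empty to disable.
    suppress_tokens=[],
)
for seg in segments:
    print(seg.text)
```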
Acknowledgements
- Train sets: OOPPEENN, Reazon, Common Voice 20, 小虫哥_, deepghs
- Test sets: KitsuneX07, TEDxJP, kotoba-tech, Saruwatari-lab, grider-withourai