---
tags:
- espnet
- audio
- automatic-speech-recognition
- speech-translation
- language-identification
language: multilingual
datasets:
- owsm_ctc_v4
license: cc-by-4.0
metrics:
- cer
- bleu
- accuracy
library_name: espnet
---

[OWSM-CTC](https://aclanthology.org/2024.acl-long.549/) (Peng et al., ACL 2024) is an encoder-only speech foundation model based on hierarchical multi-task self-conditioned CTC.

OWSM-CTC v4 is trained on 320k hours of public audio data for multilingual speech recognition, any-to-any speech translation, and language identification. It follows the design of the [Open Whisper-style Speech Model (OWSM)](https://arxiv.org/abs/2401.16658) project.

To use the pre-trained model, install `espnet` and `espnet_model_zoo`. The requirements are:

```
librosa
torch
espnet
espnet_model_zoo
```

**Example usage can be found in ESPnet:** https://github.com/espnet/espnet/tree/master/egs2/owsm_ctc_v3.1/s2t1
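
As a rough orientation before opening the recipe, the following is a minimal sketch of how the requirements above could be checked and the model loaded. It is not taken from this model card: the helper names `missing_packages` and `transcribe` are hypothetical, and the `Speech2TextGreedySearch` class with its `lang_sym`, `task_sym`, `batch_size`, and `context_len_in_secs` arguments is assumed from ESPnet's `s2t_inference_ctc` inference script, so please verify it against the linked recipe. `model_id` is left as a parameter because this card does not state the checkpoint name.

```python
import importlib.util

# Packages listed in the requirements section of this model card.
REQUIRED = ["librosa", "torch", "espnet", "espnet_model_zoo"]


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def transcribe(audio_path, model_id, lang_sym="<eng>", task_sym="<asr>", device="cpu"):
    """Decode one audio file with an OWSM-CTC checkpoint (hypothetical helper).

    Assumptions to check against the ESPnet recipe: the class name, the
    keyword arguments, and the "<eng>" / "<asr>" symbol format.
    """
    missing = missing_packages()
    if missing:
        raise RuntimeError("Missing packages, try: pip install " + " ".join(missing))

    # Imported lazily so missing_packages() works without ESPnet installed.
    from espnet2.bin.s2t_inference_ctc import Speech2TextGreedySearch

    s2t = Speech2TextGreedySearch.from_pretrained(
        model_id,
        device=device,
        lang_sym=lang_sym,   # language of the input speech
        task_sym=task_sym,   # task to perform, e.g. speech recognition
    )
    # Long audio is processed in batched chunks; the values here are illustrative.
    return s2t.batch_decode(audio_path, batch_size=16, context_len_in_secs=4)
```

Calling `missing_packages()` first gives a quick check of the environment; `transcribe("audio.wav", model_id=...)` would then return the decoded text, with `model_id` set to the name under which this checkpoint is published.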