espnet/owsm_v4_base_102M

Automatic Speech Recognition · speech-translation · language-identification


pyf98 committed 5 days ago

Commit 4290277 · verified · 1 parent: 5128950

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -19,7 +19,7 @@ OWSM aims to develop fully open speech foundation models using publicly availabl
 Inference examples can be found on our [project page](https://www.wavlab.org/activities/2024/owsm/).
 The Gradio demo is [here](https://huggingface.co/spaces/pyf98/OWSM_v3_demo).
-[OWSM v4]() is the latest version in the OWSM series, which significantly outperforms OWSM v3.1 in LID and multilingual ASR.
+[OWSM v4](https://arxiv.org/abs/2506.00338) is the latest version in the OWSM series, which significantly outperforms OWSM v3.1 in LID and multilingual ASR.
 Additionally, OWSM v4 applies 8 times subsampling (instead of 4 times in OWSM v3.1) to the log Mel features, leading to a final resolution of 80 ms in the encoder.
 When running inference, we recommend setting `maxlenratio=1.0` (default) instead of smaller values.
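The README text in this diff ties two numbers together: 8x subsampling yields an 80 ms encoder resolution (so the log Mel hop is 10 ms, and OWSM v3.1's 4x gives 40 ms), and `maxlenratio` is recommended at 1.0. The sketch below works through that arithmetic. It assumes the decoding length limit is computed from the encoder output length as `max(1, int(maxlenratio * enc_len))`, with 0 meaning "use the encoder length" and a negative value meaning a fixed limit. This is our understanding of ESPnet's `BeamSearch` convention, not something stated in this commit, so treat it as an assumption. The helper names `encoder_frames` and `max_output_len` are hypothetical, and padding and convolution edge effects are ignored. Under these assumptions, halving the frame count relative to v3.1 also halves the token budget for any given ratio, which is presumably why smaller ratios are discouraged.

```python
def encoder_frames(duration_s: float, hop_ms: float = 10.0, subsampling: int = 8) -> int:
    """Approximate encoder length for a clip (hypothetical helper).

    Assumes a 10 ms log Mel hop; 8x subsampling then gives 80 ms per
    encoder frame, as described for OWSM v4. Padding and convolution
    edge effects are ignored.
    """
    mel_frames = int(duration_s * 1000 / hop_ms)  # number of log Mel frames
    return mel_frames // subsampling              # frames after subsampling


def max_output_len(enc_len: int, maxlenratio: float) -> int:
    """Decoding length limit under the assumed ESPnet-style convention."""
    if maxlenratio == 0:
        return enc_len                # 0: limit equals the encoder length
    if maxlenratio < 0:
        return -int(maxlenratio)      # negative: fixed absolute limit
    return max(1, int(maxlenratio * enc_len))  # positive: ratio of encoder length


if __name__ == "__main__":
    v4 = encoder_frames(30.0)                   # 8x subsampling (OWSM v4)
    v31 = encoder_frames(30.0, subsampling=4)   # 4x subsampling (OWSM v3.1)
    print(v4, v31)                              # 375 750
    print(max_output_len(v4, 1.0))              # 375
    print(max_output_len(v4, 0.5))              # 187
```

For a 30 s clip, the v4 encoder produces half as many frames as v3.1, so a ratio such as 0.5 would allow only 187 output tokens under this assumed rule.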