GetmanY1 committed eacc6a2 (verified)
Parent(s): 49ec877

Update README.md

Files changed (1):
1. README.md (+17 -1)
README.md CHANGED
@@ -20,7 +20,7 @@ The x-large model pre-trained on 16kHz sampled speech audio. When using the mode
 
 The Finnish Wav2Vec2 X-Large has the same architecture and uses the same training objective as the multilingual one described in [this paper](https://www.isca-archive.org/interspeech_2022/babu22_interspeech.pdf). It is pre-trained on 158k hours of unlabeled Finnish speech, including [KAVI radio and television archive materials](https://kavi.fi/en/radio-ja-televisioarkistointia-vuodesta-2008/), Lahjoita puhetta (Donate Speech), the Finnish Parliament, and Finnish VoxPopuli.
 
- You can read more about the pre-trained model in [this paper](TODO). The training scripts are available on [GitHub](https://github.com/aalto-speech/large-scale-monolingual-speech-foundation-models).
+ You can read more about the pre-trained model in [this paper](https://www.isca-archive.org/interspeech_2025/getman25_interspeech.html). The training scripts are available on [GitHub](https://github.com/aalto-speech/large-scale-monolingual-speech-foundation-models).
 
 ## Intended uses & limitations
 
@@ -105,6 +105,22 @@ The pre-trained model was initialized with the following hyperparameters:
 - Pytorch 1.13.1+rocm5.2
 - Fairseq 0.12.2
 
+ ## Citation
+
+ If you use our models or scripts, please cite our article as:
+
+ ```bibtex
+ @inproceedings{getman25_interspeech,
+   title = {{Is your model big enough? Training and interpreting large-scale monolingual speech foundation models}},
+   author = {Yaroslav Getman and Tamás Grósz and Tommi Lehtonen and Mikko Kurimo},
+   year = {2025},
+   booktitle = {Interspeech 2025},
+   pages = {231--235},
+   doi = {10.21437/Interspeech.2025-46},
+   issn = {2958-1796},
+ }
+ ```
+
 ## Team Members
 
 - Yaroslav Getman, [Hugging Face profile](https://huggingface.co/GetmanY1), [LinkedIn profile](https://www.linkedin.com/in/yaroslav-getman/)
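
The updated card describes a standard wav2vec2-style encoder operating on 16 kHz audio, so a checkpoint in Hugging Face format should be loadable through the usual `transformers` wav2vec2 classes. Below is a minimal sketch; the repository id is a placeholder assumption, not something stated in this commit, so substitute the actual repo this README belongs to:

```python
# Minimal sketch of extracting speech representations with the standard
# transformers wav2vec2 API. The repository id is a placeholder
# (assumption) -- replace it with the actual model repo.
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

model_id = "GetmanY1/<this-model-repo>"  # hypothetical placeholder id

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2Model.from_pretrained(model_id)
model.eval()

# The card specifies 16 kHz sampled speech; one second of silence
# stands in for real audio here.
waveform = np.zeros(16000, dtype=np.float32)
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    hidden_states = model(**inputs).last_hidden_state

print(hidden_states.shape)  # (batch, frames, hidden_size)
```

Since this is a self-supervised pre-trained encoder rather than a finished recognizer, `Wav2Vec2ForCTC` would be the usual entry point for fine-tuning it on a labeled Finnish ASR task.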