Update README.md
README.md
CHANGED
@@ -20,7 +20,7 @@ The large model pre-trained on 16kHz sampled speech audio. When using the model
The Finnish Wav2Vec2 Large has the same architecture and uses the same training objective as the English and multilingual one described in [Paper](https://arxiv.org/abs/2006.11477). It is pre-trained on 158k hours of unlabeled Finnish speech, including [KAVI radio and television archive materials](https://kavi.fi/en/radio-ja-televisioarkistointia-vuodesta-2008/), Lahjoita puhetta (Donate Speech), Finnish Parliament, Finnish VoxPopuli.

-You can read more about the pre-trained model from [this paper](
+You can read more about the pre-trained model from [this paper](https://www.isca-archive.org/interspeech_2025/getman25_interspeech.html). The training scripts are available on [GitHub](https://github.com/aalto-speech/large-scale-monolingual-speech-foundation-models).
## Intended uses & limitations
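
For illustration, here is a minimal sketch of pulling frame-level features from the pre-trained model with the Hugging Face `transformers` library. The repository id is a placeholder (see the author's Hugging Face profile below for the actual one), and the snippet assumes the checkpoint is published in `transformers` format rather than the original Fairseq one.

```python
# Minimal feature-extraction sketch; the model id below is a hypothetical placeholder.
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

model_id = "GetmanY1/<finnish-wav2vec2-large>"  # placeholder repository id

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2Model.from_pretrained(model_id)

# One second of 16 kHz mono audio (silence, for illustration only).
speech = np.zeros(16000, dtype=np.float32)
inputs = feature_extractor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Frame-level representations with shape (batch, frames, hidden_size).
print(outputs.last_hidden_state.shape)
```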
@@ -105,6 +105,22 @@ The pre-trained model was initialized with the following hyperparameters:
- Pytorch 1.13.1+rocm5.2
- Fairseq 0.12.2

+## Citation
+
+If you use our models or scripts, please cite our article as:
+
+```bibtex
+@inproceedings{getman25_interspeech,
+  title = {{Is your model big enough? Training and interpreting large-scale monolingual speech foundation models}},
+  author = {{Yaroslav Getman and Tamás Grósz and Tommi Lehtonen and Mikko Kurimo}},
+  year = {{2025}},
+  booktitle = {{Interspeech 2025}},
+  pages = {{231--235}},
+  doi = {{10.21437/Interspeech.2025-46}},
+  issn = {{2958-1796}},
+}
+```
+
## Team Members
- Yaroslav Getman, [Hugging Face profile](https://huggingface.co/GetmanY1), [LinkedIn profile](https://www.linkedin.com/in/yaroslav-getman/)