Adding links to the ParlaSpeech dataset / paper
#1 by nljubesi - opened
README.md CHANGED
@@ -95,7 +95,7 @@ Full config can be found inside the `.nemo` files.
 
 ### Datasets
 
-All the models in this collection are trained on ParlaSpeech-HR v1.0 Croatian dataset, which contains around 1665 hours of training data, 2.2 hours of development and 2.3 hours of test data after data cleaning.
+All the models in this collection are trained on ParlaSpeech-HR v1.0 Croatian dataset [4,5], which contains around 1665 hours of training data, 2.2 hours of development and 2.3 hours of test data after data cleaning.
 
 ## Performance
 
@@ -117,4 +117,10 @@ Since the model is trained just on ParlaSpeech-HR v1.0 dataset, the performance
 
 - [2] [Google Sentencepiece Tokenizer](https://github.com/google/sentencepiece)
 
-- [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+- [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+
+- [4] [ParlaSpeech-HR dataset](http://hdl.handle.net/11356/1494)
+
+- [5] [ParlaSpeech-HR - a Freely Available ASR Dataset for Croatian Bootstrapped from the ParlaMint Corpus](https://aclanthology.org/2022.parlaclarin-1.16/)
+
+-