nvidia/stt_hr_conformer_transducer_large

@@ -95,7 +95,7 @@ Full config can be found inside the `.nemo` files.
 ### Datasets
-All the models in this collection are trained on ParlaSpeech-HR v1.0 Croatian dataset, which contains around 1665 hours of training data, 2.2 hours of development and 2.3 hours of test data after data cleaning.
+All the models in this collection are trained on ParlaSpeech-HR v1.0 Croatian dataset [4,5], which contains around 1665 hours of training data, 2.2 hours of development and 2.3 hours of test data after data cleaning.
 ## Performance
@@ -117,4 +117,10 @@ Since the model is trained just on ParlaSpeech-HR v1.0 dataset, the performance
 - [2] [Google Sentencepiece Tokenizer](https://github.com/google/sentencepiece)
-- [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+- [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+- [4] [ParlaSpeech-HR dataset](http://hdl.handle.net/11356/1494)
+- [5] [ParlaSpeech-HR - a Freely Available ASR Dataset for Croatian Bootstrapped from the ParlaMint Corpus](https://aclanthology.org/2022.parlaclarin-1.16/)
+  -