changed nemo toolkit to nemo framework
README.md CHANGED
@@ -278,7 +278,7 @@ To train, fine-tune or transcribe with canary-180m-flash, you will need to install
 
 ## How to Use this Model
 
-The model is available for use in the NeMo toolkit [7], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
+The model is available for use in the NeMo framework [7], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
 
 Please refer to [our tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Canary_Multitask_Speech_Model.ipynb) for more details.
 
@@ -480,7 +480,7 @@ Model Fairness:
 
 ## Training
 
-canary-180m-flash is trained using the NVIDIA NeMo toolkit [7] for a total of 219K steps with 2D bucketing [1] and optimal batch sizes set using OOMptimizer [8]. The model is trained on 32 NVIDIA A100 80GB GPUs.
+canary-180m-flash is trained using the NVIDIA NeMo framework [7] for a total of 219K steps with 2D bucketing [1] and optimal batch sizes set using OOMptimizer [8]. The model is trained on 32 NVIDIA A100 80GB GPUs.
 
 The model can be trained using this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/speech_multitask/speech_to_text_aed.py) and [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/speech_multitask/fast-conformer_aed.yaml).
 
 The tokenizers for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
 
@@ -628,7 +628,7 @@ canary-180m-flash is released under the CC-BY-4.0 license. By using this model,
 
 [6] [Google Sentencepiece Tokenizer](https://github.com/google/sentencepiece)
 
-[7] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+[7] [NVIDIA NeMo Framework](https://github.com/NVIDIA/NeMo)
 
 [8] [EMMeTT: Efficient Multimodal Machine Translation Training](https://arxiv.org/abs/2409.13523)