changed nemo toolkit to nemo framework
README.md CHANGED
@@ -278,7 +278,7 @@ To train, fine-tune or transcribe with canary-180m-flash, you will need to install
 
 ## How to Use this Model
 
-The model is available for use in the NeMo toolkit [7], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
+The model is available for use in the NeMo framework [7], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
 
 Please refer to [our tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Canary_Multitask_Speech_Model.ipynb) for more details.
 
@@ -480,7 +480,7 @@ Model Fairness:
 
 ## Training
 
-canary-180m-flash is trained using the NVIDIA NeMo toolkit [7] for a total of 219K steps with 2D bucketing [1] and optimal batch sizes set using OOMptimizer [8]. The model is trained on 32 NVIDIA A100 80GB GPUs.
+canary-180m-flash is trained using the NVIDIA NeMo framework [7] for a total of 219K steps with 2D bucketing [1] and optimal batch sizes set using OOMptimizer [8]. The model is trained on 32 NVIDIA A100 80GB GPUs.
 
 The model can be trained using this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/speech_multitask/speech_to_text_aed.py) and [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/speech_multitask/fast-conformer_aed.yaml).
 
 The tokenizers for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
 
@@ -628,7 +628,7 @@ canary-180m-flash is released under the CC-BY-4.0 license. By using this model,
 
 [6] [Google Sentencepiece Tokenizer](https://github.com/google/sentencepiece)
 
-[7] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+[7] [NVIDIA NeMo Framework](https://github.com/NVIDIA/NeMo)
 
 [8] [EMMeTT: Efficient Multimodal Machine Translation Training](https://arxiv.org/abs/2409.13523)