changed version to main
README.md CHANGED
@@ -381,7 +381,7 @@ python scripts/speech_to_text_aed_chunked_infer.py \
 
 ## Software Integration:
 **Runtime Engine(s):**
-* NeMo -
+* NeMo - main <br>
 
 **Supported Hardware Microarchitecture Compatibility:** <br>
 * [NVIDIA Ampere] <br>
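For reference, loading the model with the NeMo runtime named above looks roughly like the sketch below. The Hugging Face model id `nvidia/canary-180m-flash` and the local `audio.wav` file are illustrative assumptions, and the exact return type of `transcribe()` varies across NeMo versions.

```python
# Minimal sketch (not from the diff): transcribe one file with the NeMo runtime.
# Assumes NeMo installed from the main branch and a checkpoint published under
# the Hugging Face id "nvidia/canary-180m-flash" (assumed id).
from nemo.collections.asr.models import EncDecMultiTaskModel

model = EncDecMultiTaskModel.from_pretrained("nvidia/canary-180m-flash")
predictions = model.transcribe(["audio.wav"], batch_size=1)  # placeholder audio path
print(predictions[0])  # hypothesis for the first file
```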
@@ -504,16 +504,16 @@ WER on [HuggingFace OpenASR leaderboard](https://huggingface.co/spaces/hf-audio/
 
 | **Version** | **Model** | **RTFx** | **AMI** | **GigaSpeech** | **LS Clean** | **LS Other** | **Earnings22** | **SPGISpeech** | **Tedlium** | **Voxpopuli** |
 |:---------:|:-----------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
-
+| main | canary-180m-flash | 1233 | 14.86 | 10.51 | 1.87 | 3.83 | 13.33 | 2.26 | 3.98 | 6.35 |
 
 #### Inference speed on different systems
 We profiled inference speed on the OpenASR benchmark using the [real-time factor](https://github.com/NVIDIA/DeepLearningExamples/blob/master/Kaldi/SpeechRecognition/README.md#metrics) (RTFx) to quantify throughput.
 
 | **Version** | **Model** | **System** | **RTFx** |
 |:-----------:|:-------------:|:------------:|:----------:|
-
-
-
+| main | canary-180m-flash | NVIDIA A100 | 1233 |
+| main | canary-180m-flash | NVIDIA H100 | 2041 |
+| main | canary-180m-flash | NVIDIA B200 | 2357 |
 
 
 
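The RTFx column above is the inverse real-time factor: total seconds of audio processed divided by total wall-clock seconds spent transcribing it, so higher means faster than real time. A minimal sketch of that computation, with a placeholder model object and file list (not the benchmark harness itself):

```python
# Minimal RTFx sketch (illustrative):
# RTFx = total audio duration / total transcription wall-clock time.
import time
import soundfile as sf  # assumed available for reading audio durations

def measure_rtfx(model, audio_files):
    total_audio_s = sum(sf.info(path).duration for path in audio_files)
    start = time.perf_counter()
    model.transcribe(audio_files)            # placeholder batch transcription call
    elapsed_s = time.perf_counter() - start
    return total_audio_s / elapsed_s         # e.g. 1233 means 1233x real time
```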
@@ -522,13 +522,13 @@ WER on [MLS](https://huggingface.co/datasets/facebook/multilingual_librispeech)
 
 | **Version** | **Model** | **De** | **Es** | **Fr** |
 |:---------:|:-----------:|:------:|:------:|:------:|
-
+| main | canary-180m-flash | 4.81 | 3.17 | 4.75 |
 
 
 WER on [MCV-16.1](https://commonvoice.mozilla.org/en/datasets) test set:
 | **Version** | **Model** | **En** | **De** | **Es** | **Fr** |
 |:---------:|:-----------:|:------:|:------:|:------:|:------:|
-
+| main | canary-180m-flash | 9.53 | 5.94 | 4.90 | 8.19 |
 
 
 More details on evaluation can be found at [HuggingFace ASR Leaderboard](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard)
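Word error rate figures like those above are edit-distance based. A minimal sketch with the `jiwer` package follows; whether this exact tool was used for these tables is not stated in the diff, the strings are placeholders, and leaderboard-style scoring additionally applies a shared text normalizer to both sides before scoring.

```python
# Minimal WER sketch using jiwer (assumed installed); strings are placeholders.
import jiwer

references = ["the quick brown fox jumps", "hello world"]
hypotheses = ["the quick brown fox jump", "hello world"]

wer = jiwer.wer(references, hypotheses)  # (S + D + I) / number of reference words
print(f"WER: {100 * wer:.2f}%")
```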
@@ -543,13 +543,13 @@ BLEU score:
 
 | **Version** | **Model** | **En->De** | **En->Es** | **En->Fr** | **De->En** | **Es->En** | **Fr->En** |
 |:-----------:|:---------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|
-
+| main | canary-180m-flash | 28.18 | 20.47 | 36.66 | 32.08 | 20.09 | 29.75 |
 
 COMET score:
 
 | **Version** | **Model** | **En->De** | **En->Es** | **En->Fr** | **De->En** | **Es->En** | **Fr->En** |
 |:-----------:|:---------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|
-
+| main | canary-180m-flash | 77.56 | 78.10 | 78.53 | 83.03 | 81.48 | 82.28 |
 
 [COVOST-v2](https://github.com/facebookresearch/covost) test set:
 
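BLEU rows like the one above are normally computed at the corpus level. A minimal sketch with `sacrebleu` follows; the tool and tokenization actually used for these numbers are not specified in the diff, and the sentences are placeholders.

```python
# Minimal corpus-level BLEU sketch with sacrebleu (assumed installed).
# Hypotheses and references are placeholders, one segment per list entry.
import sacrebleu

hypotheses = ["Das ist ein kleiner Test.", "Guten Morgen!"]
references = [["Das ist ein kleiner Test.", "Guten Morgen zusammen!"]]  # one reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}")
```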
@@ -557,13 +557,13 @@ BLEU score:
 
 | **Version** | **Model** | **De->En** | **Es->En** | **Fr->En** |
 |:-----------:|:---------:|:----------:|:----------:|:----------:|
-
+| main | canary-180m-flash | 35.61 | 39.84 | 38.57 |
 
 COMET score:
 
 | **Version** | **Model** | **De->En** | **Es->En** | **Fr->En** |
 |:-----------:|:---------:|:----------:|:----------:|:----------:|
-
+| main | canary-180m-flash | 80.94 | 84.54 | 82.50 |
 
 [mExpresso](https://huggingface.co/facebook/seamless-expressive#mexpresso-multilingual-expresso) test set:
 
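The COMET rows are neural metric scores. A minimal scoring sketch with the Unbabel `unbabel-comet` package follows; the checkpoint name `Unbabel/wmt22-comet-da`, the source/translation/reference triplet, and the CPU setting are assumptions for illustration, and the model card does not say which COMET checkpoint was used here.

```python
# Minimal COMET sketch (illustrative assumptions, see lead-in above).
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-comet-da")   # assumed checkpoint
comet_model = load_from_checkpoint(model_path)

data = [{
    "src": "Der Hund bellt.",       # source sentence
    "mt": "The dog is barking.",    # system translation
    "ref": "The dog barks.",        # human reference
}]
output = comet_model.predict(data, batch_size=8, gpus=0)
print(output.system_score)  # corpus-level score, reported x100 in the tables above
```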
@@ -571,13 +571,13 @@ BLEU score:
 
 | **Version** | **Model** | **En->De** | **En->Es** | **En->Fr** |
 |:-----------:|:---------:|:----------:|:----------:|:----------:|
-
+| main | canary-180m-flash | 21.60 | 33.45 | 25.96 |
 
 COMET score:
 
 | **Version** | **Model** | **En->De** | **En->Es** | **En->Fr** |
 |:-----------:|:---------:|:----------:|:----------:|:----------:|
-
+| main | canary-180m-flash | 77.71 | 80.87 | 77.82 |
 
 
 ### Timestamp Prediction
@@ -585,7 +585,7 @@ F1-score on [Librispeech Test sets](https://www.openslr.org/12) at collar value
 
 | **Version** | **Model** | **test-clean** | **test-other** |
 |:-----------:|:---------:|:----------:|:----------:|
-
+| main | canary-180m-flash | 93.48 | 91.38 |
 
 
 ### Hallucination Robustness
@@ -593,14 +593,14 @@ Number of characters per minute on [MUSAN](https://www.openslr.org/17) 48 hrs ev
 
 | **Version** | **Model** | **# of characters per minute** |
 |:-----------:|:---------:|:----------:|
-
+| main | canary-180m-flash | 91.52 |
 
 ### Noise Robustness
 WER on [Librispeech Test Clean](https://www.openslr.org/12) at different SNR (signal-to-noise ratio) levels of additive white noise
 
 | **Version** | **Model** | **SNR 10** | **SNR 5** | **SNR 0** | **SNR -5** |
 |:-----------:|:---------:|:----------:|:----------:|:----------:|:----------:|
-
+| main | canary-180m-flash | 3.23 | 5.34 | 12.21 | 34.03 |
 
 ## Model Fairness Evaluation
 
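For the noise-robustness rows above, the SNR levels describe how loud the added white noise is relative to the speech. A minimal sketch of mixing white Gaussian noise into a waveform at a target SNR follows; the scaling is the standard definition, but the exact noising pipeline used for this evaluation is not described in the diff.

```python
# Minimal sketch: add white Gaussian noise to a speech waveform at a target SNR in dB.
# The SNR definition is standard; the evaluation's exact setup is not specified here.
import numpy as np

def add_white_noise(speech: np.ndarray, snr_db: float, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(speech.shape)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale noise so that 10 * log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```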