ankitapasad committed on
Commit
d572552
·
verified ·
1 Parent(s): 5a2919b

changed version to main

Files changed (1)
  1. README.md +16 -16
README.md CHANGED
@@ -381,7 +381,7 @@ python scripts/speech_to_text_aed_chunked_infer.py \
 
  ## Software Integration:
  **Runtime Engine(s):**
- * NeMo - 2.3.0 or higher <br>
+ * NeMo - main <br>
 
  **Supported Hardware Microarchitecture Compatibility:** <br>
  * [NVIDIA Ampere] <br>
@@ -504,16 +504,16 @@ WER on [HuggingFace OpenASR leaderboard](https://huggingface.co/spaces/hf-audio/
 
  | **Version** | **Model** | **RTFx** | **AMI** | **GigaSpeech** | **LS Clean** | **LS Other** | **Earnings22** | **SPGISpech** | **Tedlium** | **Voxpopuli** |
  |:---------:|:-----------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
- | 2.2.0 | canary-180m-flash | 1233 | 14.86 | 10.51 | 1.87 | 3.83 | 13.33 | 2.26 | 3.98 | 6.35 |
+ | main | canary-180m-flash | 1233 | 14.86 | 10.51 | 1.87 | 3.83 | 13.33 | 2.26 | 3.98 | 6.35 |
 
  #### Inference speed on different systems
  We profiled inference speed on the OpenASR benchmark using the [real-time factor](https://github.com/NVIDIA/DeepLearningExamples/blob/master/Kaldi/SpeechRecognition/README.md#metrics) (RTFx) to quantify throughput.
 
  | **Version** | **Model** | **System** | **RTFx** |
  |:-----------:|:-------------:|:------------:|:----------:|
- | 2.2.0 | canary-180m-flash | NVIDIA A100 | 1233 |
- | 2.2.0 | canary-180m-flash | NVIDIA H100 | 2041 |
- | 2.2.0 | canary-180m-flash | NVIDIA B200 | 2357 |
+ | main | canary-180m-flash | NVIDIA A100 | 1233 |
+ | main | canary-180m-flash | NVIDIA H100 | 2041 |
+ | main | canary-180m-flash | NVIDIA B200 | 2357 |
 
 
 
@@ -522,13 +522,13 @@ WER on [MLS](https://huggingface.co/datasets/facebook/multilingual_librispeech)
 
  | **Version** | **Model** | **De** | **Es** | **Fr** |
  |:---------:|:-----------:|:------:|:------:|:------:|
- | 2.2.0 | canary-180m-flash | 4.81 | 3.17 | 4.75 |
+ | main | canary-180m-flash | 4.81 | 3.17 | 4.75 |
 
 
  WER on [MCV-16.1](https://commonvoice.mozilla.org/en/datasets) test set:
  | **Version** | **Model** | **En** | **De** | **Es** | **Fr** |
  |:---------:|:-----------:|:------:|:------:|:------:|:------:|
- | 2.2.0 | canary-180m-flash | 9.53 | 5.94 | 4.90 | 8.19 |
+ | main | canary-180m-flash | 9.53 | 5.94 | 4.90 | 8.19 |
 
 
  More details on evaluation can be found at [HuggingFace ASR Leaderboard](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard)
@@ -543,13 +543,13 @@ BLEU score:
 
  | **Version** | **Model** | **En->De** | **En->Es** | **En->Fr** | **De->En** | **Es->En** | **Fr->En** |
  |:-----------:|:---------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|
- | 2.2.0 | canary-180m-flash | 28.18 | 20.47 | 36.66 | 32.08 | 20.09 | 29.75 |
+ | main | canary-180m-flash | 28.18 | 20.47 | 36.66 | 32.08 | 20.09 | 29.75 |
 
  COMET score:
 
  | **Version** | **Model** | **En->De** | **En->Es** | **En->Fr** | **De->En** | **Es->En** | **Fr->En** |
  |:-----------:|:---------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|
- | 2.2.0 | canary-180m-flash | 77.56 | 78.10 | 78.53 | 83.03 | 81.48 | 82.28 |
+ | main | canary-180m-flash | 77.56 | 78.10 | 78.53 | 83.03 | 81.48 | 82.28 |
 
  [COVOST-v2](https://github.com/facebookresearch/covost) test set:
 
@@ -557,13 +557,13 @@ BLEU score:
 
  | **Version** | **Model** | **De->En** | **Es->En** | **Fr->En** |
  |:-----------:|:---------:|:----------:|:----------:|:----------:|
- | 2.2.0 | canary-180m-flash | 35.61 | 39.84 | 38.57 |
+ | main | canary-180m-flash | 35.61 | 39.84 | 38.57 |
 
  COMET score:
 
  | **Version** | **Model** | **De->En** | **Es->En** | **Fr->En** |
  |:-----------:|:---------:|:----------:|:----------:|:----------:|
- | 2.2.0 | canary-180m-flash | 80.94 | 84.54 | 82.50 |
+ | main | canary-180m-flash | 80.94 | 84.54 | 82.50 |
 
  [mExpresso](https://huggingface.co/facebook/seamless-expressive#mexpresso-multilingual-expresso) test set:
 
@@ -571,13 +571,13 @@ BLEU score:
 
  | **Version** | **Model** | **En->De** | **En->Es** | **En->Fr** |
  |:-----------:|:---------:|:----------:|:----------:|:----------:|
- | 2.2.0 | canary-180m-flash | 21.60 | 33.45 | 25.96 |
+ | main | canary-180m-flash | 21.60 | 33.45 | 25.96 |
 
  COMET score:
 
  | **Version** | **Model** | **En->De** | **En->Es** | **En->Fr** |
  |:-----------:|:---------:|:----------:|:----------:|:----------:|
- | 2.2.0 | canary-180m-flash | 77.71 | 80.87 | 77.82 |
+ | main | canary-180m-flash | 77.71 | 80.87 | 77.82 |
 
 
  ### Timestamp Prediction
@@ -585,7 +585,7 @@ F1-score on [Librispeech Test sets](https://www.openslr.org/12) at collar value
 
  | **Version** | **Model** | **test-clean** | **test-other** |
  |:-----------:|:---------:|:----------:|:----------:|
- | 2.2.0 | canary-180m-flash | 93.48 | 91.38 |
+ | main | canary-180m-flash | 93.48 | 91.38 |
 
 
  ### Hallucination Robustness
@@ -593,14 +593,14 @@ Number of characters per minute on [MUSAN](https://www.openslr.org/17) 48 hrs ev
 
  | **Version** | **Model** | **# of character per minute** |
  |:-----------:|:---------:|:----------:|
- | 2.2.0 | canary-180m-flash | 91.52 |
+ | main | canary-180m-flash | 91.52 |
 
  ### Noise Robustness
  WER on [Librispeech Test Clean](https://www.openslr.org/12) at different SNR (signal to noise ratio) levels of additive white noise
 
  | **Version** | **Model** | **SNR 10** | **SNR 5** | **SNR 0** | **SNR -5** |
  |:-----------:|:---------:|:----------:|:----------:|:----------:|:----------:|
- | 2.2.0 | canary-180m-flash | 3.23 | 5.34 | 12.21 | 34.03 |
+ | main | canary-180m-flash | 3.23 | 5.34 | 12.21 | 34.03 |
 
  ## Model Fairness Evaluation
 