ankitapasad committed on
Commit
238ca4c
·
verified ·
1 Parent(s): 92a6f85

added RTFx table

  1. README.md +12 -1
README.md CHANGED
@@ -496,7 +496,7 @@ The tokenizers for these models were built using the text transcripts of the tra
496
 
497
  For ASR and AST experiments, predictions were generated using greedy decoding. Note that utterances shorter than 1 second are symmetrically zero-padded up to 1 second during evaluation.
498
 
499
- ### ASR Performance (w/o PnC)
500
 
501
  ASR performance is measured by word error rate (WER); both the ground-truth and predicted text are normalized with [whisper-normalizer](https://pypi.org/project/whisper-normalizer/) before scoring.
502
 
@@ -506,7 +506,18 @@ WER on [HuggingFace OpenASR leaderboard](https://huggingface.co/spaces/hf-audio/
506
  |:---------:|:-----------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
507
  | 2.3.0 | canary-180m-flash | 1233 | 14.86 | 10.51 | 1.87 | 3.83 | 13.33 | 2.26 | 3.98 | 6.35 |
508
 
 
 
509
 
510
  WER on [MLS](https://huggingface.co/datasets/facebook/multilingual_librispeech) test set:
511
 
512
  | **Version** | **Model** | **De** | **Es** | **Fr** |
 
496
 
497
  For ASR and AST experiments, predictions were generated using greedy decoding. Note that utterances shorter than 1 second are symmetrically zero-padded up to 1 second during evaluation.
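The padding rule above can be sketched as follows. This is a minimal illustration, not the evaluation code used for these models: the function name `pad_to_min_duration`, the 16 kHz default sample rate, and the choice to put the extra sample on the right when the gap is odd are all assumptions.

```python
from typing import List, Sequence


def pad_to_min_duration(
    samples: Sequence[float],
    sample_rate: int = 16000,   # assumed sample rate, not stated in the README
    min_seconds: float = 1.0,
) -> List[float]:
    """Symmetrically zero-pad a mono waveform so it lasts at least min_seconds."""
    target_len = int(round(sample_rate * min_seconds))
    missing = target_len - len(samples)
    if missing <= 0:
        # Already long enough: return an unmodified copy.
        return list(samples)
    left = missing // 2
    right = missing - left      # assumption: any odd leftover sample goes on the right
    return [0.0] * left + list(samples) + [0.0] * right


# A 0.5 s clip at 16 kHz becomes exactly 1 s, with silence on both sides.
clip = [0.1] * 8000
padded = pad_to_min_duration(clip)
print(len(padded))  # 16000
```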
498
 
499
+ ### English ASR Performance (w/o PnC)
500
 
501
  ASR performance is measured by word error rate (WER); both the ground-truth and predicted text are normalized with [whisper-normalizer](https://pypi.org/project/whisper-normalizer/) before scoring.
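For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below is a plain-Python illustration under the assumption that both strings have already been normalized; the helper name `word_error_rate` is hypothetical and this is not the scoring code behind the reported numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of three reference words.
print(word_error_rate("the cat sat", "the hat sat"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.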
502
 
 
506
  |:---------:|:-----------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
507
  | 2.3.0 | canary-180m-flash | 1233 | 14.86 | 10.51 | 1.87 | 3.83 | 13.33 | 2.26 | 3.98 | 6.35 |
508
 
509
+ #### Inference speed on different systems
510
+ We profiled inference speed on the OpenASR benchmark using the inverse [real-time factor](https://github.com/NVIDIA/DeepLearningExamples/blob/master/Kaldi/SpeechRecognition/README.md#metrics) (RTFx), i.e. seconds of audio processed per second of compute, to quantify throughput; higher values mean faster inference.
511
 
512
+ | **Version** | **Model** | **System** | **RTFx** |
513
+ |:-----------:|:-------------:|:------------:|:----------:|
514
+ | 2.3.0 | canary-180m-flash | NVIDIA A100 | 1233 |
515
+ | 2.3.0 | canary-180m-flash | NVIDIA H100 | TBA |
516
+ | 2.3.0 | canary-180m-flash | NVIDIA B200 | 2357 |
517
+
518
+
519
+
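As a rough guide to reading the RTFx column, the metric is the total audio duration divided by the total processing time. The sketch below assumes that definition; the function name `rtfx` and the example numbers are hypothetical and are not measurements from this benchmark.

```python
def rtfx(total_audio_seconds: float, total_processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio processed per second of compute."""
    if total_processing_seconds <= 0:
        raise ValueError("total_processing_seconds must be positive")
    return total_audio_seconds / total_processing_seconds


# Hypothetical example: one hour of audio transcribed in three seconds.
print(rtfx(3600.0, 3.0))  # 1200.0
```

Under this definition, an RTFx of 1233 means that roughly 1233 seconds of audio are transcribed per second of wall-clock time on the given system.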
520
+ ### Multilingual ASR Performance
521
  WER on [MLS](https://huggingface.co/datasets/facebook/multilingual_librispeech) test set:
522
 
523
  | **Version** | **Model** | **De** | **Es** | **Fr** |