added RTFx table
README.md
CHANGED
@@ -496,7 +496,7 @@ The tokenizers for these models were built using the text transcripts of the tra
|
|
496 |
|
497 |
For ASR and AST experiments, predictions were generated using greedy decoding. Note that utterances shorter than 1 second are symmetrically zero-padded up to 1 second during evaluation.
|
498 |
|
499 |
-
### ASR Performance (w/o PnC)
|
500 |
|
501 |
ASR performance is measured by word error rate (WER); both the ground-truth and the predicted text are normalized with [whisper-normalizer](https://pypi.org/project/whisper-normalizer/) before scoring.
|
502 |
|
@@ -506,7 +506,18 @@ WER on [HuggingFace OpenASR leaderboard](https://huggingface.co/spaces/hf-audio/
|
|
506 |
|:---------:|:-----------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
|
507 |
| 2.3.0 | canary-180m-flash | 1233 | 14.86 | 10.51 | 1.87 | 3.83 | 13.33 | 2.26 | 3.98 | 6.35 |
|
508 |
|
509 |
|
510 |
WER on [MLS](https://huggingface.co/datasets/facebook/multilingual_librispeech) test set:
|
511 |
|
512 |
| **Version** | **Model** | **De** | **Es** | **Fr** |
|
|
|
496 |
|
497 |
For ASR and AST experiments, predictions were generated using greedy decoding. Note that utterances shorter than 1 second are symmetrically zero-padded up to 1 second during evaluation.
|
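The symmetric zero-padding described above can be sketched as follows. This is a minimal illustration, not the evaluation code used for these results: the function name `pad_symmetric` is hypothetical, the waveform is modelled as a plain list of samples, and placing the extra sample on the right when the padding is odd is an assumption.

```python
from typing import List


def pad_symmetric(samples: List[float], sample_rate: int,
                  min_seconds: float = 1.0) -> List[float]:
    """Zero-pad a mono waveform equally on both sides up to min_seconds.

    Hypothetical helper: waveforms that are already long enough are
    returned unchanged.
    """
    target_len = int(round(min_seconds * sample_rate))
    missing = target_len - len(samples)
    if missing <= 0:
        return list(samples)
    left = missing // 2
    right = missing - left  # assumption: the odd sample goes on the right
    return [0.0] * left + list(samples) + [0.0] * right


# A 0.4 s clip at a toy 10 Hz sample rate is padded to 10 samples.
padded = pad_symmetric([1.0, 1.0, 1.0, 1.0], sample_rate=10)
print(len(padded))
```

In a real pipeline the same logic would more likely operate on an array or tensor, but the split of the missing samples is the same.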
498 |
|
499 |
+
### English ASR Performance (w/o PnC)
|
500 |
|
501 |
ASR performance is measured by word error rate (WER); both the ground-truth and the predicted text are normalized with [whisper-normalizer](https://pypi.org/project/whisper-normalizer/) before scoring.
|
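To make the metric concrete, here is a hedged sketch of a WER computation using word-level edit distance. The `simple_normalize` function is a hypothetical stand-in for text normalization (lowercasing and punctuation stripping only); it is not the whisper-normalizer API, and the reported numbers were not produced with this code.

```python
import string


def simple_normalize(text: str) -> str:
    """Hypothetical stand-in normalizer: lowercase, drop ASCII punctuation."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


ref = simple_normalize("The cat sat on the mat.")
hyp = simple_normalize("the cat sit on mat")
print(word_error_rate(ref, hyp))
```

Here one substitution and one deletion against six reference words give a WER of 2/6.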
502 |
|
|
|
506 |
|:---------:|:-----------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
|
507 |
| 2.3.0 | canary-180m-flash | 1233 | 14.86 | 10.51 | 1.87 | 3.83 | 13.33 | 2.26 | 3.98 | 6.35 |
|
508 |
|
509 |
+
#### Inference speed on different systems
|
510 |
+
We profiled inference speed on the OpenASR benchmark, reporting throughput as the inverse [real-time factor](https://github.com/NVIDIA/DeepLearningExamples/blob/master/Kaldi/SpeechRecognition/README.md#metrics) (RTFx): seconds of audio processed per second of compute, so higher is faster.
|
511 |
|
512 |
+
| **Version** | **Model** | **System** | **RTFx** |
|
513 |
+
|:-----------:|:-------------:|:------------:|:----------:|
|
514 |
+
| 2.3.0 | canary-180m-flash | NVIDIA A100 | 1233 |
|
515 |
+
| 2.3.0 | canary-180m-flash | NVIDIA H100 | TBA |
|
516 |
+
| 2.3.0 | canary-180m-flash | NVIDIA B200 | 2357 |
|
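As a rough illustration of how an RTFx figure is obtained, the sketch below divides total audio duration by total processing time. The function name `rtfx` and the example durations are hypothetical; only the two table values (1233 on A100, 2357 on B200) come from the table above.

```python
def rtfx(total_audio_seconds: float, total_processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    if total_processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return total_audio_seconds / total_processing_seconds


# Hypothetical run: one hour of audio transcribed in three seconds.
print(rtfx(3600.0, 3.0))

# Relative throughput of the B200 over the A100, from the table values.
print(round(2357 / 1233, 2))
```

By the table values, the B200 system processes audio about 1.91 times as fast as the A100 system.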
517 |
+
|
518 |
+
|
519 |
+
|
520 |
+
### Multilingual ASR Performance
|
521 |
WER on [MLS](https://huggingface.co/datasets/facebook/multilingual_librispeech) test set:
|
522 |
|
523 |
| **Version** | **Model** | **De** | **Es** | **Fr** |
|