nvidia/canary-180m-flash

ankitapasad committed 6 days ago

Commit 238ca4c · verified · 1 Parent(s): 92a6f85

added RTFx table

Files changed (1)

README.md +12 -1

@@ -496,7 +496,7 @@ The tokenizers for these models were built using the text transcripts of the tra
 For ASR and AST experiments, predictions were generated using greedy decoding. Note that utterances shorter than 1 second are symmetrically zero-padded upto 1 second during evaluation.
-### ASR Performance (w/o PnC)
 The ASR performance is measured with word error rate (WER), and we process the groundtruth and predicted text with [whisper-normalizer](https://pypi.org/project/whisper-normalizer/).
@@ -506,7 +506,18 @@ WER on [HuggingFace OpenASR leaderboard](https://huggingface.co/spaces/hf-audio/
 |:---------:|:-----------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
 | 2.3.0  | canary-180m-flash | 1233 | 14.86 | 10.51 | 1.87 | 3.83 | 13.33 | 2.26 | 3.98 | 6.35 |
 WER on [MLS](https://huggingface.co/datasets/facebook/multilingual_librispeech) test set:
 | **Version** | **Model**  | **De**   | **Es**   | **Fr**   |

 For ASR and AST experiments, predictions were generated using greedy decoding. Note that utterances shorter than 1 second are symmetrically zero-padded upto 1 second during evaluation.
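The padding rule described above could be sketched as follows. This is a minimal illustration rather than the evaluation code used for the model: the function name `pad_symmetric`, the plain-list representation of the waveform, and the choice to put the extra sample on the right when the deficit is odd are all assumptions.

```python
def pad_symmetric(samples, sample_rate, min_seconds=1.0):
    """Zero-pad a mono waveform on both sides until it lasts min_seconds.

    Hypothetical helper: `samples` is a sequence of floats and
    `sample_rate` is in Hz. Waveforms that are already long enough
    are returned unchanged (as a new list).
    """
    min_samples = int(round(min_seconds * sample_rate))
    deficit = min_samples - len(samples)
    if deficit <= 0:
        return list(samples)
    left = deficit // 2
    # Assumption: when the deficit is odd, the extra zero goes on the right.
    right = deficit - left
    return [0.0] * left + list(samples) + [0.0] * right


# A 0.6 s clip at a toy sample rate of 10 Hz gets 2 zeros on each side.
padded = pad_symmetric([1.0] * 6, sample_rate=10)
print(len(padded), padded[:3])
```

Splitting the padding evenly keeps the speech centered in the padded window, which is the usual reading of "symmetric" here.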
+### English ASR Performance (w/o PnC)
 The ASR performance is measured with word error rate (WER), and we process the groundtruth and predicted text with [whisper-normalizer](https://pypi.org/project/whisper-normalizer/).
 |:---------:|:-----------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
 | 2.3.0  | canary-180m-flash | 1233 | 14.86 | 10.51 | 1.87 | 3.83 | 13.33 | 2.26 | 3.98 | 6.35 |
+#### Inference speed on different systems
+We profiled inference speed on the OpenASR benchmark using the [real-time factor](https://github.com/NVIDIA/DeepLearningExamples/blob/master/Kaldi/SpeechRecognition/README.md#metrics) (RTFx) to quantify throughput.
+| **Version** | **Model**     | **System**   | **RTFx**   |
+|:-----------:|:-------------:|:------------:|:----------:|
+| 2.3.0 | canary-180m-flash | NVIDIA A100 | 1233 |
+| 2.3.0 | canary-180m-flash | NVIDIA H100 | TBA |
+| 2.3.0 | canary-180m-flash | NVIDIA B200 | 2357 |
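To make the RTFx column concrete: RTFx is commonly computed as the total duration of the audio divided by the total time spent transcribing it, so higher is faster. The sketch below assumes that definition; the function name `rtfx` and the sample timings are illustrative and are not taken from the benchmark harness.

```python
def rtfx(audio_seconds, processing_seconds):
    """Return throughput as seconds of audio processed per second of compute.

    Hypothetical helper: both arguments are sequences of per-utterance
    values in seconds, aggregated over the whole evaluation set.
    """
    total_audio = sum(audio_seconds)
    total_time = sum(processing_seconds)
    if total_time <= 0:
        raise ValueError("total processing time must be positive")
    return total_audio / total_time


# Two 30 s utterances transcribed in 0.02 s and 0.03 s (made-up timings).
print(round(rtfx([30.0, 30.0], [0.02, 0.03])))
```

Under this definition, the A100 figure of 1233 corresponds to roughly 1233 seconds of audio transcribed per second of wall-clock time, so an hour of audio would take about 3600 / 1233 ≈ 2.9 seconds.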
+### Multilingual ASR Performance
 WER on [MLS](https://huggingface.co/datasets/facebook/multilingual_librispeech) test set:
 | **Version** | **Model**  | **De**   | **Es**   | **Fr**   |