nvidia/canary-180m-flash

ankitapasad committed 12 days ago

Commit c770bfa · verified · 1 parent: a6fd0a2

added bsize for open-asr eval

Files changed (1)

README.md +2 -2

@@ -383,7 +383,7 @@ canary-180m-flash <br>
 ## Training Dataset:
 The canary-180m-flash model is trained on a total of 85K hrs of speech data. It consists of 31K hrs of public data, 20K hrs collected by [Suno](https://suno.ai/), and 34K hrs of in-house data.
-The datasets below include conversations, videos from the web and audiobook recordings.
+The datasets below include conversations, videos from the web, and audiobook recordings.
 **Data Collection Method:**
 * Human <br>
@@ -476,7 +476,7 @@ In both ASR and AST experiments, predictions were generated using beam search wi
 The ASR performance is measured with word error rate (WER), and we process the groundtruth and predicted text with [whisper-normalizer](https://pypi.org/project/whisper-normalizer/).
-WER on [HuggingFace OpenASR leaderboard](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard):
+WER on [HuggingFace OpenASR leaderboard](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard) evaluated with a batch size of 128:
 | **Version** | **Model**     | **RTFx**   | **AMI**   | **GigaSpeech**   | **LS Clean**   | **LS Other**   | **Earnings22**   | **SPGISpech**   | **Tedlium**   | **Voxpopuli**   |
 |:---------:|:-----------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|