minor fixes
README.md
CHANGED
@@ -306,7 +306,7 @@ canary_model.change_decoding_strategy(decode_cfg)
 
 Input to canary-180m-flash can be either a list of paths to audio files or a jsonl manifest file.
 
-### Inference with
+### Inference with canary-180m-flash:
 If the input is a list of paths, canary-180m-flash assumes that the audio is English and transcribes it. I.e., canary-180m-flash's default behavior is English ASR.
 ```python
 output = canary_model.transcribe(
@@ -354,7 +354,7 @@ output = canary_model.transcribe(
 )
 ```
 
-### Longform inference with
+### Longform inference with canary-180m-flash:
 Canary models are designed to handle input audio shorter than 40 seconds. To handle longer audio, NeMo includes the [speech_to_text_aed_chunked_infer.py](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_chunked_inference/aed/speech_to_text_aed_chunked_infer.py) script, which chunks the audio, performs inference on the chunks, and stitches the transcripts together.
 
 The script will perform inference on all `.wav` files in `audio_dir`. Alternatively, you can pass a path to a manifest file as shown above. The decoded output will be saved at `output_json_path`.
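The chunk-and-stitch approach the longform script takes can be sketched in a few lines. This is a hypothetical helper for illustration only, not NeMo's actual implementation: it splits a recording's timeline into windows no longer than the model's 40-second limit, which would then be transcribed one by one and joined.

```python
def chunk_spans(total_secs: float, chunk_secs: float = 40.0) -> list[tuple[float, float]]:
    """Return (start, end) spans covering the full audio, each at most chunk_secs long.

    Hypothetical sketch of the chunking step; the real script also handles
    overlap and transcript stitching, which are omitted here.
    """
    spans = []
    start = 0.0
    while start < total_secs:
        end = min(start + chunk_secs, total_secs)
        spans.append((start, end))
        start = end
    return spans

# A 95-second file becomes three chunks: 0-40 s, 40-80 s, and a final 15 s tail.
print(chunk_spans(95.0))
```

Each span would be transcribed independently and the partial transcripts concatenated in order.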
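For the manifest input path mentioned above, a jsonl manifest is simply one JSON object per line, one line per audio file. The field names below (`audio_filepath`, `duration`) follow the common NeMo ASR manifest convention and are an assumption, not taken from this diff.

```python
import json

# Assumed manifest fields: "audio_filepath" and "duration" (NeMo ASR
# convention); check the NeMo docs for the fields your task requires.
entries = [
    {"audio_filepath": "/data/sample1.wav", "duration": 12.3},
    {"audio_filepath": "/data/sample2.wav", "duration": 7.9},
]
with open("manifest.jsonl", "w") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")
```

The resulting `manifest.jsonl` can then be passed to `canary_model.transcribe` or to the chunked-inference script in place of a list of paths.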