update timestamps usage
README.md
CHANGED
@@ -319,7 +319,7 @@ predicted_text = output[0].text
 
 ```
 
-canary-180m-flash can also
+canary-180m-flash can also predict word-level and segment-level timestamps
 ```python
 output = canary_model.transcribe(
   ['filepath.wav'],
@@ -331,6 +331,7 @@ word_level_timestamps = output[0].timestamp['word']
 segment_level_timestamps = output[0].timestamp['segment']
 
 ```
+To predict timestamps for audio files longer than 10 seconds, we recommend using the longform inference script (explained in the next section) with `chunk_len_in_secs=10.0`.
 
 To use canary-180m-flash for transcribing other supported languages or perform Speech-to-Text translation or provide word-level timestamps, specify the input as jsonl manifest file, where each line in the file is a dictionary containing the following fields:
 
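The hunks above elide the middle of the timestamp-enabled `transcribe` call, so the argument that turns timestamps on is not visible in this diff. Below is a minimal sketch of the full usage, assuming the model is loaded with `EncDecMultiTaskModel.from_pretrained` from the `nvidia/canary-180m-flash` checkpoint (as in the earlier, unchanged part of the README) and assuming `timestamps='yes'` is the flag name; both of these fall outside the hunks and are assumptions, not part of this commit.

```python
# Minimal sketch pieced together from the context lines in the hunks above.
# The model-loading lines and the timestamps argument are assumptions.
from nemo.collections.asr.models import EncDecMultiTaskModel

# assumed checkpoint id; the README section shown here only names "canary-180m-flash"
canary_model = EncDecMultiTaskModel.from_pretrained('nvidia/canary-180m-flash')

output = canary_model.transcribe(
    ['filepath.wav'],
    timestamps='yes',  # assumed flag name; enables word/segment timestamp prediction
)

predicted_text = output[0].text                             # from the first hunk header
word_level_timestamps = output[0].timestamp['word']         # from the second hunk header
segment_level_timestamps = output[0].timestamp['segment']   # context line in the second hunk
```

For audio longer than 10 seconds, the line added by this commit recommends the longform inference script described in the next section of the README, run with `chunk_len_in_secs=10.0`, rather than calling `transcribe` directly as above.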