Commit 415b3ea by neurlang · verified · 1 Parent(s): acaff0a

Timestamps info

Files changed (1)
  1. README.md +3 -4
README.md CHANGED
@@ -278,13 +278,12 @@ can be run with batched inference. It can also be extended to predict sequence l
  >>> sample = ds[0]["audio"]
  
  >>> prediction = pipe(sample.copy(), batch_size=8)["text"]
- "mˈɪstɚ kwˈɪltɚ ˈɪz ðə ˈeɪ pˈɑsəl ˈʌv ðə ˈmɪdəl klˈæsɪz ˈænd wˈɪɹ glæd tˈu ˈælkəm ˈhɪz gˈʌsbəl"
+ "mˈɪstɚ kwˈɪltɚ ˈɪz ðɪ əpˈɑsəl əv ðə ˈmɪdəl klˈæsɪz ˈænd wˈɪɹ glˈæd tˈɪ wˈɛlkəm ˈhɪz gˈɑspəl"
  
  >>> # we can also return timestamps for the predictions
- >>> prediction = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]
-
- Whisper did not predict an ending timestamp, which can happen if audio is cut off in the middle of a word. Also make sure WhisperTimeStampLogitsProcessor was used during generation.
+ >>> prediction = pipe(sample.copy(), batch_size=8, return_timestamps="word")["chunks"]
  
+ [{'text': 'mˈɪstɚ', 'timestamp': (0.42, 0.78)}, {'text': ' kwˈɪltɚ', 'timestamp': (0.78, 1.2)}, {'text': ' ˈɪz', 'timestamp': (1.2, 1.4)}, {'text': ' ðɪ', 'timestamp': (1.4, 1.52)}, {'text': ' əpˈɑsəl', 'timestamp': (1.52, 2.08)}, {'text': ' əv', 'timestamp': (2.08, 2.26)}, {'text': ' ðə', 'timestamp': (2.26, 2.36)}, {'text': ' ˈmɪdəl', 'timestamp': (2.36, 2.6)}, {'text': ' klˈæsɪz', 'timestamp': (2.6, 3.22)}, {'text': ' ˈænd', 'timestamp': (3.22, 3.42)}, {'text': ' wˈɪɹ', 'timestamp': (3.42, 3.66)}, {'text': ' glˈæd', 'timestamp': (3.66, 4.02)}, {'text': ' tˈɪ', 'timestamp': (4.02, 4.18)}, {'text': ' wˈɛlkəm', 'timestamp': (4.18, 4.58)}, {'text': ' ˈhɪz', 'timestamp': (4.58, 4.82)}, {'text': ' gˈɑspəl', 'timestamp': (4.82, 5.38)}]
  ```
  
  Refer to the blog post [ASR Chunking](https://huggingface.co/blog/asr-chunking) for more details on the chunking algorithm.
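
The word-level `chunks` output added in this change is a list of dicts, each with a `text` string and a `(start, end)` `timestamp` tuple in seconds. As a rough sketch of how such a list might be consumed, the snippet below computes a per-word duration. The helper name `word_durations` is hypothetical and not part of the README or of any library, and the handling of a missing end time is an assumption based on the error message removed by this commit.

```python
from typing import List, Optional, Tuple

# Hypothetical helper (not part of the README or transformers):
# turn word-level chunks into (word, start, duration) rows.
def word_durations(chunks: List[dict]) -> List[Tuple[str, float, Optional[float]]]:
    rows = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        # Assumption: the end timestamp may be None if audio is cut off mid-word.
        duration = None if end is None else round(end - start, 2)
        # Every chunk after the first carries a leading space in its text; strip it.
        rows.append((chunk["text"].strip(), start, duration))
    return rows

# First three chunks copied from the output shown in the diff above.
chunks = [
    {"text": "mˈɪstɚ", "timestamp": (0.42, 0.78)},
    {"text": " kwˈɪltɚ", "timestamp": (0.78, 1.2)},
    {"text": " ˈɪz", "timestamp": (1.2, 1.4)},
]

for word, start, duration in word_durations(chunks):
    print(word, start, duration)
```

In practice the list returned by `pipe(..., return_timestamps="word")["chunks"]` would be passed in place of the hard-coded sample.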