Update README.md
Browse files
README.md
CHANGED
@@ -14,7 +14,7 @@ tags:
|
|
14 |
- audio
|
15 |
- speech
|
16 |
---
|
17 |
-
|
18 |
|
19 |
# GigaAM-v2-CTC with ngram LM and beamsearch 🤗 Hugging Face transformers
|
20 |
|
@@ -66,8 +66,13 @@ input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt")
|
|
66 |
with torch.no_grad():
|
67 |
logits = model(**input_features).logits
|
68 |
|
69 |
-
# decoding with beamsearch and LM
|
70 |
-
transcription = processor.batch_decode(
|
|
|
|
|
|
|
|
|
|
|
71 |
|
72 |
```
|
73 |
|
@@ -78,7 +83,13 @@ In our case (Conformer) `MODEL_STRIDE = 40` ms per timestamp.
|
|
78 |
|
79 |
```python
|
80 |
MODEL_STRIDE = 40
|
81 |
-
outputs = processor.batch_decode(
|
|
|
|
|
|
|
|
|
|
|
|
|
82 |
word_ts = [
|
83 |
{
|
84 |
"word": d["word"],
|
|
|
14 |
- audio
|
15 |
- speech
|
16 |
---
|
17 |
+
[](https://colab.research.google.com/gist/waveletdeboshir/07e39ae96f27331aa3e1e053c2c2f9e8/gigaam-ctc-hf-with-lm.ipynb)
|
18 |
|
19 |
# GigaAM-v2-CTC with ngram LM and beamsearch 🤗 Hugging Face transformers
|
20 |
|
|
|
66 |
with torch.no_grad():
|
67 |
logits = model(**input_features).logits
|
68 |
|
69 |
+
# decoding with beamsearch and LM (tune alpha, beta, beam_width for your data)
|
70 |
+
transcription = processor.batch_decode(
|
71 |
+
logits=logits.numpy(),
|
72 |
+
beam_width=64,
|
73 |
+
alpha=0.5,
|
74 |
+
beta=0.5,
|
75 |
+
).text[0]
|
76 |
|
77 |
```
|
78 |
|
|
|
83 |
|
84 |
```python
|
85 |
MODEL_STRIDE = 40
|
86 |
+
outputs = processor.batch_decode(
|
87 |
+
logits=logits.numpy(),
|
88 |
+
beam_width=64,
|
89 |
+
alpha=0.5,
|
90 |
+
beta=0.5,
|
91 |
+
output_word_offsets=True
|
92 |
+
)
|
93 |
word_ts = [
|
94 |
{
|
95 |
"word": d["word"],
|