johaness14 committed
Commit baa11e5 · 1 Parent(s): 684624e

Update README.md

Files changed (1)
  1. README.md +85 -0
README.md CHANGED
@@ -51,6 +51,91 @@ The following hyperparameters were used during training:
  - num_epochs: 75
  - mixed_precision_training: Native AMP

+ ### How to run (Gradio web demo)
+ ```python
+ import torch
+ import gradio as gr
+ import numpy as np
+ from transformers import pipeline, AutoProcessor, AutoModelForCTC
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ # Load the model and processor
+ MODEL_NAME = "<fill in your model ID>"
+ processor = AutoProcessor.from_pretrained(MODEL_NAME)
+ model = AutoModelForCTC.from_pretrained(MODEL_NAME)
+
+ # Move the model to the selected device
+ model.to(device)
+
+ # Create the ASR pipeline from the model and processor
+ transcriber = pipeline(
+     "automatic-speech-recognition",
+     model=model,
+     tokenizer=processor.tokenizer,
+     feature_extractor=processor.feature_extractor,
+     device=device,
+ )
+
+ def transcribe(audio):
+     sr, y = audio
+     # Convert to mono float32 and normalize to [-1, 1]
+     if y.ndim > 1:
+         y = y.mean(axis=1)
+     y = y.astype(np.float32)
+     peak = np.max(np.abs(y))
+     if peak > 0:
+         y /= peak
+
+     return transcriber({"sampling_rate": sr, "raw": y})["text"]
+
+ demo = gr.Interface(
+     transcribe,
+     gr.Audio(sources=["upload"]),
+     "text",
+ )
+
+ demo.launch(share=True)
+ ```
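+
+ If you only need a quick transcription of a local file, the same pipeline object also accepts a file path directly (ffmpeg must be installed so the audio can be decoded and resampled). A minimal sketch, where `sample.wav` is a placeholder path:
+ ```python
+ # Transcribe a local file with the pipeline created above.
+ # "sample.wav" is a placeholder; any audio file readable by ffmpeg works.
+ result = transcriber("sample.wav")
+ print(result["text"])
+ ```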
+
+ ### How to run (local audio file)
+ ```python
+ import torch
+ import torchaudio
+ from transformers import AutoProcessor, AutoModelForCTC
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ # Load the model and processor
+ MODEL_NAME = "<fill in your model ID>"
+ processor = AutoProcessor.from_pretrained(MODEL_NAME)
+ model = AutoModelForCTC.from_pretrained(MODEL_NAME)
+
+ # Move the model to the selected device
+ model.to(device)
+
+ # Load the audio file
+ AUDIO_PATH = "<path to your audio file>"
+ audio_input, sample_rate = torchaudio.load(AUDIO_PATH)
+
+ # Ensure the audio is mono (1 channel)
+ if audio_input.shape[0] > 1:
+     audio_input = torch.mean(audio_input, dim=0, keepdim=True)
+
+ # Resample to 16 kHz if necessary
+ if sample_rate != 16000:
+     resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
+     audio_input = resampler(audio_input)
+
+ # Process the audio input
+ input_values = processor(audio_input.squeeze().numpy(), sampling_rate=16000, return_tensors="pt").input_values
+
+ # Move the input values to the selected device
+ input_values = input_values.to(device)
+
+ # Perform inference
+ with torch.no_grad():
+     logits = model(input_values).logits
+
+ # Decode the logits to text
+ predicted_ids = torch.argmax(logits, dim=-1)
+ transcription = processor.batch_decode(predicted_ids)[0]
+
+ print("Transcription:", transcription)
+ ```
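+
+ The 16 000 Hz target above is assumed because it is the usual default for Wav2Vec2-style CTC checkpoints; if you are unsure what your checkpoint expects, read the rate from the processor instead of hard-coding it. A small sketch using the objects defined above:
+ ```python
+ # Resample to the rate recorded in the feature extractor,
+ # rather than assuming 16 kHz.
+ target_sr = processor.feature_extractor.sampling_rate
+ if sample_rate != target_sr:
+     audio_input = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=target_sr)(audio_input)
+ ```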
+
  ### Training results

  | Training Loss | Epoch | Step | Validation Loss | Wer |