johaness14 committed
Commit baa11e5 · 1 Parent(s): 684624e

Update README.md

Files changed (1)
  1. README.md +85 -0
README.md CHANGED
@@ -51,6 +51,91 @@ The following hyperparameters were used during training:
  - num_epochs: 75
  - mixed_precision_training: Native AMP

+ ### How to run (Gradio web demo)
+ ```python
+ import torch
+ import gradio as gr
+ import numpy as np
+ from transformers import pipeline, AutoProcessor, AutoModelForCTC
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ # Load the model and processor
+ MODEL_NAME = "<fill in your model ID>"
+ processor = AutoProcessor.from_pretrained(MODEL_NAME)
+ model = AutoModelForCTC.from_pretrained(MODEL_NAME)
+
+ # Move the model to the selected device
+ model.to(device)
+
+ # Create the ASR pipeline from the model and processor
+ transcriber = pipeline(
+     "automatic-speech-recognition",
+     model=model,
+     tokenizer=processor.tokenizer,
+     feature_extractor=processor.feature_extractor,
+     device=device,
+ )
+
+ def transcribe(audio):
+     sr, y = audio
+     # Convert to mono float32 and normalize to [-1, 1]
+     if y.ndim > 1:
+         y = y.mean(axis=1)
+     y = y.astype(np.float32)
+     peak = np.max(np.abs(y))
+     if peak > 0:
+         y /= peak
+
+     return transcriber({"sampling_rate": sr, "raw": y})["text"]
+
+ demo = gr.Interface(
+     transcribe,
+     gr.Audio(sources=["upload"]),
+     "text",
+ )
+
+ demo.launch(share=True)
+ ```
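+
+ If you only need a quick transcription of a local file, the same pipeline object also accepts a file path directly (ffmpeg must be installed so the audio can be decoded and resampled). A minimal sketch, where `sample.wav` is a placeholder path:
+ ```python
+ # Transcribe a local file with the pipeline created above.
+ # "sample.wav" is a placeholder; any audio file readable by ffmpeg works.
+ result = transcriber("sample.wav")
+ print(result["text"])
+ ```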
+
+ ### How to run (local audio file)
+ ```python
+ import torch
+ import torchaudio
+ from transformers import AutoProcessor, AutoModelForCTC
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ # Load the model and processor
+ MODEL_NAME = "<fill in your model ID>"
+ processor = AutoProcessor.from_pretrained(MODEL_NAME)
+ model = AutoModelForCTC.from_pretrained(MODEL_NAME)
+
+ # Move the model to the selected device
+ model.to(device)
+
+ # Load the audio file
+ AUDIO_PATH = "<path to your audio file>"
+ audio_input, sample_rate = torchaudio.load(AUDIO_PATH)
+
+ # Ensure the audio is mono (1 channel)
+ if audio_input.shape[0] > 1:
+     audio_input = torch.mean(audio_input, dim=0, keepdim=True)
+
+ # Resample to 16 kHz if necessary
+ if sample_rate != 16000:
+     resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
+     audio_input = resampler(audio_input)
+
+ # Process the audio input
+ input_values = processor(audio_input.squeeze().numpy(), sampling_rate=16000, return_tensors="pt").input_values
+
+ # Move the input values to the selected device
+ input_values = input_values.to(device)
+
+ # Perform inference
+ with torch.no_grad():
+     logits = model(input_values).logits
+
+ # Decode the logits to text
+ predicted_ids = torch.argmax(logits, dim=-1)
+ transcription = processor.batch_decode(predicted_ids)[0]
+
+ print("Transcription:", transcription)
+ ```
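+
+ The 16 000 Hz target above is assumed because it is the usual default for Wav2Vec2-style CTC checkpoints; if you are unsure what your checkpoint expects, read the rate from the processor instead of hard-coding it. A small sketch using the objects defined above:
+ ```python
+ # Resample to the rate recorded in the feature extractor,
+ # rather than assuming 16 kHz.
+ target_sr = processor.feature_extractor.sampling_rate
+ if sample_rate != target_sr:
+     audio_input = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=target_sr)(audio_input)
+ ```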
+
  ### Training results

  | Training Loss | Epoch | Step | Validation Loss | Wer |