bilalfaye committed on
Commit e4df77b · verified · 1 Parent(s): 4f8a977

Update README.md

Files changed (1)
  1. README.md +123 -8
README.md CHANGED
@@ -8,6 +8,12 @@ metrics:
  model-index:
  - name: whisper-medium-english-2-wolof
  results: []
  ---
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -15,24 +21,31 @@ should probably proofread and complete it, then remove this comment. -->
 
  # whisper-medium-english-2-wolof
 
- This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on an unknown dataset.
  It achieves the following results on the evaluation set:
  - Loss: 1.1668
  - Bleu: 34.6061
 
- ## Model description
 
- More information needed
 
- ## Intended uses & limitations
 
- More information needed
 
- ## Training and evaluation data
 
- More information needed
 
- ## Training procedure
 
  ### Training hyperparameters
 
@@ -69,3 +82,105 @@ The following hyperparameters were used during training:
  - Pytorch 2.4.0+cu121
  - Datasets 3.2.0
  - Tokenizers 0.19.1
@@ -8,6 +8,12 @@ metrics:
  model-index:
  - name: whisper-medium-english-2-wolof
  results: []
+ datasets:
+ - bilalfaye/english-wolof-french-dataset
+ language:
+ - en
+ - wo
+ pipeline_tag: automatic-speech-recognition
  ---
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -15,24 +21,31 @@ should probably proofread and complete it, then remove this comment. -->
 
  # whisper-medium-english-2-wolof
 
+ This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the [bilalfaye/english-wolof-french-dataset](https://huggingface.co/datasets/bilalfaye/english-wolof-french-dataset). It translates English audio into Wolof text. Because the base Whisper model does not natively support Wolof, this fine-tuned version fills that gap.
  It achieves the following results on the evaluation set:
+
  - Loss: 1.1668
  - Bleu: 34.6061
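
The card reports BLEU on a 0-100 scale but does not say which implementation or tokenization produced the score. As a rough illustration only, the sketch below computes a simplified corpus-level BLEU (clipped 1- to 4-gram precision with a brevity penalty). The whitespace tokenization, the single reference per hypothesis, the absence of smoothing, and the function names `ngrams` and `corpus_bleu` are all assumptions made for this example, so its numbers are not expected to reproduce the reported value.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumptions: one reference per hypothesis, whitespace tokenization,
    no smoothing (any n-gram order with zero matches gives a score of 0).
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Clipped matches: each hypothesis n-gram is credited at most
            # as many times as it occurs in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Placeholder token strings, not real model outputs.
score = corpus_bleu(["a b c d x"], ["a b c d e"])
print(round(score, 2))
```

For reported results, an established implementation is preferable to a hand-rolled one, because tokenization and smoothing choices change the score.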
 
+ ## Model Description
+
+ The model is based on OpenAI's Whisper architecture and is fine-tuned to translate English speech into Wolof text. It uses the "medium" variant, which balances accuracy against computational cost.
 
+ ## Intended Uses & Limitations
 
+ **Intended uses:**
+ - Automatic translation of English audio into Wolof text.
+ - Assisting researchers and language learners working with English audio content.
 
+ **Limitations:**
+ - May struggle with heavy accents or noisy environments.
+ - Performance may vary depending on speaker pronunciation and recording quality.
 
+ ## Training and Evaluation Data
 
+ The model was fine-tuned on the [bilalfaye/english-wolof-french-dataset](https://huggingface.co/datasets/bilalfaye/english-wolof-french-dataset), which consists of English audio paired with Wolof translations.
 
+ ## Training Procedure
 
  ### Training hyperparameters
 
@@ -69,3 +82,105 @@ The following hyperparameters were used during training:
  - Pytorch 2.4.0+cu121
  - Datasets 3.2.0
  - Tokenizers 0.19.1
+
+ ## Inference
+
+ ### Using Python Code
+
+ ```python
+ # Install dependencies first (in a notebook cell): !pip install transformers datasets torch
+
+ import torch
+ from transformers import WhisperForConditionalGeneration, WhisperProcessor
+ from datasets import load_dataset
+
+ # Load the fine-tuned model and its processor
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
+ model = WhisperForConditionalGeneration.from_pretrained("bilalfaye/whisper-medium-english-2-wolof").to(device)
+ processor = WhisperProcessor.from_pretrained("bilalfaye/whisper-medium-english-2-wolof")
+
+ # Stream the dataset and keep the third sample
+ streaming_dataset = load_dataset("bilalfaye/english-wolof-french-dataset", split="train", streaming=True)
+ iterator = iter(streaming_dataset)
+ sample = next(iterator)
+ sample = next(iterator)
+ sample = next(iterator)
+
+ # Convert the English audio into Whisper input features
+ input_features = processor(
+     sample["en_audio"]["audio"]["array"],
+     sampling_rate=sample["en_audio"]["audio"]["sampling_rate"],
+     return_tensors="pt",
+ ).input_features.to(device)
+
+ # Generate the model output and decode it to text
+ predicted_ids = model.generate(input_features)
+ transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
+
+ print("English source sentence:", sample["en"])
+ print("Model output:", transcription[0])
+ ```
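
The snippet above calls `next(iterator)` three times and keeps only the last result, that is, the third sample of the stream. A possible alternative, sketched here with a hypothetical stand-in generator (`fake_stream`) instead of the real streaming dataset, is to pick the sample by index with `itertools.islice`. The helper name `nth_sample` is an assumption made for this example.

```python
from itertools import islice


def fake_stream():
    """Hypothetical stand-in for a streaming dataset: yields samples lazily."""
    for i in range(10):
        yield {"id": i}


def nth_sample(iterable, index):
    """Return the item at zero-based `index` without materializing the stream."""
    return next(islice(iterable, index, index + 1))


# Index 2 is the third sample, matching three consecutive next() calls.
third = nth_sample(fake_stream(), 2)
print(third)
```

Like repeated `next()` calls, this consumes the stream up to the requested index, so it only changes how the choice of sample is expressed.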
+
+ ### Using Gradio Interface
+
+ ```python
+ # Install dependencies first (in a notebook cell): !pip install gradio torchaudio
+
+ import torch
+ import torchaudio
+ from transformers import pipeline
+ import gradio as gr
+ import numpy as np
+
+ # Load model pipeline
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
+ pipe = pipeline(task="automatic-speech-recognition", model="bilalfaye/whisper-medium-english-2-wolof", device=device)
+
+ # Load the audio, convert it to 16 kHz mono, and run the model
+ def transcribe(audio):
+     if audio is None:
+         return "No audio provided. Please try again."
+
+     if isinstance(audio, str):  # File path
+         waveform, sample_rate = torchaudio.load(audio)
+     elif isinstance(audio, tuple):  # Microphone case: Gradio passes a tuple (file, sample_rate)
+         waveform, sample_rate = torchaudio.load(audio[0])
+     else:
+         return "Invalid audio input format."
+
+     # Downmix multi-channel audio to mono
+     if waveform.shape[0] > 1:
+         mono_audio = waveform.mean(dim=0, keepdim=True)
+     else:
+         mono_audio = waveform
+
+     # Resample to 16 kHz if needed
+     target_sample_rate = 16000
+     if sample_rate != target_sample_rate:
+         resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=target_sample_rate)
+         mono_audio = resampler(mono_audio)
+         sample_rate = target_sample_rate
+
+     mono_audio = mono_audio.squeeze(0).numpy().astype(np.float32)
+
+     result = pipe({"array": mono_audio, "sampling_rate": sample_rate})
+     return result["text"]
+
+
+ # Create the Gradio interface
+ interface = gr.Interface(
+     fn=transcribe,
+     inputs=gr.Audio(sources=["upload", "microphone"], type="filepath"),
+     outputs="text",
+     title="Whisper Medium English Translation",
+     description="Record audio in English and translate it to Wolof using a fine-tuned Whisper medium model.",
+     # live=True,
+ )
+
+
+ app = gr.TabbedInterface(
+     [interface],
+     ["Use Uploaded File or Microphone"]
+ )
+
+ app.launch(debug=True, share=True)
+ ```
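
The `transcribe` function above downmixes the waveform to mono and resamples it to 16 kHz before calling the pipeline. The sketch below illustrates those two steps on a synthetic signal using NumPy only. It deliberately swaps in plain linear interpolation, which is not the band-limited resampling that `torchaudio.transforms.Resample` performs, and the helper names `to_mono` and `resample_linear` are assumptions made for this example.

```python
import numpy as np


def to_mono(waveform):
    """Average the channels of a (channels, samples) array into a 1-D signal."""
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.ndim == 1:
        return waveform
    return waveform.mean(axis=0)


def resample_linear(signal, orig_rate, target_rate=16000):
    """Resample a 1-D signal by linear interpolation (no anti-aliasing filter)."""
    signal = np.asarray(signal, dtype=np.float32)
    if orig_rate == target_rate:
        return signal
    target_len = int(round(len(signal) * target_rate / orig_rate))
    old_times = np.arange(len(signal)) / orig_rate
    new_times = np.arange(target_len) / target_rate
    return np.interp(new_times, old_times, signal).astype(np.float32)


# One second of synthetic stereo audio at 44.1 kHz: a 440 Hz tone on the
# left channel and silence on the right.
rate = 44100
t = np.arange(rate) / rate
stereo = np.stack([np.sin(2 * np.pi * 440 * t), np.zeros_like(t)])

mono_16k = resample_linear(to_mono(stereo), rate)
print(mono_16k.shape, mono_16k.dtype)
```

Linear interpolation keeps the example short, but it can alias high frequencies when downsampling, so a filtered resampler remains the better choice for real inference.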
+
+ **Author**
+ - Bilal FAYE