@@ -8,6 +8,12 @@ metrics:
 model-index:
 - name: whisper-medium-english-2-wolof
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -15,24 +21,31 @@ should probably proofread and complete it, then remove this comment. -->
 # whisper-medium-english-2-wolof
-This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on an unknown dataset.
 It achieves the following results on the evaluation set:
 - Loss: 1.1668
 - Bleu: 34.6061
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
 ### Training hyperparameters
@@ -69,3 +82,105 @@ The following hyperparameters were used during training:
 - Pytorch 2.4.0+cu121
 - Datasets 3.2.0
 - Tokenizers 0.19.1

 model-index:
 - name: whisper-medium-english-2-wolof
   results: []
+datasets:
+- bilalfaye/english-wolof-french-dataset
+language:
+- en
+- wo
+pipeline_tag: automatic-speech-recognition
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 # whisper-medium-english-2-wolof
+This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium), trained on the [bilalfaye/english-wolof-french-dataset](https://huggingface.co/datasets/bilalfaye/english-wolof-french-dataset) dataset. It translates English speech directly into Wolof text. The base Whisper model does not natively support Wolof, and this fine-tuned version fills that gap.
 It achieves the following results on the evaluation set:
 - Loss: 1.1668
 - Bleu: 34.6061
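
For readers unfamiliar with the BLEU figure above, the following is a minimal sketch of how a corpus-level BLEU score can be computed from model outputs and reference translations. It is a simplified illustration (whitespace tokenization, one reference per hypothesis, no smoothing) and is not the metric implementation that produced the reported score. The function names `ngram_counts` and `corpus_bleu` and the example sentences are hypothetical placeholders.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_order=4):
    """Simplified corpus-level BLEU on a 0-100 scale, one reference per hypothesis."""
    matches = [0] * max_order
    totals = [0] * max_order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_order + 1):
            hyp_ngrams = ngram_counts(hyp_tok, n)
            ref_ngrams = ngram_counts(ref_tok, n)
            # Clipped counts: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum((hyp_ngrams & ref_ngrams).values())
            totals[n - 1] += sum(hyp_ngrams.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_order
    # Brevity penalty: penalize outputs shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Placeholder sentences; real use would pass decoded model outputs and reference Wolof text.
score = corpus_bleu(["the cat sat on the mat today"], ["the cat sat on the mat"])
print(f"BLEU: {score:.2f}")
```

Scores from this sketch are not directly comparable to the number above, because tokenization and smoothing choices change BLEU noticeably.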
+## Model Description
+The model is based on OpenAI's Whisper architecture and is fine-tuned to translate English speech into Wolof text. It uses the "medium" variant, which offers a balance between accuracy and computational cost.
+## Intended Uses & Limitations
+**Intended uses:**
+- Automatic transcription and translation of English audio into Wolof text.
+- Assisting researchers and language learners working with English audio content.
+**Limitations:**
+- May struggle with heavy accents or noisy environments.
+- Performance may vary depending on speaker pronunciation and recording quality.
+## Training and Evaluation Data
+The model was fine-tuned on the [bilalfaye/english-wolof-french-dataset](https://huggingface.co/datasets/bilalfaye/english-wolof-french-dataset), which consists of English audio paired with Wolof translations.
+## Training Procedure
 ### Training hyperparameters
 - Pytorch 2.4.0+cu121
 - Datasets 3.2.0
 - Tokenizers 0.19.1
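
To reproduce results it can help to compare the local environment against the framework versions listed above. The sketch below uses only the standard library. The distribution names (`torch`, `datasets`, `tokenizers`) are assumptions about how these packages are installed, and `compare_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions copied from the list above; the distribution names are assumptions.
EXPECTED = {"torch": "2.4.0+cu121", "datasets": "3.2.0", "tokenizers": "0.19.1"}


def compare_versions(expected):
    """Return {package: (installed_or_None, expected)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches


for name, (installed, wanted) in compare_versions(EXPECTED).items():
    print(f"{name}: installed {installed}, model card lists {wanted}")
```

A mismatch does not necessarily mean the model will fail to load; it only flags differences worth checking if outputs diverge.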
+## Inference
+### Using Python Code
+```python
+# Install dependencies first (notebook or shell): pip install transformers datasets torch
+import torch
+from transformers import WhisperForConditionalGeneration, WhisperProcessor
+from datasets import load_dataset
+# Load model and processor
+device = "cuda:0" if torch.cuda.is_available() else "cpu"
+model = WhisperForConditionalGeneration.from_pretrained("bilalfaye/whisper-medium-english-2-wolof").to(device)
+processor = WhisperProcessor.from_pretrained("bilalfaye/whisper-medium-english-2-wolof")
+# Load dataset
+streaming_dataset = load_dataset("bilalfaye/english-wolof-french-dataset", split="train", streaming=True)
+iterator = iter(streaming_dataset)
+# Skip ahead to the third sample of the stream
+for _ in range(3):
+    sample = next(iterator)
+# Preprocess audio
+input_features = processor(sample["en_audio"]["audio"]["array"],
+                           sampling_rate=sample["en_audio"]["audio"]["sampling_rate"],
+                           return_tensors="pt").input_features.to(device)
+# Generate the Wolof translation
+predicted_ids = model.generate(input_features)
+translation = processor.batch_decode(predicted_ids, skip_special_tokens=True)
+print("English source sentence:", sample["en"])
+print("Predicted Wolof translation:", translation[0])
+```
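
Whisper's feature extractor pads or truncates each input to a 30-second window, so the snippet above would only cover the first 30 seconds of a longer recording. One possible workaround, sketched here under the assumption that the audio is already a mono 16 kHz NumPy array, is to split it into consecutive windows and run the processor and `model.generate` on each chunk. `split_into_windows` is a hypothetical helper, and the silent array stands in for a real recording.

```python
import numpy as np

WHISPER_SAMPLE_RATE = 16_000  # Hz, the rate Whisper's feature extractor expects
WHISPER_WINDOW_SECONDS = 30   # Whisper encodes audio in windows of at most 30 s


def split_into_windows(waveform, sample_rate=WHISPER_SAMPLE_RATE,
                       window_seconds=WHISPER_WINDOW_SECONDS):
    """Split a 1-D mono waveform into consecutive chunks of at most window_seconds."""
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.ndim != 1:
        raise ValueError("expected a 1-D mono waveform")
    window = int(sample_rate * window_seconds)
    return [waveform[start:start + window] for start in range(0, len(waveform), window)]


# 70 s of silence stands in for a real recording.
chunks = split_into_windows(np.zeros(70 * WHISPER_SAMPLE_RATE))
print([len(c) / WHISPER_SAMPLE_RATE for c in chunks])  # [30.0, 30.0, 10.0]
```

Fixed boundaries can cut a word in half, so translations near chunk edges may be less reliable than those from naturally segmented audio.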
+### Using Gradio Interface
+```python
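+# Install dependencies first (notebook or shell): pip install gradio torch torchaudio transformers
+import gradio as gr
+import numpy as np
+import torch
+import torchaudio
+from transformers import pipeline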
+# Load model pipeline
+device = "cuda:0" if torch.cuda.is_available() else "cpu"
+pipe = pipeline(task="automatic-speech-recognition", model="bilalfaye/whisper-medium-english-2-wolof", device=device)
+# Function for transcription
+def transcribe(audio):
+    if audio is None:
+        return "No audio provided. Please try again."
+    if isinstance(audio, str):
+        waveform, sample_rate = torchaudio.load(audio)
+    elif isinstance(audio, tuple):  # Fallback for tuple input, assumed to be (file path, sample_rate)
+        waveform, sample_rate = torchaudio.load(audio[0])
+    else:
+        return "Invalid audio input format."
+    if waveform.shape[0] > 1:
+        mono_audio = waveform.mean(dim=0, keepdim=True)
+    else:
+        mono_audio = waveform
+    target_sample_rate = 16000
+    if sample_rate != target_sample_rate:
+        resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=target_sample_rate)
+        mono_audio = resampler(mono_audio)
+        sample_rate = target_sample_rate
+    mono_audio = mono_audio.squeeze(0).numpy().astype(np.float32)
+    result = pipe({"array": mono_audio, "sampling_rate": sample_rate})
+    return result['text']
+# Create Gradio interfaces
+interface = gr.Interface(
+    fn=transcribe,
+    inputs=gr.Audio(sources=["upload", "microphone"], type="filepath"),
+    outputs="text",
+    title="Whisper Medium English Translation",
+    description="Record audio in English and translate it to Wolof using a fine-tuned Whisper medium model.",
+    #live=True,
+)
+app = gr.TabbedInterface(
+    [interface],
+    ["Use Uploaded File or Microphone"]
+)
+app.launch(debug=True, share=True)
+```
+**Author**
+  - Bilal FAYE