oswald-large-v3-turbo-m1

This model is a fine-tuned version of unsloth/whisper-large-v3-turbo (derived from openai/whisper-large-v3) on the creole-text-voice dataset.
The goal is a Haitian Creole speech-to-text model that reaches 99% accuracy and transcribes diverse Haitian voices across accents, regions, and speaking styles.
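The card does not say how the 99% accuracy target is measured. Word error rate (WER) is the usual ASR metric, and if "accuracy" is read as 1 − WER (an assumption), the target corresponds to a WER of at most 1%. The stdlib-only sketch below computes WER with a word-level edit distance. The helper names `normalize` and `word_error_rate` are hypothetical, and the normalization (NFC, lowercase, whitespace collapse) is an illustrative choice: NFC matters for Creole because "ò" can be stored either as one code point or as "o" plus a combining accent.

```python
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical normalization: apply NFC so composed and decomposed
    diacritics compare equal, lowercase, and collapse whitespace.
    Punctuation is deliberately left untouched."""
    return " ".join(unicodedata.normalize("NFC", text).lower().split())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with a word-level Levenshtein distance."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (all rows before the current one) and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("mwen pale kreyòl", "mwen pale kreyol")` is 1/3, because the missing accent turns one of three words into a substitution. WER can exceed 1.0 when the hypothesis contains many insertions.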

🧠 Model description

oswald-large-v3-turbo-m1 is optimized for Haitian Creole automatic speech recognition (ASR). It builds on OpenAI's Whisper architecture and adapts it to Haitian Creole by fine-tuning on a curated, high-quality dataset of about 15 hours of Haitian Creole audio-text pairs.

Architecture: Whisper large-v3-turbo (encoder-decoder Transformer, about 0.8B parameters)
Fine-tuned for: Haitian Creole (Kreyòl Ayisyen)
Vocabulary: Latin script (standard Creole orthography), preserving diacritics and linguistic nuances
Voice types: Synthetic and natural voices, both female and male
Sampling rate: 16 kHz
Training objective: Maximize transcription accuracy for everyday Creole speech
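Because the model expects 16 kHz input, it can help to check a recording before transcribing it. The sketch below uses only Python's standard `wave` module, so it handles PCM WAV files only (MP3, OGG and similar formats need a decoder such as librosa or ffmpeg). The function name `check_wav` and the returned keys are hypothetical; the mono requirement reflects how Whisper features are computed from a single channel.

```python
import wave

TARGET_RATE = 16_000  # sampling rate listed in the model description


def check_wav(path: str) -> dict:
    """Read the header of a PCM WAV file and report whether it already
    matches what the model expects (16 kHz, mono)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration_s = wav.getnframes() / rate
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": duration_s,
        "needs_resample": rate != TARGET_RATE,
        "needs_downmix": channels != 1,
    }
```

If `needs_resample` or `needs_downmix` is true, loading the file with `librosa.load(path, sr=16000)` (mono by default) converts it.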

✅ Intended uses

Transcribe Haitian Creole speech from:
- Voice notes
- Radio shows
- Interviews
- Public speeches
- Educational content
- Synthetic voices
Enable Creole voice interfaces in:
- Voice assistants
- Transcription services
- Language-learning tools
- Chatbots and accessibility platforms

⚠️ Limitations

May struggle with:
- Extremely poor audio quality (e.g., heavy background noise)
- Very fast or mumbled speech in some dialects
- Long audio files (Whisper processes at most 30 seconds per window, so longer recordings must be split into chunks)
Not optimized for real-time transcription on low-resource devices
Because it was fine-tuned on a single dataset, it may generalize less well to unseen voice types or rare accents
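Long recordings are one of the limitations listed above. A common workaround is to split the audio into overlapping 30-second windows, transcribe each one, and merge the text. The sketch below only computes the window boundaries in samples; the name `chunk_bounds` and the 5-second overlap are hypothetical choices. In practice the transformers ASR pipeline can do the chunking and merging itself when given `chunk_length_s=30`.

```python
def chunk_bounds(num_samples: int, sample_rate: int = 16_000,
                 chunk_s: float = 30.0, overlap_s: float = 5.0):
    """Return (start, end) sample indices of overlapping windows that cover
    the whole signal. Whisper sees at most 30 s of audio per forward pass."""
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap_s must be in [0, chunk_s)")
    chunk = int(chunk_s * sample_rate)
    step = int((chunk_s - overlap_s) * sample_rate)
    bounds = []
    start = 0
    while True:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end >= num_samples:
            break
        start += step
    return bounds
```

For a 70-second clip at 16 kHz this yields three windows: 0 to 30 s, 25 to 55 s, and 50 to 70 s. The overlap gives each window some context at its edges so words cut at a boundary can be recovered when the transcripts are merged.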

📊 Training and evaluation data

The model was trained on the creole-text-voice dataset, which includes:

7 hours of synthetic Haitian Creole speech
8 hours of natural (human) Haitian Creole speech
Annotated, time-aligned text transcripts following standard Creole orthography

Planned data sources for future versions:

Public domain radio and podcast archives
Open-access interviews and spoken-word audio
Community-submitted voice samples

Preprocessing steps:

Voice Activity Detection (VAD)
Noise filtering and audio normalization
Manual transcript review and correction
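The card lists these preprocessing steps but not the tools used for them. The stdlib-only sketch below illustrates two of them, peak normalization and a simple energy-threshold VAD, on a list of float samples in [-1, 1]. It is an illustration under stated assumptions, not the author's pipeline: the names `peak_normalize` and `energy_vad`, the 20 ms frame size, and the -40 dBFS threshold are all hypothetical. Production VADs such as WebRTC VAD or Silero VAD are considerably more robust to noise.

```python
import math


def peak_normalize(samples, target_peak=0.95):
    """Scale float samples so the loudest one reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


def energy_vad(samples, sample_rate=16_000, frame_ms=20, threshold_db=-40.0):
    """Mark a frame as speech when its RMS level (dBFS) exceeds threshold_db,
    then merge consecutive speech frames into (start_s, end_s) segments."""
    frame_len = int(sample_rate * frame_ms / 1000)
    segments = []
    seg_start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        level_db = 20 * math.log10(rms) if rms > 0 else -math.inf
        if level_db > threshold_db and seg_start is None:
            seg_start = i  # a speech segment begins at this frame
        elif level_db <= threshold_db and seg_start is not None:
            segments.append((seg_start / sample_rate, i / sample_rate))
            seg_start = None
    if seg_start is not None:  # speech runs to the end of the signal
        segments.append((seg_start / sample_rate, len(samples) / sample_rate))
    return segments
```

On one second of silence, one second of a 440 Hz tone, and another second of silence, `energy_vad` returns `[(1.0, 2.0)]`. The detected segments could then be cut out and transcribed while the silent stretches are dropped.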

Model usage script

```python
# Load the model directly
import librosa
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

MODEL_ID = "jsbeaudry/oswald-large-v3-turbo-m1"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForSpeechSeq2Seq.from_pretrained(MODEL_ID)
model.eval()


def transcribe(audio_file_path):
    # 1. Load the audio as mono, resampled to the 16 kHz rate the model expects
    speech_array, sampling_rate = librosa.load(audio_file_path, sr=16000)

    # 2. Convert the waveform to log-mel input features
    #    (the feature extractor pads or truncates to a 30-second window)
    input_features = processor(
        speech_array, sampling_rate=sampling_rate, return_tensors="pt"
    ).input_features

    # 3. Generate token IDs; no gradients are needed for inference
    with torch.no_grad():
        predicted_ids = model.generate(input_features)

    # 4. Decode the token IDs; batch_decode returns one string per input
    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
    return transcription[0]


text = transcribe("/path_audio")
print(text)
```

Model usage with Gradio (UI)

```python
import gradio as gr
from transformers import pipeline

# Load the fine-tuned Whisper model as an ASR pipeline.
# chunk_length_s=30 lets the pipeline handle recordings longer than 30 seconds.
print("Loading model...")
pipe = pipeline(
    "automatic-speech-recognition",
    model="jsbeaudry/oswald-large-v3-turbo-m1",
    chunk_length_s=30,
)
print("Model loaded successfully.")


# Transcription function called by the button
def transcribe(audio_path):
    if audio_path is None:
        return "Please upload or record an audio file first."
    result = pipe(audio_path)
    return result["text"]


# Build the Gradio interface
def create_interface():
    with gr.Blocks(title="Oswald Large v3 Turbo - Haitian Creole") as demo:
        gr.Markdown("# 🎙️ Oswald Haitian Creole ASR")
        gr.Markdown(
            "Upload an audio file or record your voice in Haitian Creole. "
            "Then click **Transcribe** to see the result."
        )

        with gr.Row():
            with gr.Column():
                # Gradio 4+ takes a `sources` list; Gradio 3.x used `source="upload"`
                audio_input = gr.Audio(
                    sources=["upload", "microphone"],
                    type="filepath",
                    label="🎧 Upload or Record Audio",
                )
            with gr.Column():
                transcribe_button = gr.Button("🔍 Transcribe")
                output_text = gr.Textbox(label="📝 Transcribed Text", lines=4)

        transcribe_button.click(fn=transcribe, inputs=audio_input, outputs=output_text)

    return demo


if __name__ == "__main__":
    interface = create_interface()
    interface.launch()
```

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-4
num_epochs: 6.65
training time: 2 h 52 min

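| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 100 | 0.565400 | 0.656878 |
| 200 | 0.481000 | 0.528320 |
| 300 | 0.457000 | 0.460658 |
| 400 | 0.822300 | 0.419748 |
| 500 | 0.298300 | 0.397042 |
| ... | ... | ... |
| 8300 | 0.049500 | 0.215643 |
| 8400 | 0.024700 | 0.210167 |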
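The logged rows can be summarized with a few lines of arithmetic. The sketch below copies only the rows shown in the card (the elided middle rows are not reproduced), finds the best validation loss, computes the relative drop in validation loss from the first to the last logged step, and estimates steps per epoch. The last figure assumes step 8400 is the final optimizer step, which the card does not state. `LOG`, `NUM_EPOCHS` and `summarize` are hypothetical names.

```python
# (step, training loss, validation loss) rows copied from the log above;
# the rows elided in the card are not reproduced here.
LOG = [
    (100, 0.565400, 0.656878),
    (200, 0.481000, 0.528320),
    (300, 0.457000, 0.460658),
    (400, 0.822300, 0.419748),
    (500, 0.298300, 0.397042),
    (8300, 0.049500, 0.215643),
    (8400, 0.024700, 0.210167),
]

NUM_EPOCHS = 6.65  # from the hyperparameter list


def summarize(log, num_epochs):
    """Best validation loss, relative validation-loss drop, steps per epoch."""
    best_step, _, best_val = min(log, key=lambda row: row[2])
    first_val, last_val = log[0][2], log[-1][2]
    last_step = log[-1][0]
    return {
        "best_step": best_step,
        "best_val_loss": best_val,
        "val_loss_reduction": (first_val - last_val) / first_val,
        # assumes the last logged step is the final optimizer step
        "approx_steps_per_epoch": last_step / num_epochs,
    }
```

On these rows the validation loss falls by about 68% (0.657 to 0.210), the best value is at step 8400, and 8400 / 6.65 gives roughly 1,263 steps per epoch. The training-loss spike at step 400 (0.8223) does not show up in the validation loss, which keeps decreasing.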

Framework versions

Transformers 4.46.1
PyTorch 2.6.0+cu124
Datasets 3.5.0
Tokenizers 0.20.3
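To reproduce results, it can help to compare a local environment against these versions. The stdlib sketch below reads installed package versions with `importlib.metadata`; the names `PINNED` and `compare_versions` are hypothetical, and the keys are pip distribution names (PyTorch installs as `torch`). Local builds often carry a different suffix than `+cu124`, so a mismatch on `torch` may only reflect the CUDA build tag.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the "Framework versions" section of this card.
# Keys are pip distribution names ("torch" for PyTorch).
PINNED = {
    "transformers": "4.46.1",
    "torch": "2.6.0+cu124",
    "datasets": "3.5.0",
    "tokenizers": "0.20.3",
}


def compare_versions(pinned=PINNED):
    """Return {package: (expected, installed_or_None, exact_match)}."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package not installed in this environment
        report[name] = (expected, installed, installed == expected)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in compare_versions().items():
        status = "ok" if ok else "differs"
        print(f"{name}: card={expected} installed={installed} ({status})")
```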

📌 Citation

If you use this model, please cite:

@misc{whispermediumcreoleoswald2025,
  title={oswald-large-v3-turbo-m1},
  author={Jean Sauvenel Beaudry},
  year={2025},
  howpublished={\url{https://huggingface.co/jsbeaudry/oswald-large-v3-turbo-m1}}
}


Format: Safetensors
Model size: 0.8B params
Tensor types: BF16, F16

Model tree for jsbeaudry/oswald-large-v3-turbo-m1
Base model: openai/whisper-large-v3
Fine-tuned from: unsloth/whisper-large-v3-turbo