oswald-large-v3-turbo-m1

This model is a fine-tuned version of unsloth/whisper-large-v3-turbo on the creole-text-voice dataset.
The main objective is to build a Haitian Creole speech-to-text model with a target accuracy of 99%, capable of transcribing diverse Haitian voices across accents, regions, and speaking styles.


🧠 Model description

oswald-large-v3-turbo-m1 is optimized for Haitian Creole automatic speech recognition (ASR). It builds upon the Whisper architecture by OpenAI and adapts it to Haitian Creole through transfer learning and fine-tuning on a high-quality curated dataset containing hours of Haitian Creole audio-text pairs.

  • Architecture: Whisper Large v3 Turbo
  • Fine-tuned for: Haitian Creole (Kreyòl Ayisyen)
  • Vocabulary: Based on Latin script (Creole orthography), preserving diacritics and linguistic nuances (see the tokenizer check below).
  • Voice types: Trained on both female and male synthetic and natural voices.
  • Sampling rate: 16 kHz (verified in the sketch below)
  • Training objective: Maximize transcription accuracy for everyday Creole speech
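
As a quick sanity check of the vocabulary and sampling-rate points above, the processor exposes the sampling rate it expects, and the tokenizer round-trips Creole diacritics without loss. A minimal sketch:

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("jsbeaudry/oswald-large-v3-turbo-m1")

# The feature extractor expects 16 kHz mono audio
print(processor.feature_extractor.sampling_rate)  # 16000

# Diacritics in Creole orthography survive an encode/decode round trip
ids = processor.tokenizer.encode("Kreyòl Ayisyen")
print(processor.tokenizer.decode(ids, skip_special_tokens=True))  # Kreyòl Ayisyen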

✅ Intended uses

  • Transcribe Haitian Creole speech from:

    • Voice notes
    • Radio shows
    • Interviews
    • Public speeches
    • Educational content
    • Synthetic voices
  • Enable Creole voice interfaces in:

    • Voice assistants
    • Transcription services
    • Language-learning tools
    • Chatbots and accessibility platforms

⚠️ Limitations

  • May struggle with:
    • Extremely poor audio quality (e.g., heavy background noise)
    • Very fast or mumbled speech in some dialects
    • Long-duration audio files (see the chunking sketch below)
  • Not optimized for real-time transcription on low-resource devices
  • Fine-tuned on a specific dataset, so it may generalize less well to completely unseen voice types or rare accents
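
For long recordings, a common workaround is chunked inference with the Transformers ASR pipeline, which transcribes overlapping windows and stitches the results. A minimal sketch; the chunk and stride lengths and the file name are illustrative choices, not values from this model card:

from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="jsbeaudry/oswald-large-v3-turbo-m1",
    chunk_length_s=30,  # split long audio into 30-second windows
    stride_length_s=5,  # overlap windows so words at the boundaries are not cut
)

result = pipe("long_recording.wav")  # hypothetical file path
print(result["text"])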

📊 Training and evaluation data

The model was trained on the creole-text-voice dataset, which includes:

  • 7 hours of synthetic Haitian Creole speech
  • 8 hours of human Haitian Creole speech
  • Annotated, time-aligned text transcripts following standard Creole orthography

Planned sources for future versions:

  • Public domain radio and podcast archives
  • Open-access interviews and spoken-word audio
  • Community-submitted voice samples

Preprocessing steps (approximated in the sketch after the list):

  • Voice Activity Detection (VAD)
  • Noise filtering and audio normalization
  • Manual transcript review and correction
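
The exact preprocessing code is not published. The sketch below approximates the first two steps with librosa: silence trimming as a lightweight stand-in for VAD, plus peak normalization. The top_db threshold is an illustrative assumption:

import librosa
import numpy as np

def preprocess(audio_path):
    # Load and resample to the 16 kHz the model expects
    y, sr = librosa.load(audio_path, sr=16000)

    # Lightweight stand-in for VAD: trim leading/trailing silence
    y, _ = librosa.effects.trim(y, top_db=30)

    # Peak-normalize to [-1, 1]
    peak = np.max(np.abs(y))
    if peak > 0:
        y = y / peak

    return y, sr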

Model usage script

# Load model directly
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
import librosa

processor = AutoProcessor.from_pretrained("jsbeaudry/oswald-large-v3-turbo-m1")
model = AutoModelForSpeechSeq2Seq.from_pretrained("jsbeaudry/oswald-large-v3-turbo-m1")

def transcript(audio_file_path):
    # 1. Load audio and resample to the 16 kHz the model expects
    speech_array, sampling_rate = librosa.load(audio_file_path, sr=16000)

    # 2. Extract log-mel input features (the processor handles tensor conversion)
    input_features = processor(
        speech_array, sampling_rate=sampling_rate, return_tensors="pt"
    ).input_features

    # 3. Generate predicted token IDs
    predicted_ids = model.generate(input_features)

    # 4. Decode the token IDs to text; batch_decode returns a list
    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
    return transcription[0]

text = transcript("/path_audio")

print(text)
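
On a machine with a GPU, inference is noticeably faster when the model and its inputs share a device. A minimal variant of the generate step, assuming the objects from the script above:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Input features must live on the same device as the model
predicted_ids = model.generate(input_features.to(device))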

Model usage with Gradio (UI)


from transformers import pipeline
import gradio as gr

# Load Whisper model
print("Loading model...")
pipe = pipeline(model="jsbeaudry/oswald-large-v3-turbo-m1")
print("Model loaded successfully.")

# Transcription function
def transcribe(audio_path):
    if audio_path is None:
        return "Please upload or record an audio file first."
    result = pipe(audio_path)
    return result["text"]

# Build Gradio interface
def create_interface():
    with gr.Blocks(title="Oswald Large v3 Turbo - Haitian Creole") as demo:
        gr.Markdown("# 🎙️ Oswald Creole ASR")
        gr.Markdown(
            "Upload an audio file or record your voice in Haitian Creole. "
            "Then click **Transcribe** to see the result."
        )

        with gr.Row():
            with gr.Column():
                # Gradio 4.x uses `sources`; allow both upload and microphone
                audio_input = gr.Audio(
                    sources=["upload", "microphone"],
                    type="filepath",
                    label="🎧 Upload or Record Audio",
                )
            with gr.Column():
                transcribe_button = gr.Button("🔍 Transcribe")
                output_text = gr.Textbox(label="📝 Transcribed Text", lines=4)

        transcribe_button.click(fn=transcribe, inputs=audio_input, outputs=output_text)

    return demo

if __name__ == "__main__":
    interface = create_interface()
    interface.launch()
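
Running the script starts a local web server and prints a local URL; pass share=True to launch() if you need a temporary public link.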

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after the list):

  • learning_rate: 1e-4
  • num_epochs: 6.65
  • training time: 2 h 52 min
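
Only the values above are published. As a rough illustration, they might map onto Seq2SeqTrainingArguments as follows; every other field (output directory, batch size, precision) is a hypothetical placeholder:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./oswald-large-v3-turbo-m1",  # hypothetical
    learning_rate=1e-4,             # published value
    num_train_epochs=6.65,          # published value
    per_device_train_batch_size=8,  # assumption, not published
    eval_strategy="steps",
    eval_steps=100,                 # inferred from the 100-step eval cadence below
    logging_steps=100,
    fp16=True,                      # assumption
)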

Step   Training Loss   Validation Loss
100    0.565400        0.656878
200    0.481000        0.528320
300    0.457000        0.460658
400    0.822300        0.419748
500    0.298300        0.397042
...    ...             ...
8300   0.049500        0.215643
8400   0.024700        0.210167

Framework versions

  • Transformers 4.46.1
  • PyTorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.20.3

📌 Citation

If you use this model, please cite:

@misc{oswaldlargev3turbom1_2025,
  title={oswald-large-v3-turbo-m1},
  author={Jean Sauvenel Beaudry},
  year={2025},
  howpublished={\url{https://huggingface.co/jsbeaudry}}
}