haitian_creole_tts_11K

This model is a fine-tuned version of microsoft/speecht5_tts, trained on a mix of the jsbeaudry/creole-text-voice and jsbeaudry/cmu_haitian_creole_speech datasets. It achieves the following results on the evaluation set:

  • Loss: 0.3390

🧠 Model Description

haitian_creole_tts_11K is a high-quality text-to-speech (TTS) model designed for Haitian Creole (Kreyòl Ayisyen). It is built and fine-tuned using 11,000+ curated audio-text pairs to synthesize natural, intelligible Creole speech for various use cases including education, accessibility, and conversational AI.

  • Architecture: SpeechT5 (fine-tuned from microsoft/speecht5_tts), used with a HiFi-GAN vocoder
  • Trained for: Haitian Creole Text-to-Speech synthesis
  • Dataset: Over 11,000 Haitian Creole sentence-to-audio pairs
  • Voice Type: Male and female voices, both synthetic and natural, with clear articulation and native accent
  • Sampling Rate: 16 kHz
  • Phonetics: Uses standardized Creole orthography with support for diacritics
  • Objective: Generate natural and expressive Haitian Creole speech for daily communication, education tools, and virtual assistants
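Because the model relies on Creole orthography with diacritics, it can help to make sure accented letters reach the model in one consistent Unicode form. The sketch below is only an illustration: the helper name `normalize_creole_text` is hypothetical, and whether this model's tokenizer actually needs NFC input is an assumption that has not been verified here.

```python
import unicodedata


def normalize_creole_text(text: str) -> str:
    """Return text in NFC form with runs of whitespace collapsed."""
    # NFC merges a base letter and a combining accent into one code point,
    # e.g. "o" + U+0300 (combining grave) becomes "ò" (U+00F2).
    composed = unicodedata.normalize("NFC", text)
    # Collapse repeated spaces, tabs, and newlines into single spaces.
    return " ".join(composed.split())


# Input typed with a decomposed accent and stray spaces.
decomposed = "Kreyo\u0300l   Ayisyen"
clean = normalize_creole_text(decomposed)
print(clean)
```

The normalized string can then be passed to the synthesis call in the usage script in place of the raw input.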

📊 Training and evaluation data

The model was trained on the creole-text-voice dataset, which includes:

  • 15 hours of synthetic and human Haitian Creole speech
  • Annotated, time-aligned text transcripts following Creole orthography

Model usage script

# Install dependencies (notebook syntax; drop the leading "!" in a shell)
!pip install transformers==4.46.1 "datasets>=3.4.1" soundfile

import soundfile as sf
import torch
from datasets import load_dataset
from IPython.display import Audio
from transformers import pipeline

# Load the fine-tuned model through the text-to-speech pipeline
synthesiser = pipeline("text-to-speech", "jsbeaudry/haitian_creole_tts_11K")

# Load a speaker x-vector that sets the voice of the output.
# You can replace this embedding with your own as well.
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embedding = torch.tensor(embeddings_dataset[7304]["xvector"]).unsqueeze(0)

# Synthesize speech for a Haitian Creole sentence
speech = synthesiser("Bonjou koman ou ye?", forward_params={"speaker_embeddings": speaker_embedding})

# Save the waveform at the sampling rate reported by the pipeline
sf.write("speech.wav", speech["audio"], samplerate=speech["sampling_rate"])

# Play the audio in a notebook
Audio("speech.wav")

Intended uses & limitations

  • May struggle with:
    • Mixed-language text (Creole + French/English)
    • Long sentences
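One possible workaround for the long-sentence limitation is to split the input into shorter chunks and synthesize each chunk separately. The following is a minimal sketch, not part of the original model card: the function name `split_into_sentences` and the 200-character default are hypothetical choices, and the limit that works well for this model has not been measured.

```python
import re


def split_into_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Split on sentence-ending punctuation, then wrap long sentences on word boundaries."""
    # Break after ".", "!" or "?" when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks = []
    for sentence in sentences:
        current = ""
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            # Start a new chunk when adding the next word would exceed the limit.
            if len(candidate) > max_chars and current:
                chunks.append(current)
                current = word
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks


chunks = split_into_sentences("Bonjou! Koman ou ye? Mwen byen, mèsi.")
print(chunks)
```

Each chunk could then be passed to the synthesis call from the usage script, and the resulting audio arrays joined (for example with `numpy.concatenate`) before writing the file.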

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 15
  • mixed_precision_training: Native AMP
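The relationship between these values can be checked with a few lines of arithmetic. The sketch below assumes a single training device, which the card does not state, and reproduces the usual linear-warmup, linear-decay rule in plain Python rather than calling the training library. The `total_steps` value of 4600 is a hypothetical placeholder, since the card reports steps only up to 4500.

```python
# Values taken from the hyperparameter list above.
train_batch_size = 4
gradient_accumulation_steps = 8
learning_rate = 1e-4
warmup_steps = 100

# Assumption: one device. With more devices the product scales accordingly.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 32, matching total_train_batch_size in the list


def linear_schedule_lr(step: int, total_steps: int, base_lr: float = learning_rate,
                       warmup: int = warmup_steps) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup:
        # Ramp up from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup)
    # Decay linearly from base_lr to 0 at total_steps.
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup))


# total_steps=4600 is a hypothetical value chosen for illustration only.
for step in (0, 50, 100, 2350, 4600):
    print(step, linear_schedule_lr(step, total_steps=4600))
```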

Training results

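| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 3.9128 | 0.3261 | 100 | 0.4148 |
| 3.6517 | 0.6523 | 200 | 0.4010 |
| 3.3982 | 0.9784 | 300 | 0.3897 |
| 3.2512 | 1.3062 | 400 | 0.3742 |
| ... | ... | ... | ... |
| 2.8046 | 14.3726 | 4400 | 0.3392 |
| 2.7897 | 14.6987 | 4500 | 0.3390 |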
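The reported rows are enough for two rough sanity checks: how many optimizer steps make up one epoch, and how far the validation loss fell between the first and last logged evaluations. This is a small sketch over the six rows shown above only; the elided rows are not reconstructed, and the steps-per-epoch figure is an estimate derived from the table, not a value stated by the author.

```python
# (training_loss, epoch, step, validation_loss) for the rows shown in the table.
rows = [
    (3.9128, 0.3261, 100, 0.4148),
    (3.6517, 0.6523, 200, 0.4010),
    (3.3982, 0.9784, 300, 0.3897),
    (3.2512, 1.3062, 400, 0.3742),
    (2.8046, 14.3726, 4400, 0.3392),
    (2.7897, 14.6987, 4500, 0.3390),
]

first, last = rows[0], rows[-1]

# Steps divided by fractional epochs gives an estimate of steps per epoch.
steps_per_epoch = last[2] / last[1]

# Relative drop in validation loss between the first and last evaluation.
relative_drop = (first[3] - last[3]) / first[3]

print(f"about {steps_per_epoch:.0f} optimizer steps per epoch")
print(f"validation loss fell by {relative_drop:.1%}")
```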

Framework versions

  • Transformers 4.46.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.6.0
  • Tokenizers 0.20.3