haitian_creole_tts_11K

This model is a fine-tuned version of microsoft/speecht5_tts, trained on a mix of the jsbeaudry/creole-text-voice and jsbeaudry/cmu_haitian_creole_speech datasets. It achieves the following results on the evaluation set:

Loss: 0.3390

🧠 Model Description

haitian_creole_tts_11K is a text-to-speech (TTS) model for Haitian Creole (Kreyòl Ayisyen). It was fine-tuned on more than 11,000 curated audio-text pairs to synthesize natural, intelligible Creole speech for use cases such as education, accessibility, and conversational AI.

Architecture: SpeechT5 (fine-tuned from microsoft/speecht5_tts) with a HiFi-GAN vocoder
Trained for: Haitian Creole Text-to-Speech synthesis
Dataset: Over 11,000 Haitian Creole sentence-to-audio pairs
Voice Type: Male and female voices, both synthetic and natural, with clear articulation and a native accent
Sampling Rate: 16 kHz
Phonetics: Uses standardized Creole orthography with support for diacritics
Objective: Generate natural and expressive Haitian Creole speech for daily communication, education tools, and virtual assistants

📊 Training and evaluation data

The model was trained on the creole-text-voice dataset, which includes:

15 hours of synthetic and human Haitian Creole speech
Annotated, time-aligned text transcripts following Creole orthography
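
As a rough sanity check, the figures quoted in this card can be combined to estimate the average clip length. The sketch below assumes that the 15 hours and the 11,000 pairs describe the same training set, which the card does not state explicitly. Because the card says "over 11,000" pairs, the result is an upper bound on the average.

```python
TOTAL_HOURS = 15        # stated in this card (approximate)
NUM_PAIRS = 11_000      # "over 11,000" pairs, so this is a lower bound
SAMPLING_RATE = 16_000  # Hz, stated in this card

# Average clip duration in seconds and in audio samples
avg_seconds = TOTAL_HOURS * 3600 / NUM_PAIRS
avg_samples = avg_seconds * SAMPLING_RATE

print(f"{avg_seconds:.1f} s per clip, about {avg_samples:,.0f} samples")
```

Under these assumptions the average training clip is about 4.9 seconds long, which is consistent with the model being less reliable on long sentences.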

Model usage script

# Install dependencies (notebook syntax; drop the leading "!" in a shell)
!pip install transformers==4.46.1 "datasets>=3.4.1" soundfile

import soundfile as sf
import torch
from datasets import load_dataset
from IPython.display import Audio
from transformers import pipeline

# Load the fine-tuned model as a text-to-speech pipeline
synthesiser = pipeline("text-to-speech", "jsbeaudry/haitian_creole_tts_11K")

# Load a speaker embedding (x-vector) that selects the voice.
# You can replace this embedding with your own as well.
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embedding = torch.tensor(embeddings_dataset[7304]["xvector"]).unsqueeze(0)

# Synthesize speech and save it as a WAV file
speech = synthesiser("Bonjou koman ou ye?", forward_params={"speaker_embeddings": speaker_embedding})
sf.write("speech.wav", speech["audio"], samplerate=speech["sampling_rate"])

# Play the audio (in a notebook)
Audio("speech.wav", rate=16000)

Intended uses & limitations

The model may struggle with:
- Mixed-language text (Creole combined with French or English)
- Long sentences
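
One possible workaround for the long-sentence limitation is to split the input into shorter chunks and synthesize each chunk separately. The sketch below is illustrative only: the helper `split_into_chunks` and its `max_chars` default are hypothetical and are not part of this model or of any library.

```python
import re


def split_into_chunks(text: str, max_chars: int = 120) -> list[str]:
    """Group sentences into chunks that stay within max_chars where possible.

    A single sentence longer than max_chars is kept whole.
    max_chars is a hypothetical limit chosen for illustration, not a value
    documented for this model.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("Bonjou. Koman ou ye? Mwen byen, mèsi.", max_chars=25))
```

Each chunk could then be passed to the `synthesiser` pipeline from the usage script, and the resulting audio arrays joined into one waveform. Whether this improves quality for a given text has not been measured here.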

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 4
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 15
mixed_precision_training: Native AMP
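
To make the relationship between these values concrete, the sketch below recomputes the total batch size and traces a linear schedule with warmup. It is a plain-Python illustration, not the training code. The single-device setting and the total of 4,600 optimizer steps are assumptions: the first follows from 4 × 8 = 32, and the second is estimated from the training log in this card, which reaches step 4,500 at epoch 14.6987.

```python
def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer step."""
    return per_device * accumulation_steps * num_devices


def linear_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 4600  # assumption: estimated from the training log, not stated in the card

print("total_train_batch_size:", effective_batch_size(4, 8))
for step in (0, 50, 100, 2350, 4600):
    print(step, linear_lr(step, base_lr=1e-4, warmup_steps=100, total_steps=TOTAL_STEPS))
```

With these settings the learning rate climbs to 0.0001 over the first 100 steps and then falls linearly toward zero by the end of training.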

Training results

Training Loss	Epoch	Step	Validation Loss
3.9128	0.3261	100	0.4148
3.6517	0.6523	200	0.4010
3.3982	0.9784	300	0.3897
3.2512	1.3062	400	0.3742
......
2.8046	14.3726	4400	0.3392
2.7897	14.6987	4500	0.3390

Framework versions

Transformers 4.46.1
Pytorch 2.5.1+cu124
Datasets 3.6.0
Tokenizers 0.20.3
