speecht5_en-ng

This model is a fine-tuned version of microsoft/speecht5_tts on the OpenSLR SLR70 dataset for Nigerian-accented English speech generation. It achieves the following results on the evaluation set:

  • Loss: 0.4877

Model description

More information needed

Intended uses & limitations

You can run this SpeechT5 TTS model locally with the 🤗 Transformers library.

  1. First install the 🤗 Transformers library, sentencepiece, soundfile and datasets (optional):
pip install --upgrade pip
pip install --upgrade transformers sentencepiece datasets[audio]
  2. Run inference:

from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
from datasets import load_dataset
import torch
import soundfile as sf

# Load the fine-tuned model
model = SpeechT5ForTextToSpeech.from_pretrained("toyrem/speecht5_en-ng")

# Load the processor and the vocoder
processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

inputs = processor(text="Five sentenced to death in Nigeria over 'witchcraft' murder", return_tensors="pt")

# SpeechT5 needs a speaker embedding (x-vector) to condition the voice.
# Note: the SLR70 config of openslr/openslr does not ship precomputed
# speaker embeddings, so we take an x-vector from the CMU ARCTIC x-vector
# dataset, as in the official SpeechT5 examples.
emb_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(emb_dataset[7306]["xvector"]).unsqueeze(0)

speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)

sf.write("speech.wav", speech.numpy(), samplerate=16000)
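
If you are working in a notebook, you can also play the result inline instead of (or in addition to) writing it to disk. A minimal optional snippet, assuming IPython is available:

from IPython.display import Audio

# Render an inline audio player for the generated waveform (16 kHz).
Audio(speech.numpy(), rate=16000)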

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch of the equivalent Seq2SeqTrainingArguments follows the list:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
  • mixed_precision_training: Native AMP
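
For reference, here is a minimal sketch of how these settings map onto Transformers' Seq2SeqTrainingArguments. The output_dir and the evaluation cadence are assumptions for illustration, not taken from this card:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="speecht5_en-ng",     # assumption: any local path works
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # effective train batch size: 4 * 8 = 32
    lr_scheduler_type="linear",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    fp16=True,                       # "Native AMP" mixed-precision training
    eval_strategy="steps",           # assumption: evaluate every 100 steps
    eval_steps=100,
)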

Training results

Training Loss    Epoch      Step    Validation Loss
0.6553           1.0530     100     0.5963
0.5772           2.1060     200     0.5392
0.569            3.1589     300     0.5177
0.536            4.2119     400     0.5081
0.5445           5.2649     500     0.5046
0.5316           6.3179     600     0.4948
0.5352           7.3709     700     0.4932
0.5208           8.4238     800     0.4885
0.5204           9.4768     900     0.4881
0.5083           10.5298    1000    0.4877

Framework versions

  • Transformers 4.47.0
  • Pytorch 2.5.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0
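
To reproduce this environment, the versions above can be pinned with pip. The exact command is an assumption (the CUDA build of PyTorch may need the matching extra index URL):

pip install transformers==4.47.0 datasets==3.2.0 tokenizers==0.21.0 torch==2.5.1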