speecht5_en-ng
This model is a fine-tuned version of microsoft/speecht5_tts on the OpenSLR SLR70 dataset, intended for generating Nigerian-accented English speech. It achieves the following result on the evaluation set:
- Loss: 0.4877
Model description
More information needed
Intended uses & limitations
You can run SpeechT5 TTS locally with the 🤗 Transformers library.
- First, install the 🤗 Transformers library, sentencepiece, soundfile, and (optionally) datasets:
pip install --upgrade pip
pip install --upgrade transformers sentencepiece soundfile "datasets[audio]"
- Run inference
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
from datasets import load_dataset
import torch
import soundfile as sf
# First, load the fine-tuned model
model = SpeechT5ForTextToSpeech.from_pretrained("toyrem/speecht5_en-ng")
# Load the processor and the vocoder
processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
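# Tokenize the input text
inputs = processor(text="Five sentenced to death in Nigeria over 'witchcraft' murder", return_tensors="pt")
# Load a speaker embedding; generate_speech expects a tensor of shape (1, 512).
# NOTE: this assumes the selected row already has a "speaker_embeddings" column
# (for example, x-vectors added during preprocessing); raw audio/transcript rows do not provide one.
sample = load_dataset("openslr/openslr", "SLR70")["train"][10]
speaker_embeddings = torch.tensor(sample["speaker_embeddings"]).unsqueeze(0)
# Generate the waveform and save it as 16 kHz audio
speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
sf.write("speech.wav", speech.numpy(), samplerate=16000)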
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
- mixed_precision_training: Native AMP
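The batch-size and scheduler entries above can be related with a little arithmetic. The following is a minimal sketch, not the training script itself: it assumes single-device training (so that total_train_batch_size equals train_batch_size times gradient_accumulation_steps) and that the linear scheduler follows the usual shape of linear warmup followed by linear decay to zero. The helper name `linear_lr` is hypothetical.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 100
TRAINING_STEPS = 1000

# Assumption: one device, so the effective batch size is
# per-device batch size x gradient accumulation steps.
effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper).

    Assumes linear warmup from 0 to the peak rate over WARMUP_STEPS,
    then linear decay to 0 at TRAINING_STEPS.
    """
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size)
    for step in (0, 50, 100, 550, 1000):
        print(f"step {step:4d}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the rate peaks at the end of warmup (step 100) and has fallen to half its peak by step 550, the midpoint of the decay phase.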
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.6553 | 1.0530 | 100 | 0.5963 |
| 0.5772 | 2.1060 | 200 | 0.5392 |
| 0.569 | 3.1589 | 300 | 0.5177 |
| 0.536 | 4.2119 | 400 | 0.5081 |
| 0.5445 | 5.2649 | 500 | 0.5046 |
| 0.5316 | 6.3179 | 600 | 0.4948 |
| 0.5352 | 7.3709 | 700 | 0.4932 |
| 0.5208 | 8.4238 | 800 | 0.4885 |
| 0.5204 | 9.4768 | 900 | 0.4881 |
| 0.5083 | 10.5298 | 1000 | 0.4877 |
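The table also allows a rough consistency check. The sketch below (helper names are hypothetical) estimates the number of optimizer steps per epoch from the first row and the relative drop in validation loss between the first and last checkpoints. The derived training-set size additionally assumes the total batch size of 32 listed in the hyperparameters, so treat it as an approximation only.

```python
# (training loss, epoch, step, validation loss) for the first and last
# rows of the results table.
FIRST_ROW = (0.6553, 1.0530, 100, 0.5963)
LAST_ROW = (0.5083, 10.5298, 1000, 0.4877)

# Assumption: every optimizer step consumes the total batch size
# reported in the hyperparameter list.
TOTAL_TRAIN_BATCH_SIZE = 32


def steps_per_epoch(epoch: float, step: int) -> float:
    """Optimizer steps needed to complete one pass over the training set."""
    return step / epoch


def relative_improvement(first: float, last: float) -> float:
    """Fractional reduction from the first value to the last."""
    return (first - last) / first


spe = steps_per_epoch(FIRST_ROW[1], FIRST_ROW[2])
approx_train_examples = spe * TOTAL_TRAIN_BATCH_SIZE
val_loss_drop = relative_improvement(FIRST_ROW[3], LAST_ROW[3])

if __name__ == "__main__":
    print(f"steps per epoch: {spe:.1f}")
    print(f"approximate training examples: {approx_train_examples:.0f}")
    print(f"validation loss reduction: {val_loss_drop:.1%}")
```

Most of the reduction in validation loss occurs within the first 400 steps; later checkpoints improve it only marginally.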
Framework versions
- Transformers 4.47.0
- Pytorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0