SinhalaVITS-TTS-F2 - Female Voice 02
This is a Coqui TTS model trained specifically for Sinhala, developed by Dialog Axiata PLC and the Dialog-UoM Research Lab.
It was trained on a custom-recorded dataset of a clear female voice.
Features
- Model architecture: VITS
- Language: Sinhala (si-lk)
- Training Sampling Rate: 22050 Hz
- Framework: Coqui TTS
Dataset
- Voice: Female (Sanuki)
- Recording Sampling Rate: 44100 Hz
- No. of Clips: 1096
- Total Length: >100 min (~2 hrs)
Training Specs
- Hardware: NVIDIA GeForce GTX 1060 6 GB GPU
- Training Time: ~95 hours
- Global Steps: 190,000
- Batch Size: 16
- Epochs:
- Loss Convergence: Stable mel + KL losses
Installation
You can run this model locally using the included Flask-based inference server. This server will automatically use CUDA if it's available on your system.
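How inference_F2.py actually selects its device is not shown in this README. A common pattern in PyTorch-based servers is sketched below; `pick_device` is a hypothetical helper name used only for illustration, and the fallback to CPU when PyTorch is missing is an assumption of this sketch.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # PyTorch is the backend used by Coqui TTS
    except ImportError:
        # Assumption for this sketch: without PyTorch, report CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

Running `pick_device()` before starting the server is a quick way to see which device a setup like this would likely use.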
First, install the requirements:
pip install -r requirements.txt
Then start the API server:
python inference_F2.py
This starts a Flask server at http://localhost:8000.
- Then use curl or any HTTP client (such as Postman) to send Sinhala text to the server. The API endpoint is /tts:
curl -X POST http://localhost:8000/tts \
-H "Content-Type: application/json" \
-d '{"text": "ආයුබෝවන්"}' \
--output output.wav
- The API will:
  - Convert Sinhala text → Romanized Sinhala (via romanizer.py)
  - Generate speech using the VITS model
  - Return the synthesized Sinhala speech as a WAV file (saved as output.wav by the curl command above)
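The same request can be sent from Python. The sketch below uses only the standard library and assumes the server is running at http://localhost:8000 and accepts a JSON body with a "text" field, as in the curl example; `build_tts_request` and `synthesize` are hypothetical helper names, not part of this repository.

```python
import json
import urllib.request


def build_tts_request(text, base_url="http://localhost:8000"):
    """Build a POST request for the /tts endpoint with a JSON body."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/tts",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="output.wav", base_url="http://localhost:8000", timeout=60):
    """Send text to the running server and save the returned audio bytes."""
    request = build_tts_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With the server running, calling synthesize("ආයුබෝවන්") should write the response to output.wav in the current directory. The 60-second timeout is an arbitrary choice for this sketch.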
File Structure
SinhalaVITS-TTS-F2/
├── Sanuki_190000.pth # Fine-tuned VITS checkpoint
├── Sanuki_config.json # Model configuration
├── romanizer.py # Sinhala → Roman converter
├── inference_F2.py # Flask-based inference server
├── requirements.txt # Required dependencies
├── LICENSE # MPL-2.0 license
└── README.md # This file
Contributors
- Kasun Ranasinghe (Dialog-UoM Research Lab)
- Randika Silva (Dialog Axiata PLC)
- Vipula Wakkumbura (Dialog-UoM Research Lab)
Acknowledgements
- PathNirvana (https://github.com/pathnirvana/coqui-tts): previous work in Sinhala TTS
- Coqui TTS: the open-source TTS framework that this work builds on
- Sinhala dataset contributor (Sanuki Bentharage): for providing professional-quality speech samples
License
This model is released under the MPL-2.0 license.