SinhalaVITS-TTS-F2 - Female Voice 02

This is a Coqui TTS model trained specifically for Sinhala, developed by Dialog Axiata PLC and the Dialog – UoM Research Lab.

It was trained on a custom-recorded dataset featuring a clear female voice.

Features

Model architecture: VITS
Language: Sinhala (si-lk)
Training Sampling Rate: 22050 Hz
Framework: Coqui TTS

Dataset

Voice: Female (Sanuki)
Recording Sampling Rate: 44100 Hz
No. of Clips: 1096
Total Length: >100 min (~2 hrs)

Training Specs

Hardware: NVIDIA GeForce GTX 1060 6GB GPU
Training Time: ~95 hours
Global Steps: 190,000
Batch Size: 16
Epochs:
Loss Convergence: Stable mel + KL losses

Installation

You can run this model locally using the included Flask-based inference server. This server will automatically use CUDA if it's available on your system.

First, install the requirements:
```
pip install -r requirements.txt
```
Then start the API server:

  python inference_F2.py

This starts a Flask server at http://localhost:8000.

You can then use curl or any HTTP client (such as Postman) to send Sinhala text to the server. The API endpoint is '/tts'.

  curl -X POST http://localhost:8000/tts \
     -H "Content-Type: application/json" \
     -d '{"text": "ආයුබෝවන්"}' \
     --output output.wav
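
The same request can also be sent from Python. The sketch below uses only the standard library. The helper names `build_tts_request` and `synthesize` are illustrative and not part of this repository, and the code assumes the server responds with raw WAV bytes, as the `--output output.wav` option in the curl command suggests.

```python
import json
import urllib.request


def build_tts_request(text, base_url="http://localhost:8000"):
    """Build a POST request for the /tts endpoint with a JSON body."""
    # ensure_ascii is left at its default, so Sinhala characters are sent
    # as \u escapes, which any JSON parser decodes back to the same text.
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tts",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="output.wav", base_url="http://localhost:8000", timeout=60):
    """Send text to the server and save the response body to out_path.

    Assumption: the response body is the WAV audio itself.
    """
    request = build_tts_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as wav_file:
        wav_file.write(audio)
    return out_path
```

With the server running, calling `synthesize("ආයුබෝවන්")` should write `output.wav` to the current directory. `urllib` raises `urllib.error.URLError` if the server cannot be reached, so a caller may want to catch that.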

The API will:
- Convert Sinhala text → Romanized Sinhala (via romanizer.py)
- Generate speech using the VITS model
- Return output.wav (Sinhala voice)
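
To sanity-check the returned file, its header can be inspected with Python's built-in `wave` module. This is a minimal sketch: `describe_wav` is a hypothetical helper, and it assumes the server writes standard PCM WAV data (the `wave` module cannot read other encodings). Since the model was trained at 22050 Hz, the output is likely at that rate, but that is not confirmed here.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        # Duration is the number of frames divided by frames per second.
        duration = wav.getnframes() / rate
    return rate, channels, duration
```

For example, `describe_wav("output.wav")` returns the sample rate, channel count, and length in seconds of the file saved by the curl command above.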

File Structure

  SinhalaVITS-TTS-F2/
    ├── Sanuki_190000.pth           # Fine-tuned VITS checkpoint
    ├── Sanuki_config.json          # Model configuration
    ├── romanizer.py                # Sinhala → Roman converter
    ├── inference_F2.py             # Flask-based inference server
    ├── requirements.txt            # Required dependencies
    ├── LICENSE                     # MPL-2.0 license
    └── README.md                   # This file

Contributors

Kasun Ranasinghe (Dialog-UoM Research Lab)
Randika Silva (Dialog Axiata PLC)
Vipula Wakkumbura (Dialog-UoM Research Lab)

Acknowledgements

PathNirvana (https://github.com/pathnirvana/coqui-tts) – Previous work in Sinhala TTS
Coqui TTS – Open-source TTS framework enabling the foundation of this work
Sinhala dataset contributor (Sanuki Bentharage) – for providing professional, quality speech samples

License

This model is released under the MPL-2.0 license.
