SinhalaVITS-TTS-F2 - Female Voice 02

This is a Coqui TTS model trained specifically for Sinhala, developed by Dialog Axiata PLC and the Dialog – UoM Research Lab.

It was trained on a custom-recorded dataset of a clear female voice.


Features

  • Model architecture: VITS
  • Language: Sinhala (si-lk)
  • Training Sampling Rate: 22050 Hz
  • Framework: Coqui TTS

Dataset

  • Voice: Female (Sanuki)
  • Recording Sampling Rate: 44100 Hz
  • No. of Clips: 1096
  • Total Length: >100 min (~2 hrs)
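
For readers preparing a comparable dataset, the following is a minimal sketch of how figures like the clip count, total length, and recording sampling rate could be checked with Python's standard `wave` module. The function name `summarize_clips` and the directory layout (a flat folder of `.wav` files) are assumptions for illustration; they are not part of this repository.

```python
import wave
from pathlib import Path


def summarize_clips(clip_dir):
    """Return (clip_count, total_seconds, sample_rates) for .wav files in clip_dir.

    Hypothetical helper: assumes a flat directory of PCM WAV clips.
    """
    count = 0
    total_seconds = 0.0
    sample_rates = set()
    for path in sorted(Path(clip_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            # Duration of one clip = number of frames / frames per second.
            total_seconds += wav.getnframes() / rate
            sample_rates.add(rate)
        count += 1
    return count, total_seconds, sample_rates
```

Calling `summarize_clips("path/to/clips")` on a folder of recordings returns the number of clips, their combined duration in seconds, and the set of sampling rates found, which makes it easy to spot clips that still need resampling before training.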

Training Specs

  • Hardware: NVIDIA GeForce GTX 1060 6GB GPU
  • Training Time: ~95 hours
  • Global Steps: 190,000
  • Batch Size: 16
  • Epochs:
  • Loss Convergence: Stable mel + KL losses

Installation

You can run this model locally using the included Flask-based inference server. This server will automatically use CUDA if it's available on your system.

  1. Install the requirements:

     pip install -r requirements.txt

  2. Start the API server:

     python inference_F2.py

     This starts a Flask server at http://localhost:8000.

  3. Use curl or any HTTP client (such as Postman) to send Sinhala text to the server. The API endpoint is /tts:

     curl -X POST http://localhost:8000/tts \
          -H "Content-Type: application/json" \
          -d '{"text": "ආයුබෝවන්"}' \
          --output output.wav

  4. The API will:
    • Convert the Sinhala text → Romanized Sinhala (via romanizer.py)
    • Generate speech using the VITS model
    • Return the synthesized audio as a WAV file (saved as output.wav by the curl command above)
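
The same request can also be made from Python. The sketch below uses only the standard library and mirrors the curl call: it POSTs a JSON body with a `text` field to `/tts` and writes the response bytes to a file. The function names `build_tts_request` and `synthesize` are illustrative and not part of this repository, and the code assumes the server responds with raw WAV bytes, as the curl example suggests.

```python
import json
import urllib.request


def build_tts_request(text, base_url="http://localhost:8000"):
    """Build a POST request for the /tts endpoint with a JSON body."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/tts",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="output.wav",
               base_url="http://localhost:8000", timeout=60):
    """Send text to the server and save the returned audio to out_path.

    Assumes the response body is the WAV file itself.
    """
    request = build_tts_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path


# Example (requires the inference server to be running):
# synthesize("ආයුබෝවන්", "output.wav")
```

Splitting request construction from the network call keeps the payload easy to inspect without a running server.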

File Structure

  SinhalaVITS-TTS-F2/
    ├── Sanuki_190000.pth           # Fine-tuned VITS checkpoint
    ├── Sanuki_config.json          # Model configuration
    ├── romanizer.py                # Sinhala → Roman converter
    ├── inference_F2.py             # Flask-based inference server
    ├── requirements.txt            # Required dependencies
    ├── LICENSE                     # MPL-2.0 license
    └── README.md                   # This file

Contributors

  • Kasun Ranasinghe (Dialog-UoM Research Lab)
  • Randika Silva (Dialog Axiata PLC)
  • Vipula Wakkumbura (Dialog-UoM Research Lab)

Acknowledgements

  • PathNirvana (https://github.com/pathnirvana/coqui-tts) – Previous work in Sinhala TTS
  • Coqui TTS – Open-source TTS framework enabling the foundation of this work
  • Sinhala dataset contributor (Sanuki Bentharage) – for providing professional, quality speech samples

License

This model is released under the MPL-2.0 license.
