---
library_name: transformers
tags:
- dhivehi-tts
license: mit
datasets:
- alakxender/dv_syn_speech_md
language:
- dv
base_model:
- facebook/mms-tts-div
---
# Divehi TTS – Male Voice (VITS-based)
This is a fine-tuned VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) model for Divehi speech synthesis. It produces male-voice audio from Divehi text written in Thaana script. The model was fine-tuned from Meta’s MMS-TTS Divehi checkpoint (`facebook/mms-tts-div`) on a curated dataset of synthetic Divehi speech (`alakxender/dv_syn_speech_md`).
## Model Details
| Field | Value |
|----------------------|-------------------------------------------------|
| **Model ID** | `alakxender/mms-tts-div-finetuned-md-m02` |
| **Base Architecture**| MMS-TTS (VITS) |
| **Language** | Divehi (dv) |
| **Voice** | Male |
| **Sampling Rate** | 16 kHz |
| **Tokenizer** | `VitsTokenizer` |
| **Inference Engine** | Transformers (🤗 Hugging Face) |
## Usage
```python
from transformers import VitsModel, VitsTokenizer
import torchaudio
tokenizer = VitsTokenizer.from_pretrained("alakxender/mms-tts-div-finetuned-md-m02")
model = VitsModel.from_pretrained("alakxender/mms-tts-div-finetuned-md-m02")
text = "މޫސުން ވަރަށް ގޯސްވެ، ފުވައްމުލަކުން ފެށިގެން އައްޑުއަށް އޮރެންޖް އެލާޓް ނެރެފި"
inputs = tokenizer(text, return_tensors="pt")
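# VitsModel synthesizes audio in its forward pass (it does not support .generate());
# detach the result so it can be saved without autograd state.
waveform = model(**inputs).waveform[0].detach()
# torchaudio expects (channels, num_samples); the rate comes from the model config (16 kHz).
torchaudio.save("output.wav", waveform.unsqueeze(0), model.config.sampling_rate)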
```
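If `torchaudio` is not available, the generated samples can also be written with Python's standard `wave` module. The following is a minimal sketch, not part of the original workflow: `save_wav_pcm16` is a hypothetical helper, and it assumes mono float samples in the range [-1.0, 1.0], which it clips and converts to 16-bit PCM.

```python
import struct
import wave


def save_wav_pcm16(path, samples, sampling_rate=16000):
    """Write mono float samples to a 16-bit PCM WAV file.

    Hypothetical helper: assumes samples are floats in [-1.0, 1.0].
    Returns the clip duration in seconds.
    """
    pcm = []
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        pcm.append(int(round(clipped * 32767)))       # scale to the int16 range

    # WAV data is little-endian, so pack explicitly with "<h".
    frames = struct.pack("<%dh" % len(pcm), *pcm)

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)          # mono
        wav_file.setsampwidth(2)          # 2 bytes = 16 bit
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(frames)

    return len(pcm) / sampling_rate
```

With the tensor from the snippet above, a call would look like `save_wav_pcm16("output.wav", waveform.tolist(), model.config.sampling_rate)`.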
## Evaluation Summary
- **Model**: `alakxender/mms-tts-div-finetuned-md-m02`
- **Evaluated Samples**: 3
- **Average Estimated MOS (UTMOS)**: `2.926`

MOS rating scale, for reference:
```json
{
"5": "Excellent (very natural)",
"4": "Good (mostly natural)",
"3": "Fair (some robotic quality)",
"2": "Poor (noticeably unnatural)",
"1": "Bad (unintelligible or very synthetic)"
}
```
- **Artifacts**:
  - 🎵 Audio: `outputs/audio/`
  - 📊 Spectrograms: `outputs/spectrograms/`
  - 📄 Report: `outputs/report.txt`
  - 📈 MOS Scores: `outputs/mos_scores.txt`
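To read a numeric score against the scale above, one option is to round it to the nearest integer rating. The sketch below illustrates that idea only; the rounding rule and the `describe_mos` name are assumptions for illustration, not the procedure used to produce the reported evaluation.

```python
import math

# The 1-5 rating scale from the evaluation summary.
MOS_SCALE = {
    5: "Excellent (very natural)",
    4: "Good (mostly natural)",
    3: "Fair (some robotic quality)",
    2: "Poor (noticeably unnatural)",
    1: "Bad (unintelligible or very synthetic)",
}


def describe_mos(score):
    """Map a MOS estimate to the nearest label on the 1-5 scale.

    Assumption: round half up, then clamp to [1, 5] in case a predictor
    returns a value slightly outside the nominal range.
    """
    nearest = math.floor(score + 0.5)
    nearest = min(5, max(1, nearest))
    return MOS_SCALE[nearest]


print(describe_mos(2.926))  # Fair (some robotic quality)
```

Under this rounding rule, the reported average of `2.926` lands on the "Fair" band.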
## Acknowledgements
- [Meta MMS-TTS](https://github.com/facebookresearch/fairseq/tree/main/examples/mms)
- [Tarepan's SpeechMOS](https://github.com/Tarepan/SpeechMOS)
- [Hugging Face 🤗 Transformers](https://huggingface.co/transformers/)