---
library_name: transformers
tags:
- dhivehi-tts
license: mit
datasets:
- alakxender/dv_syn_speech_md
language:
- dv
base_model:
- facebook/mms-tts-div
---

# Divehi TTS – Male Voice (VITS-based)

This is a fine-tuned VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) model for Divehi speech synthesis. It produces male-voice audio from Divehi text written in Thaana script. The model was fine-tuned from Meta's MMS-TTS Divehi checkpoint (`facebook/mms-tts-div`) on a curated dataset of synthetic Divehi speech.

## Model Details

| Field                 | Value                                     |
|-----------------------|-------------------------------------------|
| **Model ID**          | `alakxender/mms-tts-div-finetuned-md-m02` |
| **Base Architecture** | MMS-TTS (VITS)                            |
| **Language**          | Divehi (dv)                               |
| **Voice**             | Male                                      |
| **Sampling Rate**     | 16 kHz                                    |
| **Tokenizer**         | `VitsTokenizer`                           |
| **Inference Engine**  | Transformers (🤗 Hugging Face)            |

## Usage

```python
import torch
import torchaudio
from transformers import VitsModel, VitsTokenizer

model_id = "alakxender/mms-tts-div-finetuned-md-m02"
tokenizer = VitsTokenizer.from_pretrained(model_id)
model = VitsModel.from_pretrained(model_id)

text = "މޫސުން ވަރަށް ގޯސްވެ، ފުވައްމުލަކުން ފެށިގެން އައްޑުއަށް އޮރެންޖް އެލާޓް ނެރެފި"
inputs = tokenizer(text, return_tensors="pt")

# VitsModel synthesizes audio in its forward pass; no_grad skips gradient tracking.
with torch.no_grad():
    waveform = model(**inputs).waveform[0]  # 1-D tensor of audio samples

# torchaudio expects shape (channels, samples); the config holds the 16 kHz sampling rate.
torchaudio.save("output.wav", waveform.unsqueeze(0), model.config.sampling_rate)
```
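
If `torchaudio` is not available, the generated samples can also be written with Python's standard-library `wave` module. The following is a minimal sketch rather than part of the original workflow: the helper name `write_wav_16bit` is hypothetical, and it assumes the waveform has already been converted to a plain list of floats in the range [-1, 1] (for example with `waveform.tolist()`). A synthetic sine tone stands in for model output so the block runs on its own.

```python
import math
import struct
import wave

SAMPLE_RATE = 16_000  # matches the 16 kHz sampling rate listed in Model Details


def write_wav_16bit(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1, 1] as 16-bit PCM (hypothetical helper)."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    # Duration in seconds is the sample count divided by the sampling rate.
    return len(pcm) / sample_rate


# Stand-in for model output: half a second of a 440 Hz tone.
tone = [
    0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE // 2)
]
duration = write_wav_16bit("tone_demo.wav", tone)
print(f"wrote {duration:.2f} s of audio")
```

With real model output, the same call would take `waveform.tolist()` in place of `tone`.
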

## Evaluation Summary

- **Model**: `alakxender/mms-tts-div-finetuned-md-m02`
- **Evaluated Samples**: 3
- **Avg Estimated MOS (UTMOS)**: `2.926`
- **MOS Scale** (1 to 5, higher is more natural):
```json
{
  "5": "Excellent (very natural)",
  "4": "Good (mostly natural)",
  "3": "Fair (some robotic quality)",
  "2": "Poor (noticeably unnatural)",
  "1": "Bad (unintelligible or very synthetic)"
}
```
- **Artifacts**:
  - 🎵 Audio: `outputs/audio/`
  - 📊 Spectrograms: `outputs/spectrograms/`
  - 📄 Report: `outputs/report.txt`
  - 📈 MOS Scores: `outputs/mos_scores.txt`
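
To relate the average score above to the 1 to 5 scale, one simple reading is to round the score to the nearest integer and look up its label. The sketch below does that. The function name `describe_mos` and the rounding rule are illustrative assumptions, not part of the evaluation pipeline, and only the reported average is used because per-sample scores are not listed here.

```python
import math

# Labels copied from the MOS scale in the evaluation summary.
MOS_SCALE = {
    5: "Excellent (very natural)",
    4: "Good (mostly natural)",
    3: "Fair (some robotic quality)",
    2: "Poor (noticeably unnatural)",
    1: "Bad (unintelligible or very synthetic)",
}


def describe_mos(score):
    """Map a MOS estimate to the nearest scale label (hypothetical helper)."""
    # Round half up, then clamp to the 1-5 range covered by the scale.
    nearest = math.floor(score + 0.5)
    nearest = max(1, min(5, nearest))
    return MOS_SCALE[nearest]


# Reported average UTMOS for this model.
print(describe_mos(2.926))  # prints: Fair (some robotic quality)
```

Under this reading, the reported average of 2.926 falls in the "Fair" band.
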

## Acknowledgements

- [Meta MMS-TTS](https://github.com/facebookresearch/fairseq/tree/main/examples/mms)
- [Tarepan's SpeechMOS](https://github.com/Tarepan/SpeechMOS)
- [Hugging Face 🤗 Transformers](https://huggingface.co/transformers/)