MOSS‑TTSD‑v0.5 (Dhivehi Fine‑Tuned)
A Dhivehi fine‑tuned version of fnlp/MOSS‑TTSD‑v0.5, adapted for early-stage testing and prototyping of Dhivehi spoken dialogue synthesis. This experimental release applies the base model's capabilities, including voice cloning and long-form generation, to Dhivehi use cases.
Note: This is a proof-of-concept fine-tune. Expect instability in speaker identity and timbre consistency, owing to limited adaptation and limitations inherited from the base model.
Overview
- Fine-tuned on: Dhivehi dialogue generation
- Known limitations:
  - Speaker switching errors
  - Timbre cloning deviations
These stem from both early fine‑tuning and base model constraints, and will be addressed in future updates.
Training Summary
- Epochs: 10
- Final train loss: ~59.50
- Validation/final loss: ~56.11
- Gradient norm (final): ~32.0
- Learning rate (final): ~2.53 × 10⁻¹⁴
- Runtime: 18.5 hours (66,795 seconds)
- Sample throughput: ~1.64 samples/sec
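The runtime and throughput figures above can be cross-checked with a little arithmetic. The sketch below assumes the reported throughput is an average over the whole run and that all 10 epochs saw the same number of samples; neither is stated in this card, so the derived sample counts are rough estimates rather than reported values.

```python
# Figures copied from the training summary above.
runtime_seconds = 66_795
samples_per_second = 1.64
epochs = 10

# Derived values (estimates; assume throughput is averaged over the full run).
runtime_hours = runtime_seconds / 3600
total_samples = runtime_seconds * samples_per_second
samples_per_epoch = total_samples / epochs

print(f"runtime: {runtime_hours:.2f} h")
print(f"samples processed (estimate): {total_samples:,.0f}")
print(f"samples per epoch (estimate): {samples_per_epoch:,.0f}")
```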
Usage
Install MOSS-TTSD by following the official repository's instructions, then run inference:
python inference.py \
--jsonl examples/examples.jsonl \
--output_dir outputs \
--seed 42 \
--use_normalize
Dhivehi example (Text Only):
{
  "base_path": "examples",
  "text": "[S1] މާޒީގެ އުޖާލާ މަންޒަރުތައް. [S2] މާރީތި އުފާވެރި ކުރެހުންތައް. [S2] ދާތީ އަދު ހާމަ ވަމުން ކުލަތައް. [S1] ތާރީޚު އަލުން މި އިޢާދަ ވަނީ."
}
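A record like the one above can also be generated programmatically. The following is a minimal sketch, assuming the input file follows the usual JSON Lines convention (one JSON object per line) and uses the `base_path` and `text` fields shown above. The helper names `make_record` and `write_jsonl`, the output filename `dhivehi.jsonl`, and the choice to join turns with a single space are all illustrative assumptions, not part of MOSS-TTSD.

```python
import json

def make_record(turns, base_path="examples"):
    """Build a text-only record from (speaker_number, sentence) pairs."""
    # Joining turns with a space is an assumption made for this sketch.
    text = " ".join(f"[S{speaker}] {sentence}" for speaker, sentence in turns)
    return {"base_path": base_path, "text": text}

def write_jsonl(records, path):
    """Write one JSON object per line, keeping Thaana characters unescaped."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")

turns = [
    (1, "މާޒީގެ އުޖާލާ މަންޒަރުތައް."),
    (2, "މާރީތި އުފާވެރި ކުރެހުންތައް."),
]
record = make_record(turns)
write_jsonl([record], "dhivehi.jsonl")  # hypothetical output path
```

The resulting file could then be passed to the `--jsonl` argument of the inference command shown earlier. `ensure_ascii=False` keeps the Dhivehi text readable in the file instead of emitting `\uXXXX` escapes.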
Dhivehi example (Clone):
{
  "base_path": "examples",
  "text": "[S1] މާޒީގެ އުޖާލާ މަންޒަރުތައް. [S2] މާރީތި އުފާވެރި ކުރެހުންތައް. [S2] ދާތީ އަދު ހާމަ ވަމުން ކުލަތައް. [S1] ތާރީޚު އަލުން މި އިޢާދަ ވަނީ.",
  "prompt_audio_speaker1": "f1.wav",
  "prompt_text_speaker1": "އިފްޝާއަށް ދިމާވި ހާދިސާ ލޮލުން ފެނުނު މީހެއްގެ ހައިސިޔަތުން",
  "prompt_audio_speaker2": "f2.wav",
  "prompt_text_speaker2": "ޝިނާނާ ލޯބީގެ ގޮތުން ގުޅުނު ކަމުގައިވީނަމަވެސް އެކަހެރިވެ ބަދަލުކުރުމާ އަނދިރި ތަންތަނުގަ ބައްދަލު ކުރުމުގެ އިތުރުން ގައިގަ އަތްލުން ކީއްކުރަން އިނގިލި ކުރި ޖައްސާލަންވެސް މުސްކާން މަނާ ކުރި"
}
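Before running a cloning job, it may help to confirm that every speaker tag used in `text` has a matching reference clip and transcript. The sketch below is a hypothetical pre-flight check, not part of MOSS-TTSD. It assumes, based only on the example above, that speaker `[SN]` pairs with the keys `prompt_audio_speakerN` and `prompt_text_speakerN`. The record's text and transcript are placeholders.

```python
import re

SPEAKER_TAG = re.compile(r"\[S(\d+)\]")

def missing_prompt_fields(record):
    """Return the prompt keys a cloning record lacks for the speakers it uses."""
    speakers = sorted(set(SPEAKER_TAG.findall(record.get("text", ""))), key=int)
    missing = []
    for speaker in speakers:
        for prefix in ("prompt_audio_speaker", "prompt_text_speaker"):
            key = f"{prefix}{speaker}"
            if not record.get(key):
                missing.append(key)
    return missing

# Placeholder record: speaker 2 appears in the text but has no prompt fields.
record = {
    "base_path": "examples",
    "text": "[S1] first turn [S2] second turn",
    "prompt_audio_speaker1": "f1.wav",
    "prompt_text_speaker1": "transcript of f1.wav",
}
print(missing_prompt_fields(record))
# → ['prompt_audio_speaker2', 'prompt_text_speaker2']
```

An empty list means every tagged speaker has both fields. The check does not verify that the audio files exist or that the transcripts match the audio.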
Examples and a Gradio app can be found here
License & Ethics
Intended for research and development use. Use responsibly: do not use this model for impersonation, fraud, unauthorized voice cloning, or deceptive applications.
Acknowledgments
- Based on fnlp/MOSS‑TTSD‑v0.5
- Usage patterns and inference methods follow the official MOSS‑TTSD GitHub repository