MOSS‑TTSD‑v0.5 (Dhivehi Fine‑Tuned)

A Dhivehi fine‑tuned version of fnlp/MOSS‑TTSD‑v0.5, adapted for early-stage testing and prototyping of Dhivehi spoken dialogue synthesis. This experimental release leverages the original model’s capabilities—including voice cloning and long-form generation—for Dhivehi use cases.

Note: This is a proof-of-concept finetune. Instability in speaker identity and timbre consistency is expected due to limited adaptation and inherited limitations from the base model.

Overview

Fine-tuned for: Dhivehi dialogue generation
Known limitations:
- Speaker switching errors
- Timbre cloning deviations
  These stem from both early fine‑tuning and base model constraints, and will be addressed in future updates.

Training Summary

Epochs: 10
Final train loss: ~59.50
Validation/final loss: ~56.11
Gradient norm (final): ~32.0
Learning rate (final): ~2.53 × 10⁻¹⁴
Runtime: ~18.5 hours (~66,795 seconds)
Sample throughput: ~1.64 samples/sec
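
As a rough cross-check of the figures above, the runtime and throughput can be combined to estimate how many samples were processed. This is only a sketch: it assumes the reported throughput is an average over the whole run and counts every sample in every epoch. The dataset size is not stated here, so the per-epoch figure is an estimate, not a reported value.

```python
# Figures copied from the training summary above.
runtime_s = 66_795   # total runtime in seconds
throughput = 1.64    # samples per second (assumed to be a whole-run average)
epochs = 10

hours = runtime_s / 3600
total_samples = throughput * runtime_s   # samples seen across all epochs
per_epoch = total_samples / epochs       # implied samples per epoch (estimate)

print(f"{hours:.2f} h, ~{total_samples:,.0f} samples seen, ~{per_epoch:,.0f} per epoch")
# prints: 18.55 h, ~109,544 samples seen, ~10,954 per epoch
```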

Usage

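Install MOSS-TTSD by following the setup instructions in the official MOSS-TTSD repository, then run inference from the repository root: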

python inference.py \
  --jsonl examples/examples.jsonl \
  --output_dir outputs \
  --seed 42 \
  --use_normalize
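
To run the same command from a script, for example to try several seeds, it can be assembled in Python. This is a minimal sketch: `build_inference_command` and `run_inference` are hypothetical helper names, and the only flags used are the ones shown in the command above. It assumes the script is started from the MOSS-TTSD checkout so that `inference.py` resolves.

```python
import subprocess
import sys

def build_inference_command(jsonl, output_dir, seed=42, use_normalize=True):
    """Assemble the inference.py invocation shown above as an argument list."""
    cmd = [
        sys.executable, "inference.py",
        "--jsonl", jsonl,
        "--output_dir", output_dir,
        "--seed", str(seed),
    ]
    if use_normalize:
        cmd.append("--use_normalize")
    return cmd

def run_inference(**kwargs):
    # Assumption: the current working directory is the MOSS-TTSD checkout.
    subprocess.run(build_inference_command(**kwargs), check=True)

cmd = build_inference_command("examples/examples.jsonl", "outputs")
print(" ".join(cmd[1:]))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.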

Dhivehi example (Text Only):

{
    "base_path":"examples",
    "text": "[S1] މާޒީގެ އުޖާލާ މަންޒަރުތައް. [S2] މާރީތި އުފާވެރި ކުރެހުންތައް. [S2] ދާތީ އަދު ހާމަ ވަމުން ކުލަތައް. [S1] ތާރީޚު އަލުން މި އިޢާދަ ވަނީ."
}
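
Because the input file is JSONL, each entry has to sit on a single line with no raw line breaks inside the string. One way to produce such lines is sketched below. `make_entry` and `to_jsonl_line` are hypothetical helpers, and joining turns with a single space is an assumption rather than a documented requirement.

```python
import json

def make_entry(turns, base_path="examples"):
    """Build one text-only entry; `turns` is a list of (speaker_number, sentence)."""
    text = " ".join(f"[S{speaker}] {sentence.strip()}" for speaker, sentence in turns)
    return {"base_path": base_path, "text": text}

def to_jsonl_line(entry):
    # ensure_ascii=False keeps Thaana characters readable instead of \uXXXX escapes.
    # json.dumps escapes any newline inside a string, so the result is one line.
    return json.dumps(entry, ensure_ascii=False)

turns = [
    (1, "މާޒީގެ އުޖާލާ މަންޒަރުތައް."),
    (2, "މާރީތި އުފާވެރި ކުރެހުންތައް."),
]
line = to_jsonl_line(make_entry(turns))
print(line)
```

Each resulting line can be written to a UTF-8 file, one entry per line, and passed to `--jsonl`.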

Dhivehi example (Clone):

{
    "base_path":"examples",
    "text": "[S1] މާޒީގެ އުޖާލާ މަންޒަރުތައް. [S2] މާރީތި އުފާވެރި ކުރެހުންތައް. [S2] ދާތީ އަދު ހާމަ ވަމުން ކުލަތައް. [S1] ތާރީޚު އަލުން މި އިޢާދަ ވަނީ.",
    "prompt_audio_speaker1":"f1.wav",
    "prompt_text_speaker1": "އިފްޝާއަށް ދިމާވި ހާދިސާ ލޮލުން ފެނުނު މީހެއްގެ ހައިސިޔަތުން",
    "prompt_audio_speaker2":"f2.wav",
    "prompt_text_speaker2": "ޝިނާނާ ލޯބީގެ ގޮތުން ގުޅުނު ކަމުގައިވީނަމަވެސް އެކަހެރިވެ ބަދަލުކުރުމާ އަނދިރި ތަންތަނުގަ ބައްދަލު ކުރުމުގެ އިތުރުން ގައިގަ އަތްލުން ކީއްކުރަން އިނގިލި ކުރި ޖައްސާލަންވެސް މުސްކާން މަނާ ކުރި"
}
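
Before running a cloning job, it can help to confirm that every speaker tag used in `text` has a matching prompt audio file and prompt transcript. The following is a sketch only: `check_clone_entry` is a hypothetical helper, and it assumes prompt audio paths are resolved relative to `base_path`.

```python
import re
from pathlib import Path

SPEAKER_TAG = re.compile(r"\[S(\d+)\]")

def check_clone_entry(entry, check_files=True):
    """Return a list of problems found in one voice-cloning entry (empty if none)."""
    problems = []
    speakers = sorted(set(SPEAKER_TAG.findall(entry.get("text", ""))), key=int)
    if not speakers:
        problems.append("text contains no [S<n>] speaker tags")
    # Assumption: audio paths are relative to base_path.
    base = Path(entry.get("base_path", "."))
    for n in speakers:
        audio = entry.get(f"prompt_audio_speaker{n}")
        prompt_text = entry.get(f"prompt_text_speaker{n}")
        if not audio:
            problems.append(f"missing prompt_audio_speaker{n}")
        elif check_files and not (base / audio).is_file():
            problems.append(f"audio file not found: {base / audio}")
        if not prompt_text:
            problems.append(f"missing prompt_text_speaker{n}")
    return problems

# Placeholder strings stand in for real dialogue and transcripts.
incomplete = {
    "base_path": "examples",
    "text": "[S1] ... [S2] ...",
    "prompt_audio_speaker1": "f1.wav",
    "prompt_text_speaker1": "...",
}
result = check_clone_entry(incomplete, check_files=False)
print(result)
# prints: ['missing prompt_audio_speaker2', 'missing prompt_text_speaker2']
```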

Examples and a Gradio app can be found here

License & Ethics

Intended for research and development use. Use it responsibly: it must not be used for impersonation, fraud, unauthorized voice cloning, or deceptive applications.

Acknowledgments

Based on fnlp/MOSS‑TTSD‑v0.5
Usage patterns and inference methods follow the official MOSS‑TTSD GitHub repository

Model repository: alakxender/moss-ttsd-dv-ft-t01