---
language:
- dv
- en
license: apache-2.0
base_model: nari-labs/Dia-1.6B
tags:
- text-to-speech
- tts
- audio
- dhivehi
- maldivian
- speech-synthesis
- fine-tuned
library_name: dia
pipeline_tag: text-to-speech
datasets:
- alakxender/voice-synthetic
---

# Dia TTS - Dhivehi Fine-tuned Model
This is a fine-tuned version of nari-labs/Dia-1.6B specifically trained for Dhivehi (Maldivian) text-to-speech synthesis.
## Model Description
- Base Model: nari-labs/Dia-1.6B
- Language: Dhivehi (dv); mixed Dhivehi and English input is also supported
- Task: Text-to-Speech (TTS)
- Fine-tuning: Specialized for Dhivehi audio synthesis
## Usage

```python
# Install the Dia library and soundfile first:
#   pip install git+https://github.com/nari-labs/dia.git
#   pip install soundfile

print("🎤 Testing Dhivehi Dia TTS model...")

try:
    from dia.model import Dia
    import soundfile as sf

    # Load the fine-tuned model from the Hugging Face Hub
    print("📥 Loading model from HuggingFace...")
    model = Dia.from_pretrained("alakxender/Dia-1.6B-dhivehi-ep1")
    print("✓ Model loaded successfully!")

    # Test texts
    test_samples = {
        # Basic samples
        "basic_english": "Hello, this is a test.",
        "basic_dhivehi": "އައްސަލާމް ޢަލައިކުމް، މިއީ ވަކި ޓެސްޓެކެވެ.",
        # Mixed-language tests
        "mixed_greeting": "Hello އައްސަލާމް ޢަލައިކުމް، how are you? ހާލު ކިހިނެއް؟",
        # Emotional expressions and sounds
        "with_laughter": "That was so funny! (laughs) ވަރަށް މަޖާ އެނގޭ! (laughs) I can't stop laughing!",
        # Complex emotional scenarios
        "happy_announcement": "(laughs) Guess what? ބަލާ! I got the job! އަހަރެން ވަޒީފާ ލިބުނު! (claps) (claps) (laughs)",
        "achievement": "After years of hard work... (claps) finally! އެންމެ ފަހުން! I graduated! އަހަރެން ފުރިހަމަ ކުރީ! (claps) (claps) (laughs)",
    }

    print("\n🗣️ Generating speech samples...")
    generated_files = []

    for name, text in test_samples.items():
        try:
            print(f"🎤 Generating: {name}")
            print(f"   Text: {text[:60]}{'...' if len(text) > 60 else ''}")

            output = model.generate(text)

            # Dia outputs a waveform at a 44.1 kHz sample rate
            filename = f"{name}.wav"
            sf.write(filename, output, 44100)
            generated_files.append((filename, len(output)))
            print(f"   ✓ Saved: {filename} ({len(output) / 44100:.2f}s)")
        except Exception as e:
            print(f"   ❌ Failed to generate {name}: {e}")

    print("\n🎉 TTS generation completed!")
    print(f"📁 Generated {len(generated_files)} audio files:")

    total_duration = 0
    for filename, samples in generated_files:
        duration = samples / 44100
        total_duration += duration
        print(f"   - {filename:<25} ({duration:.2f}s)")

    print(f"\n📊 Total audio generated: {total_duration:.2f} seconds")

except ImportError as e:
    print("❌ Missing dependencies. Please install:")
    print("   pip install git+https://github.com/nari-labs/dia.git")
    print("   pip install soundfile")
    print(f"   Error: {e}")
except Exception as e:
    print(f"❌ Error during TTS generation: {e}")
    print("💡 Make sure the model was uploaded correctly and is accessible")
```
## Training Details
- Base Model: nari-labs/Dia-1.6B
- Training Data: Dhivehi audio dataset (alakxender/voice-synthetic)
- Fine-tuning Approach: Direct training on Dhivehi audio using the language tag [dv] and the speaker tags [MALE-01] and [FEMALE-01] (see the prompt sketch below)
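The snippet below sketches how prompts might be formatted with these tags. The exact placement of the language and speaker tags is an assumption based on the description above, so adjust it to match how your prompts were formatted during fine-tuning.

```python
from dia.model import Dia
import soundfile as sf

model = Dia.from_pretrained("alakxender/Dia-1.6B-dhivehi-ep1")

# Assumed prompt format: language tag [dv] followed by a speaker tag.
prompts = {
    "male": "[dv] [MALE-01] އައްސަލާމް ޢަލައިކުމް، މިއީ ވަކި ޓެސްޓެކެވެ.",
    "female": "[dv] [FEMALE-01] ހާލު ކިހިނެއް؟",
}

for name, text in prompts.items():
    audio = model.generate(text)
    sf.write(f"speaker_{name}.wav", audio, 44100)  # 44.1 kHz output
```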
## Model Performance
This model has been specifically fine-tuned for Dhivehi speech synthesis, providing natural-sounding speech generation for Dhivehi text input.
## Limitations
- Optimized specifically for the Dhivehi language
- May not perform well on other languages
- Performance depends on input text quality and pronunciation patterns
## License
This model is released under the Apache 2.0 License, following the original Dia model licensing.