metadata

language:
  - dv
  - en
license: apache-2.0
base_model: nari-labs/Dia-1.6B
tags:
  - text-to-speech
  - tts
  - audio
  - dhivehi
  - maldivian
  - speech-synthesis
  - fine-tuned
library_name: dia
pipeline_tag: text-to-speech
datasets:
  - alakxender/voice-synthetic

Dia TTS - Dhivehi Fine-tuned Model

This is a fine-tuned version of nari-labs/Dia-1.6B specifically trained for Dhivehi (Maldivian) text-to-speech synthesis.

Model Description

Base Model: Dia-1.6B
Language: Dhivehi (dv), including mixed Dhivehi and English (en) text
Task: Text-to-Speech (TTS)
Fine-tuning: Specialized for Dhivehi audio synthesis

Usage

# Install Dia library first:
# pip install git+https://github.com/nari-labs/dia.git
# pip install soundfile

print("🎤 Testing Dhivehi Dia TTS model...")

try:
    # Import inside the try block so the ImportError handler below
    # can actually catch a missing dia or soundfile package
    from dia.model import Dia
    import soundfile as sf

    # Load the fine-tuned model
    print("📥 Loading model from HuggingFace...")
    model = Dia.from_pretrained("alakxender/Dia-1.6B-dhivehi-ep1")
    print("✓ Model loaded successfully!")
    
    # Test texts
    test_samples = {
        # Basic samples
        "basic_english": "Hello, this is a test.",
        "basic_dhivehi": "އައްސަލާމް ޢަލައިކުމް، މިއީ ވަކި ޓެސްޓެކެވެ.",
        
        # Mixed language tests
        "mixed_greeting": "Hello އައްސަލާމް ޢަލައިކުމް، how are you? ހާލު ކިހިނެއް؟",

        # Emotional expressions and sounds
        "with_laughter": "That was so funny! (laughs) ވަރަށް މަޖާ އެނގޭ! (laughs) I can't stop laughing!",
        
        # Complex emotional scenarios
        "happy_announcement": "(laughs) Guess what? ބަލާ! I got the job! އަހަރެން ވަޒީފާ ލިބުނު! (claps) (claps) (laughs)",
        "achievement": "After years of hard work... (claps) finally! އެންމެ ފަހުން! I graduated! އަހަރެން ފުރިހަމަ ކުރީ! (claps) (claps) (laughs)"
    }
    
    print("\n🗣️  Generating speech samples...")
    generated_files = []
    
    for name, text in test_samples.items():
        try:
            print(f"🎤 Generating: {name}")
            print(f"   Text: {text[:60]}{'...' if len(text) > 60 else ''}")
            
            output = model.generate(text)
            filename = f"{name}.wav"
            sf.write(filename, output, 44100)
            generated_files.append((filename, len(output)))
            print(f"   ✓ Saved: {filename} ({len(output)/44100:.2f}s)")
            
        except Exception as e:
            print(f"   ❌ Failed to generate {name}: {e}")
    
    print("\n🎉 TTS generation completed!")
    print(f"📁 Generated {len(generated_files)} audio files:")
    
    total_duration = 0
    for filename, samples in generated_files:
        duration = samples / 44100
        total_duration += duration
        print(f"   - {filename:<25} ({duration:.2f}s)")
    
    print(f"\n📊 Total audio generated: {total_duration:.2f} seconds")
    
except ImportError as e:
    print("❌ Missing dependencies. Please install:")
    print("   pip install git+https://github.com/nari-labs/dia.git")
    print("   pip install soundfile")
    print(f"   Error: {e}")
    
except Exception as e:
    print(f"❌ Error during TTS generation: {e}")
    print("💡 Make sure the model was uploaded correctly and is accessible")

Training Details

Base Model: nari-labs/Dia-1.6B
Training Data: Dhivehi audio dataset (alakxender/voice-synthetic, as listed in the metadata)
Fine-tuning Approach: Direct training on Dhivehi audio, using the language tag [dv] and the speaker tags [MALE-01] and [FEMALE-01]
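The tags mentioned above could be attached to input text along the following lines. This is only a sketch: the helper name `build_prompt`, the tag order, and the spacing are assumptions, and this card does not state whether tags are required at inference time (the usage example passes plain text). Only the tag strings themselves come from the training description.

```python
# Hypothetical helper: the prompt layout "[dv] [SPEAKER] text" is an assumption,
# not a documented format of this model.
KNOWN_SPEAKERS = {"MALE-01", "FEMALE-01"}  # speaker tags named in the training details


def build_prompt(text: str, speaker: str = "FEMALE-01", lang: str = "dv") -> str:
    """Prefix text with a language tag and a speaker tag."""
    if speaker not in KNOWN_SPEAKERS:
        raise ValueError(f"Unknown speaker tag: {speaker!r}")
    # Strip surrounding whitespace so the tags and text are separated by one space
    return f"[{lang}] [{speaker}] {text.strip()}"


if __name__ == "__main__":
    print(build_prompt("Hello, this is a test.", speaker="MALE-01"))
```

If tagged prompts turn out to help, the result of `build_prompt(...)` can be passed to `model.generate` in place of the raw text; comparing tagged and untagged output by ear is the safest way to check.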

Model Performance

This model has been fine-tuned specifically for Dhivehi speech synthesis and is intended to produce natural-sounding speech from Dhivehi text input.

Limitations

Optimized specifically for Dhivehi language
May not perform well on other languages
Performance depends on input text quality and pronunciation patterns
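Since results depend on the input text, it may help to inspect and tidy text before synthesis. The sketch below is an illustration under stated assumptions, not part of the model or the Dia library: `thaana_ratio` and `clean_tts_text` are hypothetical helpers, and whether whitespace cleanup improves output for this model is untested here. The only hard fact used is that Dhivehi is written in the Thaana script, which occupies the Unicode block U+0780 to U+07BF.

```python
import unicodedata

THAANA_START, THAANA_END = 0x0780, 0x07BF  # Unicode Thaana block


def thaana_ratio(text: str) -> float:
    """Return the fraction of letters and combining marks that are Thaana."""
    # Count only letters (L*) and marks (M*); digits, spaces, punctuation are ignored
    relevant = [ch for ch in text if unicodedata.category(ch)[0] in ("L", "M")]
    if not relevant:
        return 0.0
    thaana = sum(THAANA_START <= ord(ch) <= THAANA_END for ch in relevant)
    return thaana / len(relevant)


def clean_tts_text(text: str) -> str:
    """Apply NFC normalization and collapse runs of whitespace to single spaces."""
    return " ".join(unicodedata.normalize("NFC", text).split())


if __name__ == "__main__":
    sample = clean_tts_text("  Hello   this is a test.  ")
    print(sample, thaana_ratio(sample))
```

A low ratio suggests the text is mostly non-Dhivehi, where this model may perform worse. Note that Latin-script markers such as (laughs) also count as non-Thaana letters, so the ratio is only a rough signal.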

License

This model is released under the Apache 2.0 License, the same license as the original Dia model.