language:
  - dv
  - en
license: apache-2.0
base_model: nari-labs/Dia-1.6B
tags:
  - text-to-speech
  - tts
  - audio
  - dhivehi
  - maldivian
  - speech-synthesis
  - fine-tuned
library_name: dia
pipeline_tag: text-to-speech
datasets:
  - alakxender/voice-synthetic

Dia TTS - Dhivehi Fine-tuned Model

This is a fine-tuned version of nari-labs/Dia-1.6B specifically trained for Dhivehi (Maldivian) text-to-speech synthesis.

Model Description

  • Base Model: Dia-1.6B
  • Languages: Dhivehi (dv), including mixed Dhivehi-English (en) text
  • Task: Text-to-Speech (TTS)
  • Fine-tuning: Specialized for Dhivehi audio synthesis

Usage

# Install Dia library first:
# pip install git+https://github.com/nari-labs/dia.git
# pip install soundfile

print("🎤 Testing Dhivehi Dia TTS model...")

try:
    # Import inside the try block so the ImportError handler below
    # can report missing dependencies
    from dia.model import Dia
    import soundfile as sf

    # Load the fine-tuned model
    print("📥 Loading model from HuggingFace...")
    model = Dia.from_pretrained("alakxender/Dia-1.6B-dhivehi-ep1")
    print("✓ Model loaded successfully!")
    
    # Test texts
    test_samples = {
        # Basic samples
        "basic_english": "Hello, this is a test.",
        "basic_dhivehi": "އައްސަލާމް ޢަލައިކުމް، މިއީ ވަކި ޓެސްޓެކެވެ.",
        
        # Mixed language tests
        "mixed_greeting": "Hello އައްސަލާމް ޢަލައިކުމް، how are you? ހާލު ކިހިނެއް؟",

        # Emotional expressions and sounds
        "with_laughter": "That was so funny! (laughs) ވަރަށް މަޖާ އެނގޭ! (laughs) I can't stop laughing!",
        
        # Complex emotional scenarios
        "happy_announcement": "(laughs) Guess what? ބަލާ! I got the job! އަހަރެން ވަޒީފާ ލިބުނު! (claps) (claps) (laughs)",
        "achievement": "After years of hard work... (claps) finally! އެންމެ ފަހުން! I graduated! އަހަރެން ފުރިހަމަ ކުރީ! (claps) (claps) (laughs)"
    }
    
    print("\n🗣️  Generating speech samples...")
    generated_files = []
    
    for name, text in test_samples.items():
        try:
            print(f"🎤 Generating: {name}")
            print(f"   Text: {text[:60]}{'...' if len(text) > 60 else ''}")
            
            output = model.generate(text)
            filename = f"{name}.wav"
            sf.write(filename, output, 44100)
            generated_files.append((filename, len(output)))
            print(f"   ✓ Saved: {filename} ({len(output)/44100:.2f}s)")
            
        except Exception as e:
            print(f"   ❌ Failed to generate {name}: {e}")
    
    print("\n🎉 TTS generation completed!")
    print(f"📁 Generated {len(generated_files)} audio files:")
    
    total_duration = 0
    for filename, samples in generated_files:
        duration = samples / 44100
        total_duration += duration
        print(f"   - {filename:<25} ({duration:.2f}s)")
    
    print(f"\n📊 Total audio generated: {total_duration:.2f} seconds")
    
except ImportError as e:
    print("❌ Missing dependencies. Please install:")
    print("   pip install git+https://github.com/nari-labs/dia.git")
    print("   pip install soundfile")
    print(f"   Error: {e}")
    
except Exception as e:
    print(f"❌ Error during TTS generation: {e}")
    print("💡 Make sure the model was uploaded correctly and is accessible")

Training Details

  • Base Model: nari-labs/Dia-1.6B
  • Training Data: Dhivehi audio dataset
  • Fine-tuning Approach: Direct training on Dhivehi audio, with the language tag [dv] and the speaker tags [MALE-01] and [FEMALE-01]
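
The usage example above passes plain text to the model, and this card does not document how the language and speaker tags were laid out in the training prompts. The sketch below is only one plausible arrangement: it assumes the tags are prepended in the order language tag, speaker tag, text. `tag_prompt` is a hypothetical helper, not part of the Dia library, and the layout should be checked against the training data before relying on it.

```python
# Speaker tags named in the training details of this card.
KNOWN_SPEAKERS = {"MALE-01", "FEMALE-01"}


def tag_prompt(text: str, speaker: str = "MALE-01", lang: str = "dv") -> str:
    """Prefix text with a language tag and a speaker tag (assumed layout)."""
    if speaker not in KNOWN_SPEAKERS:
        raise ValueError(f"Unknown speaker tag: {speaker}")
    # ASSUMPTION: tags are prepended as "[lang] [SPEAKER] text". The model
    # card does not document the exact layout used during fine-tuning.
    return f"[{lang}] [{speaker}] {text.strip()}"
```

The returned string could then be passed to `model.generate` in place of the raw text to compare output with and without tags.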

Model Performance

This model has been fine-tuned specifically for Dhivehi speech synthesis and aims to produce natural-sounding speech from Dhivehi text input.

Limitations

  • Optimized specifically for the Dhivehi language
  • May not perform well on other languages
  • Performance depends on input text quality and pronunciation patterns
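
Since output quality depends on the input text, it can help to clean and inspect text before synthesis. The sketch below is a minimal pre-check using only the Python standard library: it applies Unicode NFC normalization, collapses whitespace, and estimates how much of the text is written in Thaana (the Dhivehi script, Unicode block U+0780 to U+07BF). The helper names are illustrative, and whether this kind of normalization matches the preprocessing used in training is an assumption.

```python
import unicodedata

# Thaana Unicode block (U+0780 to U+07BF), the script used to write Dhivehi.
THAANA_START, THAANA_END = 0x0780, 0x07BF


def normalize_text(text: str) -> str:
    """Apply NFC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def thaana_ratio(text: str) -> float:
    """Return the fraction of letters that fall in the Thaana block.

    Only characters for which str.isalpha() is true are counted, so
    combining vowel signs, digits, and punctuation are ignored.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    thaana = sum(THAANA_START <= ord(ch) <= THAANA_END for ch in letters)
    return thaana / len(letters)
```

A low `thaana_ratio` on a given input suggests it is mostly non-Dhivehi text, which the limitations above indicate may be synthesized less reliably.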

License

This model is released under the Apache 2.0 License, the same license as the original Dia model.