PapaRazi/Ijazah_Palsu_V2 · 🇮🇩 Indonesian TTS Model (F5-TTS)

Ijazah_Palsu_V2 is a fine-tuned Indonesian speech synthesis model based on F5-TTS.
It was trained using a custom-curated dataset called PapaRazi/id-tts-v2, focusing on natural and expressive Indonesian speech generation.

🧠 Model Details

Base Framework: F5-TTS
Training Time: ~3 days
Dataset Size: ~70,000 samples (70 hours)
Languages:
- Bahasa Indonesia (95%)
- English (5%); English quality is limited because so little English data was included
License: Non-commercial use only
Author: PapaRazi (Hugging Face: https://huggingface.co/PapaRazi, GitHub: https://github.com/adigayung)

🛠 Training Configuration

{
  "exp_name": "F5TTS_v1_Base",
  "learning_rate": 1e-05,
  "batch_size_per_gpu": 1700,
  "batch_size_type": "frame",
  "max_samples": 64,
  "grad_accumulation_steps": 1,
  "max_grad_norm": 1,
  "epochs": 34,
  "num_warmup_updates": 7000,
  "save_per_updates": 15000,
  "keep_last_n_checkpoints": 7,
  "last_per_updates": 15000,
  "finetune": true,
  "file_checkpoint_train": "",
  "tokenizer_type": "char",
  "tokenizer_file": "",
  "mixed_precision": "fp16",
  "logger": "tensorboard",
  "bnb_optimizer": false
}
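
To put the frame-based batch size and the dataset figures into more familiar units, here is a small back-of-the-envelope calculation. It assumes the usual F5-TTS mel settings of a 24 kHz sample rate and a hop length of 256 samples; neither value is stated in this card, so treat the result as an estimate.

```python
# Assumed mel-spectrogram settings (NOT stated in this model card).
SAMPLE_RATE = 24_000  # samples per second (assumption)
HOP_LENGTH = 256      # samples per mel frame (assumption)

# Values taken from the card above.
BATCH_FRAMES_PER_GPU = 1700  # "batch_size_per_gpu" with "batch_size_type": "frame"
DATASET_HOURS = 70
DATASET_SAMPLES = 70_000


def frames_to_seconds(frames: int,
                      sample_rate: int = SAMPLE_RATE,
                      hop_length: int = HOP_LENGTH) -> float:
    """Convert a number of mel frames into seconds of audio."""
    return frames * hop_length / sample_rate


# Audio duration that fits into one per-GPU batch.
batch_seconds = frames_to_seconds(BATCH_FRAMES_PER_GPU)

# Average clip length implied by the dataset size.
avg_clip_seconds = DATASET_HOURS * 3600 / DATASET_SAMPLES

# Rough number of average-length clips per batch.
clips_per_batch = batch_seconds / avg_clip_seconds

print(f"batch: {batch_seconds:.1f} s, average clip: {avg_clip_seconds:.1f} s, "
      f"clips per batch: {clips_per_batch:.1f}")
```

Under those assumptions a batch holds roughly 18 seconds of audio, or about five average-length clips, which is well below the max_samples cap of 64.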

📦 Dataset

The training dataset, PapaRazi/id-tts-v2, consists of curated and cleaned audio-text pairs in Bahasa Indonesia. All preprocessing, splitting, and cleaning were done with a custom tool I developed: 🔧 whisper-tools

The default dataset splitter from F5-TTS produced inconsistent results (clips that were too short or way too long), so I built a custom pipeline to ensure clean, consistent samples.
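
The whisper-tools code is not reproduced here, so the following is only a hypothetical sketch of the general idea: merge clips that are too short with their neighbours and drop clips that are too long. The Segment type, the regroup function, and the 2 s / 12 s / 0.5 s thresholds are all illustrative assumptions, not values taken from the actual pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """A transcribed audio span (hypothetical type, times in seconds)."""
    start: float
    end: float
    text: str

    @property
    def duration(self) -> float:
        return self.end - self.start


def regroup(segments: List[Segment],
            min_s: float = 2.0,     # assumed lower bound for a usable clip
            max_s: float = 12.0,    # assumed upper bound for a usable clip
            max_gap: float = 0.5) -> List[Segment]:
    """Merge too-short neighbours and drop clips outside [min_s, max_s]."""
    out: List[Segment] = []
    current: Optional[Segment] = None

    def flush() -> None:
        # Keep the pending clip only if it reached the minimum length.
        if current is not None and current.duration >= min_s:
            out.append(current)

    for seg in segments:
        if seg.duration > max_s:
            # Too long on its own: emit what we have and skip this segment.
            flush()
            current = None
            continue
        if current is None:
            current = seg
            continue
        gap = seg.start - current.end
        merged_duration = seg.end - current.start
        if current.duration < min_s and gap <= max_gap and merged_duration <= max_s:
            # Pending clip is still short and the next one is close: merge them.
            current = Segment(current.start, seg.end, f"{current.text} {seg.text}")
        else:
            flush()
            current = seg
    flush()
    return out


example = [
    Segment(0.0, 1.0, "Halo"),
    Segment(1.2, 3.5, "apa kabar"),
    Segment(4.0, 20.0, "klip yang terlalu panjang"),
    Segment(21.0, 21.5, "ya"),
]
clean = regroup(example)
print(clean)
```

In this example the first two segments are merged into one 3.5-second clip, the 16-second segment is dropped as too long, and the trailing half-second segment is dropped as too short.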

🔊 Audio Samples

🗣 Natural Sentence

"Suatu hari nanti, suara ini mungkin tidak bisa dibedakan lagi dari suara manusia asli."
🎧 Listen on vocaroo

🔢 Number Pronunciation (simple format)

"Serius?! Tiket konsernya habis dalam waktu 3 menit?!"
🎧 Listen on vocaroo

💸 Number Hallucination (millions format – still imperfect)

"Masa cuma buat beli kursi kantor aja harus bayar Rp 2.500.000,-?! Gila sih itu!"
🎧 Listen on vocaroo

⚠️ Reading large numbers (such as millions) is still inaccurate because the training dataset contains few examples of them.
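
Until the model has seen more numeric examples, one possible workaround is to spell out amounts in words before passing text to the model, which may help with cases like the one above. The sketch below is a suggestion, not part of this model or of F5-TTS; terbilang and normalize_rupiah are hypothetical helper names, and only whole rupiah amounts below one billion are handled.

```python
import re

UNITS = ["nol", "satu", "dua", "tiga", "empat",
         "lima", "enam", "tujuh", "delapan", "sembilan"]


def _join(head: str, rest: int) -> str:
    """Append the spelled-out remainder, if there is one."""
    return head if rest == 0 else f"{head} {terbilang(rest)}"


def terbilang(n: int) -> str:
    """Spell out a non-negative integer below one billion in Indonesian."""
    if n < 0 or n >= 1_000_000_000:
        raise ValueError("only 0 <= n < 1_000_000_000 is supported")
    if n < 10:
        return UNITS[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        return f"{UNITS[n - 10]} belas"
    if n < 100:
        return _join(f"{UNITS[n // 10]} puluh", n % 10)
    if n < 200:
        return _join("seratus", n - 100)
    if n < 1000:
        return _join(f"{UNITS[n // 100]} ratus", n % 100)
    if n < 2000:
        return _join("seribu", n - 1000)
    if n < 1_000_000:
        return _join(f"{terbilang(n // 1000)} ribu", n % 1000)
    return _join(f"{terbilang(n // 1_000_000)} juta", n % 1_000_000)


# Matches amounts such as "Rp 2.500.000,-" or "Rp2500000".
RUPIAH = re.compile(r"Rp\.?\s*(\d{1,3}(?:\.\d{3})+|\d+)(?:,-)?")


def normalize_rupiah(text: str) -> str:
    """Replace rupiah amounts written in digits with their spoken form."""
    def spell(match: re.Match) -> str:
        value = int(match.group(1).replace(".", ""))
        return f"{terbilang(value)} rupiah"
    return RUPIAH.sub(spell, text)


print(normalize_rupiah("Masa harus bayar Rp 2.500.000,-?! Gila sih itu!"))
```

Only the rupiah pattern is rewritten; plain numbers such as "3 menit" are left untouched, since the sample above suggests the model already reads those acceptably.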

🤝 License & Usage

This model is released for non-commercial use only. Feel free to explore, fine-tune, or give feedback!
