---
tags:
- text-to-speech
- vietnamese
- ai-model
- deep-learning
license: cc-by-nc-sa-4.0
library_name: pytorch
datasets:
- VLSP2021
- VLSP2022
- VLSP2023
- vietTTS
- UEH
model_name: ZipVoice-Vietnamese-150h
language: vi
---

# 🛑 Important Note ⚠️

This model is intended for **research purposes only**. **Access requests must be made using an institutional, academic, or corporate email address.** Requests from public email providers will be denied. We appreciate your understanding.

# 🎙️ ZipVoice-Vietnamese-150h

ZipVoice is a series of fast, high-quality zero-shot TTS models based on flow matching. Key features of the ZipVoice series:

1. Small and fast: only 123M parameters.
2. High-quality voice cloning: state-of-the-art performance in speaker similarity, intelligibility, and naturalness.
3. Multilingual: supports Chinese and English.
4. Multi-mode: supports both single-speaker and dialogue speech generation.

This checkpoint is a compact version of ZipVoice fine-tuned on 150 hours of Vietnamese speech.

🔗 For more fine-tuning and inference experiments, visit: https://github.com/k2-fsa/ZipVoice.

📜 **License:** [CC-BY-NC-SA-4.0](https://spdx.org/licenses/CC-BY-NC-SA-4.0): non-commercial research use only.

---

## 📌 Model Details

- **Dataset:** VLSP 2021, VLSP 2022, VLSP 2023, VietTTS, TeacherDinh-UEH, and speech from several YouTube channels.
- **Total dataset duration:** 150 hours
- **Data processing techniques:**
  - Background music was removed from all audio using Facebook's Demucs model: https://github.com/facebookresearch/demucs
  - Audio files shorter than 1 second or longer than 30 seconds were excluded.
  - Punctuation marks were kept unchanged.
  - Text was normalized to lowercase.
- **Training Configuration:**
  - **Base Model:** ZipVoice, with espeak-ng (`vi`) as the tokenizer
  - **GPU:** RTX 3090
  - **Batch Size:** max duration 200
  - **Training Progress:** stopped at **96,000 steps (epoch 30)**

---

## 🛑 Update Note

Thanks to Teacher Định of the University of Economics Ho Chi Minh City (UEH) for providing an additional 50 hours of high-quality labeled data. His contact: https://www.facebook.com/luudinhit93
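The data-processing rules listed under Model Details (keep clips between 1 and 30 seconds, lowercase the text, leave punctuation untouched) and the duration-capped batching can be sketched in plain Python. This is a minimal illustration, not the script used to build this checkpoint. The `Utterance` record and its field names are hypothetical, the inclusive 1 s / 30 s bounds are an assumption, and "max duration 200" is assumed to mean at most 200 seconds of audio per batch, as in Lhotse-based k2-fsa recipes. Demucs background-music removal is not reproduced here.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Duration limits from the model card; inclusive bounds are an assumption.
MIN_SECONDS = 1.0
MAX_SECONDS = 30.0


@dataclass
class Utterance:
    # Hypothetical record layout, for illustration only.
    audio_path: str
    duration: float  # seconds
    text: str


def normalize_text(text: str) -> str:
    # Lowercase only; Python's str.lower() handles Vietnamese letters
    # (e.g. "Đ" -> "đ"), and punctuation is left unchanged.
    return text.lower()


def prepare(utts: Iterable[Utterance]) -> List[Utterance]:
    """Drop clips outside [1 s, 30 s] and lowercase the transcripts."""
    kept = []
    for u in utts:
        if MIN_SECONDS <= u.duration <= MAX_SECONDS:
            kept.append(Utterance(u.audio_path, u.duration, normalize_text(u.text)))
    return kept


def batch_by_duration(utts: Iterable[Utterance], max_duration: float = 200.0) -> List[List[Utterance]]:
    """Greedily pack utterances so each batch holds at most max_duration seconds.

    Assumption: "max duration" caps the total audio seconds per batch,
    so the number of utterances per batch varies with clip length.
    """
    batches: List[List[Utterance]] = []
    current: List[Utterance] = []
    total = 0.0
    for u in utts:
        # Start a new batch if adding this clip would exceed the cap.
        if current and total + u.duration > max_duration:
            batches.append(current)
            current, total = [], 0.0
        current.append(u)
        total += u.duration
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    data = [
        Utterance("a.wav", 0.5, "Xin chào."),                      # too short, dropped
        Utterance("b.wav", 12.0, "Hôm nay trời đẹp, phải không?"),
        Utterance("c.wav", 45.0, "Câu này quá dài."),              # too long, dropped
    ]
    print([u.text for u in prepare(data)])
```

Batching by total duration rather than by a fixed utterance count keeps GPU memory use roughly constant when clip lengths range from 1 to 30 seconds.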