---
license: apache-2.0
datasets:
- amphion/Emilia-Dataset
language:
- en
- zh
tags:
- text-to-speech
---

# ZipVoice⚔: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

## 1. Explanation of each directory

Each directory holds one model variant. The "Initialized from" column gives the checkpoint that the variant's training started from; `-` means the model was trained from scratch.

| Directory                     | Model Type                | Training Data                     | Initialized from           |
| :---------------------------- | :-----------------------: | :-------------------------------: | :------------------------: |
| zipvoice                      | ZipVoice                  | Emilia                            | -                          |
| zipvoice_libritts             | ZipVoice                  | LibriTTS                          | -                          |
| zipvoice_distill              | ZipVoice-Distill          | Emilia                            | zipvoice/model.pt          |
| zipvoice_distill_libritts     | ZipVoice-Distill          | LibriTTS                          | zipvoice_libritts/model.pt |
| zipvoice_dialog               | ZipVoice-Dialog           | OpenDialog + in-house dataset     | zipvoice/model.pt          |
| zipvoice_dialog_opendialog    | ZipVoice-Dialog           | OpenDialog                        | zipvoice/model.pt          |
| zipvoice_dialog_stereo        | ZipVoice-Dialog-Stereo    | in-house dataset                  | zipvoice_dialog/model.pt   |

## 2. GitHub

See our GitHub repository [ZipVoice](https://github.com/k2-fsa/ZipVoice) for details.

## 3. Discussion & Communication

For questions and discussion, use [GitHub Issues](https://github.com/k2-fsa/ZipVoice/issues).

You can also scan the QR codes below to join our WeChat group or follow our WeChat official account.

| WeChat Group | WeChat Official Account |
| ------------ | ----------------------- |
| ![WeChat group QR code](https://k2-fsa.org/zh-CN/assets/pic/wechat_group.jpg) | ![WeChat official account QR code](https://k2-fsa.org/zh-CN/assets/pic/wechat_account.jpg) |

## 4. Citation

```bibtex
@article{zhu2025zipvoice,
      title={ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching},
      author={Zhu, Han and Kang, Wei and Yao, Zengwei and Guo, Liyong and Kuang, Fangjun and Li, Zhaoqing and Zhuang, Weiji and Lin, Long and Povey, Daniel},
      journal={arXiv preprint arXiv:2506.13053},
      year={2025}
}
```