	---
	license: cc-by-nc-sa-4.0
	language:
	- en
	- pt
	- es
	- zh
	- nl
	- fr
	- de
	- it
	- ja
	- pl
	pipeline_tag: audio-to-audio
	tags:
	- audio
	- voice
	- voice conversion
	- singing voice conversion
	- vc
	- svc
	- multilingual
	---

	# FreeSVC: Zero-shot Multilingual Singing Voice Conversion

	FreeSVC is a multilingual zero-shot singing voice conversion model. It converts singing voices across languages without requiring extensive language-specific training. See the [GitHub repository](https://github.com/freds0/free-svc) and the [arXiv pre-print](https://arxiv.org/abs/2501.05586).

	## Supported Languages

	| Language   | ID | Status     | Speech Data | Singing Data |
	|------------|----|------------|-------------|--------------|
	| Chinese    | 0  | ✅ Full    | 255h        | 70h          |
	| Dutch      | 1  | ✅ Full    | Part of CML | -            |
	| English    | 2  | ✅ Full    | 921h        | 47h          |
	| French     | 3  | ✅ Full    | Part of CML | -            |
	| German     | 4  | ✅ Full    | Part of CML | -            |
	| Italian    | 5  | ✅ Full    | Part of CML | -            |
	| Japanese   | 6  | ✅ Full    | 30h         | -            |
	| Other*     | 7  | ⚠️ Partial | -           | 10h          |
	| Polish     | 8  | ✅ Full    | Part of CML | -            |
	| Portuguese | 9  | ✅ Full    | Part of CML | -            |
	| Spanish    | 10 | ✅ Full    | Part of CML | -            |

	*Note: The "Other" category covers recordings of vocal techniques that carry no linguistic content, such as the VocalSet data.
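
The IDs in the table above can be captured as a small lookup. This is a minimal sketch: the names `LANGUAGE_IDS` and `language_id` are hypothetical and not taken from the FreeSVC code base, and how the repository actually consumes these IDs is not shown here.

```python
# Language IDs copied from the "Supported Languages" table.
# LANGUAGE_IDS and language_id are illustrative names, not part of FreeSVC.
LANGUAGE_IDS = {
    "chinese": 0, "dutch": 1, "english": 2, "french": 3, "german": 4,
    "italian": 5, "japanese": 6, "other": 7, "polish": 8,
    "portuguese": 9, "spanish": 10,
}


def language_id(name: str) -> int:
    """Return the table ID for a language name (case-insensitive)."""
    key = name.strip().lower()
    if key not in LANGUAGE_IDS:
        raise ValueError(f"unsupported language: {name!r}")
    return LANGUAGE_IDS[key]


print(language_id("Portuguese"))  # 9
```

Rejecting unknown names, rather than silently mapping them to "Other", keeps the "Other" ID reserved for content-free vocal material as the note describes.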

	## Model Overview

	FreeSVC builds on an enhanced VITS architecture combined with Speaker-invariant Clustering (SPIN) and the ECAPA2 speaker encoder. This combination is designed to separate speaker characteristics from linguistic content, with the goal of high-quality, natural-sounding voice conversion across multiple languages.
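
To make the idea of separating content from speaker identity concrete, here is a generic conditioning sketch. It is not the FreeSVC implementation: the dimensions, the random stand-in vectors, the language-embedding table, and the function `build_decoder_inputs` are all assumptions made for illustration. It only shows how per-frame content features from a source recording could be paired with a single target-speaker vector and a language vector selected by ID.

```python
import random

# All sizes are made up for illustration; they are not FreeSVC's real dimensions.
CONTENT_DIM, SPEAKER_DIM, LANG_DIM, NUM_LANGUAGES = 4, 3, 2, 11

rng = random.Random(0)


def random_vector(dim):
    """Stand-in for a learned or extracted feature vector."""
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


# Hypothetical language-embedding table: one vector per language ID.
language_table = [random_vector(LANG_DIM) for _ in range(NUM_LANGUAGES)]


def build_decoder_inputs(content_frames, speaker_embedding, language_id):
    """Attach the same speaker and language vectors to every content frame."""
    lang_vec = language_table[language_id]
    return [frame + speaker_embedding + lang_vec for frame in content_frames]


# Content comes from the source recording, identity from the target speaker.
source_content = [random_vector(CONTENT_DIM) for _ in range(5)]
target_speaker = random_vector(SPEAKER_DIM)

inputs = build_decoder_inputs(source_content, target_speaker, language_id=2)
print(len(inputs), len(inputs[0]))  # 5 9
```

Because the speaker vector is supplied separately from the content frames, swapping `target_speaker` changes the voice identity fed to the decoder without touching the content sequence, which is the property the paragraph above describes.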

	## Training Datasets

	FreeSVC was trained on a diverse set of speech and singing datasets covering multiple languages:

	| Dataset    | Hours    | Language         | Type            |
	|------------|----------|------------------|-----------------|
	| AISHELL-1  | 170h     | Chinese          | Speech          |
	| AISHELL-3  | 85h      | Chinese          | Speech          |
	| CML-TTS    | ~3,100h  | 7 Languages      | Speech          |
	| HiFiTTS    | 292h     | English          | Speech          |
	| JVS        | 30h      | Japanese         | Speech          |
	| LibriTTS-R | 585h     | English          | Speech          |
	| NUS (NHSS) | 7h       | English          | Speech, Singing |
	| OpenSinger | 50h      | Chinese          | Singing         |
	| Opencpop   | 5h       | Chinese          | Singing         |
	| PopBuTFy   | 10h, 40h | Chinese, English | Singing         |
	| POPCS      | 5h       | Chinese          | Singing         |
	| VCTK       | 44h      | English          | Speech          |
	| VocalSet   | 10h      | Other            | Singing         |
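
The per-dataset hours can be summed to reproduce the per-language figures in "Supported Languages". The sketch below rests on three assumptions: the PopBuTFy split is read in order (10h Chinese, 40h English), NUS (NHSS) is counted entirely as singing, and CML-TTS is left out because the table gives only a combined total for its seven languages. The function name `hours_by_language_and_type` is illustrative.

```python
from collections import defaultdict

# (dataset, hours, language, type) rows from the table above.
# Assumptions: PopBuTFy is split as 10h Chinese / 40h English, NUS (NHSS) is
# counted as singing, and CML-TTS is omitted (no per-language breakdown given).
ROWS = [
    ("AISHELL-1", 170, "Chinese", "Speech"),
    ("AISHELL-3", 85, "Chinese", "Speech"),
    ("HiFiTTS", 292, "English", "Speech"),
    ("JVS", 30, "Japanese", "Speech"),
    ("LibriTTS-R", 585, "English", "Speech"),
    ("NUS (NHSS)", 7, "English", "Singing"),
    ("OpenSinger", 50, "Chinese", "Singing"),
    ("Opencpop", 5, "Chinese", "Singing"),
    ("PopBuTFy", 10, "Chinese", "Singing"),
    ("PopBuTFy", 40, "English", "Singing"),
    ("POPCS", 5, "Chinese", "Singing"),
    ("VCTK", 44, "English", "Speech"),
    ("VocalSet", 10, "Other", "Singing"),
]


def hours_by_language_and_type(rows):
    """Sum hours for each (language, type) pair."""
    totals = defaultdict(int)
    for _dataset, hours, language, kind in rows:
        totals[(language, kind)] += hours
    return dict(totals)


totals = hours_by_language_and_type(ROWS)
for (language, kind), hours in sorted(totals.items()):
    print(f"{language:<9}{kind:<8}{hours}h")
```

Under these assumptions the sums come to 255h of Chinese speech, 70h of Chinese singing, 921h of English speech, and 47h of English singing, matching the summary table.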

	## License

	FreeSVC is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. This means:

	- The model can only be used for research and non-commercial purposes. Any commercial use is strictly prohibited.
	- Any derivative works must be shared under the same license.
	- Proper attribution must be given when using the model.

	Users must also comply with the licenses of the original datasets used for training. Some datasets may have additional restrictions beyond CC BY-NC-SA 4.0. Ensure you review and adhere to their terms before using the model.

	For full details, refer to the [CC BY-NC-SA 4.0 License](https://creativecommons.org/licenses/by-nc-sa/4.0/).

	## Citation
	```
	@INPROCEEDINGS{10890068,
	author={Ferreira, Alef Iury and Gris, Lucas Rafael and Da Rosa, Augusto and Oliveira, Frederico and Casanova, Edresson and Sousa, Rafael and Junior, Arnaldo and Soares, Anderson and Filho, Arlindo Galvão},
	booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
	title={FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion},
	year={2025},
	volume={},
	number={},
	pages={1-5},
	keywords={Training;Source coding;Zero shot learning;Refining;Signal processing;Data models;Acoustics;Multilingual;Data mining;Speech synthesis;Singing Voice Conversion;Synthesis of Singing Voices;Cross-lingual and multilingual aspects in speech synthesis},
	doi={10.1109/ICASSP49660.2025.10890068}}
	```