---
license: cc-by-nc-sa-4.0
language:
- en
- pt
- es
- zh
- nl
- fr
- de
- it
- ja
- pl
pipeline_tag: audio-to-audio
tags:
- audio
- voice
- voice conversion
- singing voice conversion
- vc
- svc
- multilingual
---

# FreeSVC: Zero-shot Multilingual Singing Voice Conversion

**FreeSVC** is a multilingual zero-shot singing voice conversion model. It converts singing voices across languages without requiring extensive language-specific training. The code is available in the [GitHub repository](https://github.com/freds0/free-svc), and the method is described in the [arXiv pre-print](https://arxiv.org/abs/2501.05586).

## Supported Languages

| Language    | ID  | Status       | Speech Data | Singing Data |
|------------|-----|--------------|-------------|--------------|
| Chinese    | 0   | ✅ Full      | 255h        | 70h         |
| Dutch      | 1   | ✅ Full      | Part of CML | -           |
| English    | 2   | ✅ Full      | 921h        | 47h         |
| French     | 3   | ✅ Full      | Part of CML | -           |
| German     | 4   | ✅ Full      | Part of CML | -           |
| Italian    | 5   | ✅ Full      | Part of CML | -           |
| Japanese   | 6   | ✅ Full      | 30h         | -           |
| Other*     | 7   | ⚠️ Partial   | -           | 10h         |
| Polish     | 8   | ✅ Full      | Part of CML | -           |
| Portuguese | 9   | ✅ Full      | Part of CML | -           |
| Spanish    | 10  | ✅ Full      | Part of CML | -           |

*Note: The "Other" category covers singing data that consists of vocal techniques with no linguistic content (the VocalSet dataset listed under Training Datasets).
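
The language IDs in the table above can be captured as a small lookup, shown here as a minimal sketch. The mapping is copied from the table (the IDs follow the alphabetical order of the names); the `language_id` helper is hypothetical and not part of the FreeSVC code base, and how the repository consumes these IDs is not covered here.

```python
# Language-to-ID mapping copied from the "Supported Languages" table.
LANGUAGE_IDS = {
    "Chinese": 0,
    "Dutch": 1,
    "English": 2,
    "French": 3,
    "German": 4,
    "Italian": 5,
    "Japanese": 6,
    "Other": 7,  # vocal techniques without linguistic content
    "Polish": 8,
    "Portuguese": 9,
    "Spanish": 10,
}


def language_id(name: str) -> int:
    """Return the ID for a language name (case-insensitive).

    Hypothetical helper for illustration; not a FreeSVC API.
    """
    key = name.strip().capitalize()
    if key not in LANGUAGE_IDS:
        raise ValueError(f"Unsupported language: {name!r}")
    return LANGUAGE_IDS[key]


if __name__ == "__main__":
    print(language_id("portuguese"))
```

Raising on an unknown name, rather than silently falling back to "Other", keeps a typo from selecting the wrong language ID.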

## Model Overview
FreeSVC builds on an enhanced VITS architecture integrated with Speaker-invariant Clustering (SPIN) and the ECAPA2 speaker encoder. This combination is designed to separate speaker characteristics from linguistic content so that conversions sound natural across multiple languages.

## Training Datasets

FreeSVC was trained on a diverse set of speech and singing datasets covering multiple languages:

| **Dataset**          | **Hours**  | **Language** | **Type**    |
|----------------------|------------|--------------|--------------|
| AISHELL-1            | 170h       | Chinese      | Speech      |
| AISHELL-3            | 85h        | Chinese      | Speech      |
| CML-TTS              | 3.1k h     | 7 Languages  | Speech      |
| HiFiTTS              | 292h       | English      | Speech      |
| JVS                  | 30h        | Japanese     | Speech      |
| LibriTTS-R           | 585h       | English      | Speech      |
| NUS (NHSS)           | 7h         | English      | Speech, Singing        |
| OpenSinger           | 50h        | Chinese      | Singing     |
| Opencpop             | 5h         | Chinese      | Singing     |
| PopBuTFy             | 10h, 40h   | Chinese, English | Singing |
| POPCS                | 5h         | Chinese      | Singing     |
| VCTK                 | 44h        | English      | Speech      |
| VocalSet             | 10h        | Other      | Singing     |
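
As a cross-check, the per-language totals in the Supported Languages table can be reproduced from the rows above. The sketch below makes two assumptions: CML-TTS is left out because only a combined figure for its 7 languages is given, and the 7h of NUS (NHSS) is counted as singing, which is the split that matches the 47h of English singing data reported earlier.

```python
from collections import defaultdict

# (dataset, hours, language, type) rows copied from the table above.
# CML-TTS is omitted: the table gives only one combined figure for 7 languages.
# Assumption: the 7h of NUS (NHSS) is counted as singing data.
ROWS = [
    ("AISHELL-1", 170, "Chinese", "Speech"),
    ("AISHELL-3", 85, "Chinese", "Speech"),
    ("HiFiTTS", 292, "English", "Speech"),
    ("JVS", 30, "Japanese", "Speech"),
    ("LibriTTS-R", 585, "English", "Speech"),
    ("NUS (NHSS)", 7, "English", "Singing"),
    ("OpenSinger", 50, "Chinese", "Singing"),
    ("Opencpop", 5, "Chinese", "Singing"),
    ("PopBuTFy", 10, "Chinese", "Singing"),
    ("PopBuTFy", 40, "English", "Singing"),
    ("POPCS", 5, "Chinese", "Singing"),
    ("VCTK", 44, "English", "Speech"),
    ("VocalSet", 10, "Other", "Singing"),
]


def total_hours(rows):
    """Sum hours per (language, type) pair."""
    totals = defaultdict(int)
    for _dataset, hours, language, kind in rows:
        totals[(language, kind)] += hours
    return dict(totals)


totals = total_hours(ROWS)

if __name__ == "__main__":
    for (language, kind), hours in sorted(totals.items()):
        print(f"{language:<9}{kind:<9}{hours}h")
```

Under these assumptions the sums come to 255h of speech and 70h of singing for Chinese, 921h of speech and 47h of singing for English, 30h of speech for Japanese, and 10h of singing for "Other".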

## License

FreeSVC is released under the **Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)** license. This means:

- The model **can only be used for research and non-commercial purposes**. Any commercial use is strictly prohibited.
- Any derivative works must be **shared under the same license**.
- Proper attribution must be given when using the model.

Users must also **comply with the licenses of the original datasets** used for training. Some datasets may have additional restrictions beyond CC BY-NC-SA 4.0. Ensure you review and adhere to their terms before using the model.

For full details, refer to the [CC BY-NC-SA 4.0 License](https://creativecommons.org/licenses/by-nc-sa/4.0/).

## Citation
```bibtex
@INPROCEEDINGS{10890068,
  author={Ferreira, Alef Iury and Gris, Lucas Rafael and Da Rosa, Augusto and Oliveira, Frederico and Casanova, Edresson and Sousa, Rafael and Junior, Arnaldo and Soares, Anderson and Filho, Arlindo Galvão},
  booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
  title={FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion}, 
  year={2025},
  volume={},
  number={},
  pages={1-5},
  keywords={Training;Source coding;Zero shot learning;Refining;Signal processing;Data models;Acoustics;Multilingual;Data mining;Speech synthesis;Singing Voice Conversion;Synthesis of Singing Voices;Cross-lingual and multilingual aspects in speech synthesis},
  doi={10.1109/ICASSP49660.2025.10890068}}
```