GitHub project, inference Windows/Electron app: https://github.com/DanRuta/xVA-Synth
Fine-tuning app: https://github.com/DanRuta/xva-trainer
The base model for fine-tuning other 🤗 xVASynth "xVAPitch" type (v3) models. The model itself is used by the xVATrainer TTS model-training app, not for inference. All created by Dan "@dr00392" Ruta.
The v3 model now uses a slightly tweaked custom VITS/YourTTS model. Tweaks include larger capacity, a bigger language embedding, a custom symbol set (a custom spec of ARPAbet with some extra phonemes to cover other languages), and I guess a different training script.
- Dan Ruta
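The custom extended-ARPAbet symbol set mentioned above can be illustrated with a minimal sketch. The phoneme names, extra symbols, and special tokens below are hypothetical placeholders; the actual inventory xVASynth uses is defined in its source and will differ:

```python
# Hypothetical sketch of an extended-ARPAbet symbol table.
# The real xVASynth phoneme inventory lives in the project source.
ARPABET = ["AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH"]  # truncated
EXTRA_PHONEMES = ["OE", "UE", "RR"]  # illustrative additions for other languages
PAD, BOS, EOS = "<pad>", "<bos>", "<eos>"

symbols = [PAD, BOS, EOS] + ARPABET + EXTRA_PHONEMES
symbol_to_id = {s: i for i, s in enumerate(symbols)}

def encode(phonemes):
    """Map a phoneme sequence to the integer IDs fed to the TTS text encoder."""
    return [symbol_to_id[p] for p in phonemes]
```

The point is only that v3 widens the symbol table so one model can cover phonemes from several languages; the IDs themselves are arbitrary as long as they match between training and inference.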
When used in the xVASynth editor, it produces an American adult male voice. The default pacing is too fast and has to be adjusted.
xVAPitch_5820651 model sample:
There are hundreds of fine-tuned models on the web, but most of them use non-permissive datasets.
xVASynth Editor v3 walkthrough video ▶:
xVATrainer v1 walkthrough video ▶:
Papers:
- VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech - https://arxiv.org/abs/2106.06103
- YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for Everyone - https://arxiv.org/abs/2112.02418
Referenced papers within code:
- Multi-head attention with Relative Positional embedding - https://arxiv.org/pdf/1809.04281.pdf
- Transformer with Relative Positional Encoding - https://arxiv.org/abs/1803.02155
- SDP - https://arxiv.org/pdf/2106.06103.pdf
- Spline Flow - https://arxiv.org/abs/1906.04032
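The relative-positional attention referenced above (Shaw et al., arXiv:1803.02155) can be sketched in a few lines of NumPy. This is a single-head, key-side illustration, not xVASynth's actual implementation, which is multi-head and lives in the model code:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def rel_attention(q, k, v, rel_emb):
    """Single-head attention with a relative positional term on the keys.

    q, k, v : (T, d) query/key/value matrices.
    rel_emb : (2T-1, d) embeddings for relative offsets -(T-1) .. (T-1);
              row (j - i + T - 1) is the embedding for offset j - i.
    """
    T, d = q.shape
    logits = q @ k.T  # content-content scores
    rel = np.empty((T, T))
    for i in range(T):
        for j in range(T):
            # content-position score: query i against the offset embedding
            rel[i, j] = q[i] @ rel_emb[j - i + T - 1]
    weights = softmax((logits + rel) / np.sqrt(d), axis=-1)
    return weights @ v
```

With `rel_emb` set to zeros the relative term vanishes and this reduces to plain scaled dot-product attention, which makes the sketch easy to sanity-check.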
Used datasets: unknown / non-permissive data