These notes follow the guide at https://rentry.org/GPT-SoVITS-guide, with the exceptions listed below.
I used the latest git pull from https://github.com/RVC-Boss/GPT-SoVITS/
I needed to add the following to my shell environment so the cuDNN libraries could be found:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/cudnn/lib/
Put the .pth file in your SoVITS_weights_v2 folder and the .ckpt file in GPT_weights_v2.
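If you would rather script the file placement, here is a minimal sketch of it. The function name install_weights and the download_dir/repo_dir arguments are made up for illustration; only the two folder names come from the notes above, and I am assuming they sit directly under your GPT-SoVITS checkout.

```python
import shutil
from pathlib import Path

# Destination folder for each weight file extension, as described above.
DESTINATIONS = {".pth": "SoVITS_weights_v2", ".ckpt": "GPT_weights_v2"}


def install_weights(download_dir, repo_dir):
    """Copy .pth and .ckpt files from download_dir into the matching folder.

    Both arguments are hypothetical paths; repo_dir is assumed to be the
    directory that contains SoVITS_weights_v2 and GPT_weights_v2.
    """
    copied = []
    for src in sorted(Path(download_dir).iterdir()):
        folder = DESTINATIONS.get(src.suffix.lower())
        if folder is None or not src.is_file():
            continue  # ignore anything that is not a weight file
        dest_dir = Path(repo_dir) / folder
        dest_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest_dir / src.name)
        copied.append(dest_dir / src.name)
    return copied
```

Call it as install_weights("/path/to/downloads", "/path/to/GPT-SoVITS") with your own paths; it returns the list of files it copied.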
Set both "Language for Reference audio" and "Inference text language" to "Japanese", and set slicing to "Slice by every punct".
The underlying model was trained on audio with brickwall dynamic range compression, like that commonly found in visual novels or video games.
You should be able to give it a CLEAN, NOISE/MUSIC/STATIC-free FEMALE Japanese voice clip of 3-10 seconds along with a 100% ACCURATE transcription and get OK results out the other side.
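To check the 3-10 second length before uploading a clip, something like the sketch below should work. It only covers duration, not cleanliness or transcription accuracy, and it assumes the clip is an uncompressed PCM WAV file, since that is what Python's built-in wave module reads. The function and constant names are my own.

```python
import wave

# Bounds taken from the 3-10 second recommendation above.
MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


def wav_duration_seconds(path):
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_length(path):
    """True if the clip falls inside the recommended 3-10 second window."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

Clips in other formats (mp3, ogg, flac) would need converting to WAV first or a different library.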
I have found that results can be improved by applying post-generation noise reduction and some treble-boosting EQ. Audacity works well enough for this, since the official Gradio interface has nothing built in for it.
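For a rough idea of what a treble boost does to the samples, here is a very simple first-order emphasis filter. This is not what Audacity's EQ does and it does no noise reduction at all; it is just an illustration, and the function name and the amount parameter are my own. It assumes the audio has already been decoded into a list of float samples.

```python
def boost_treble(samples, amount=0.5):
    """Apply y[n] = x[n] + amount * (x[n] - x[n-1]) to a list of samples.

    Slow-moving (low frequency) content passes through almost unchanged,
    while fast sample-to-sample changes (high frequencies) are amplified.
    The output can exceed the input range, so scale or clip it before
    writing it back to a fixed-range format.
    """
    out = []
    prev = samples[0] if samples else 0.0
    for x in samples:
        out.append(x + amount * (x - prev))
        prev = x
    return out
```

With amount=0.5 a constant signal is left as it is, while a signal that flips sign on every sample comes out at twice the amplitude after the first sample.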
Feel free to keep everything else at the defaults.
If you want to start the inference engine automatically, you can do something like:
python3 /path/to/GPT_SoVITS/inference_webui.py "Auto"
If you isolate it à la https://rentry.org/IsolatedLinuxWebService and put nginx in front of it with an SSL cert, you need something like this in the location block:
proxy_pass http://127.0.0.1:9872/;
proxy_buffering off;
proxy_redirect off;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
client_max_body_size 500M;
proxy_set_header X-Forwarded-Proto $scheme;
add_header 'Content-Security-Policy' 'upgrade-insecure-requests';