FastPitch and HiFiGAN v2.0
v2.0 of the phonemizer and tokenizer. The tokenizer supports pauses, emotion tokens, etc.
Install NeMo
apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install Cython packaging
# remove the distro-installed blinker package so pip can reinstall it cleanly
rm -rf /usr/lib/python3.10/site-packages/blinker*
rm -rf /usr/local/lib/python3.10/dist-packages/blinker*
pip install --ignore-installed blinker
pip install --upgrade --force-reinstall blinker
git clone https://github.com/SadeghKrmi/NeMo.git
cd NeMo
pip install -e '.[all]'
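As an optional sanity check (not part of the original instructions), the editable install with the TTS collection should import cleanly:

```python
# Optional check that the editable NeMo install (with the TTS collection) works.
import nemo
import nemo.collections.tts as nemo_tts

print(nemo.__version__)
```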
Deterministic split
Run deterministic-train-test-split.py to split the dataset into train and test sets.
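As a rough idea of what such a deterministic split looks like, the sketch below shuffles a JSONL manifest with a fixed seed and writes `dataset_splits/train/train.jsonl` and `dataset_splits/test/test.jsonl`. The manifest path, split ratio, and seed are illustrative assumptions; the actual deterministic-train-test-split.py in this repo may differ.

```python
# Hypothetical sketch of a deterministic train/test split over a JSONL manifest.
import json
import random
from pathlib import Path

def split_manifest(manifest_path: str, out_dir: str, test_ratio: float = 0.05, seed: int = 42) -> None:
    lines = Path(manifest_path).read_text(encoding="utf-8").splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    random.Random(seed).shuffle(records)          # fixed seed -> identical split on every run
    n_test = int(len(records) * test_ratio)
    for name, recs in (("test", records[:n_test]), ("train", records[n_test:])):
        out = Path(out_dir) / name / f"{name}.jsonl"
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text("".join(json.dumps(r, ensure_ascii=False) + "\n" for r in recs), encoding="utf-8")

split_manifest("manifest.jsonl", "dataset_splits")
```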
Extract the supportive data
Using the following script, extract the supportive data and pitch statistics:
tar -xzf dataset_splits.tar.gz
cd extract-supportive-data
HYDRA_FULL_ERROR=1 python3 ./scripts/extract_sup_data.py \
--config-path ../config/fastpitch/ \
--config-name ds_for_fastpitch_align.yaml \
manifest_filepath=./dataset_splits/train/train.jsonl \
sup_data_path=sup_data \
phoneme_dict_path=./persian-dict/persian-v4.0.dict \
++dataloader_params.num_workers=8
Dataset sup-data pitch stats:
PITCH_MEAN=98.72935485839844, PITCH_STD=29.40760040283203, PITCH_MIN=65.4063949584961, PITCH_MAX=2093.004638671875
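These statistics are what FastPitch-style training uses to normalize the extracted pitch contours. Roughly (a sketch of the idea, not the exact NeMo code):

```python
import torch

# Pitch statistics from the extract_sup_data run above.
PITCH_MEAN, PITCH_STD = 98.72935485839844, 29.40760040283203

def normalize_pitch(pitch: torch.Tensor) -> torch.Tensor:
    # Normalize voiced frames to zero mean / unit variance; keep unvoiced frames (0.0) at zero.
    normalized = (pitch - PITCH_MEAN) / PITCH_STD
    normalized[pitch == 0.0] = 0.0
    return normalized
```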
Compress and download
tar -czf sup_data.tar.gz sup_data
Training FastPitch
Trained for about 800 epochs with a CosineAnnealing scheduler and max_steps=200,000 so the learning rate decays over time.
val_loss did not decrease below about 0.77.
val_loss = mel_loss + dur_loss + pitch_loss + energy_loss
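The learning-rate decay described above can be approximated with plain PyTorch, shown below for reference; the actual schedule is NeMo's CosineAnnealing scheduler configured through the Hydra config, and the base and minimum learning rates here are illustrative assumptions.

```python
import torch

model = torch.nn.Linear(8, 8)                                        # stand-in for the FastPitch model
opt = torch.optim.Adam(model.parameters(), lr=1e-3)                  # illustrative base lr
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=200_000, eta_min=1e-5)

for step in range(200_000):
    opt.step()                                                       # the real training step goes here
    sched.step()
    if step % 50_000 == 0:
        print(step, sched.get_last_lr()[0])                          # lr follows a cosine decay toward eta_min
```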
Training HiFiGAN
Trained for about 40 epochs; training was stopped based on quality checks by listening to the generated audio.
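A listening check can be done by chaining the two checkpoints. A minimal sketch follows; the `.nemo` checkpoint file names, the sample sentence, and the 22050 Hz sample rate are assumptions, not fixed by this repo.

```python
# Sketch: synthesize a sample with FastPitch + HiFiGAN for a listening check.
import soundfile as sf
import torch
from nemo.collections.tts.models import FastPitchModel, HifiGanModel

spec_model = FastPitchModel.restore_from("fastpitch.nemo").eval()   # hypothetical checkpoint paths
vocoder = HifiGanModel.restore_from("hifigan.nemo").eval()

with torch.no_grad():
    tokens = spec_model.parse("سلام دنیا")                           # phonemize/tokenize the input text
    spec = spec_model.generate_spectrogram(tokens=tokens)
    audio = vocoder.convert_spectrogram_to_audio(spec=spec)

sf.write("sample.wav", audio.squeeze().cpu().numpy(), samplerate=22050)  # assumed sample rate
```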