Pretrained models

| Model | Download link | File size |
|---|---|---|
| Speech synthesis model, based on MMAudio small 16kHz | v2c_s16.pt | 1.3G |
| Speech synthesis model, based on MMAudio small 44.1kHz | v2c_s44.pt | 1.3G |
| Speech synthesis model, based on MMAudio medium 44.1kHz | v2c_m44.pt | 1.3G |
| Speech synthesis model, based on MMAudio large 44.1kHz | v2c_l44.pt | 1.3G |
| MMAudio, small 16kHz | mmaudio_small_16k.pth | 601M |
| MMAudio, small 44.1kHz | mmaudio_small_44k.pth | 601M |
| MMAudio, medium 44.1kHz | mmaudio_medium_44k.pth | 2.4G |
| MMAudio, large 44.1kHz | mmaudio_large_44k.pth | 3.9G |
| MMAudio, large 44.1kHz, v2 | mmaudio_large_44k_v2.pth | 3.9G |
| 16kHz VAE | v1-16.pth | 655M |
| 16kHz BigVGAN vocoder (from Make-An-Audio 2) | best_netG.pt | 429M |
| 44.1kHz VAE | v1-44.pth | 1.2G |
| Synchformer visual encoder | synchformer_state_dict.pth | 907M |
| Whisper model for WER evaluation | faster-whisper-large-v3 | 2.9G |
| WavLM model for SIM-O evaluation | wavlm_large_finetune.pth | 1.2G |
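Because the table lists every file with its size, the disk space needed to download everything can be estimated directly from it. The sketch below is only an illustration: it copies the sizes as printed in the table (not measured from the real files) and assumes that `M` and `G` are binary units (MiB and GiB). `SIZES`, `to_mib`, and `total_mib` are hypothetical names, not part of this project.

```python
# Sizes copied from the table above, as printed (not measured from the files).
# Assumption: "M" and "G" are binary units (MiB, GiB), as printed by `du -h`.
SIZES = {
    "v2c_s16.pt": "1.3G",
    "v2c_s44.pt": "1.3G",
    "v2c_m44.pt": "1.3G",
    "v2c_l44.pt": "1.3G",
    "mmaudio_small_16k.pth": "601M",
    "mmaudio_small_44k.pth": "601M",
    "mmaudio_medium_44k.pth": "2.4G",
    "mmaudio_large_44k.pth": "3.9G",
    "mmaudio_large_44k_v2.pth": "3.9G",
    "v1-16.pth": "655M",
    "best_netG.pt": "429M",
    "v1-44.pth": "1.2G",
    "synchformer_state_dict.pth": "907M",
    "faster-whisper-large-v3": "2.9G",
    "wavlm_large_finetune.pth": "1.2G",
}

# Multiplier that converts each unit suffix to MiB.
UNIT_TO_MIB = {"M": 1, "G": 1024}


def to_mib(size: str) -> float:
    """Convert a size string such as '601M' or '1.3G' to MiB."""
    return float(size[:-1]) * UNIT_TO_MIB[size[-1]]


def total_mib(sizes: dict) -> float:
    """Sum every size in the mapping, in MiB."""
    return sum(to_mib(s) for s in sizes.values())


print(f"{total_mib(SIZES) / 1024:.1f} GiB")  # → 23.8 GiB
```

To budget for a partial setup, drop the entries you do not plan to download (for example, the MMAudio sizes you will not use) from `SIZES` before summing.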

The expected directory structure:

F5-TTS
├── ckpts
│   ├── v2c
│   │   ├── v2c_s16.pt
│   │   ├── v2c_s44.pt
│   │   ├── v2c_m44.pt
│   │   └── v2c_l44.pt
│   ├── faster-whisper-large-v3
│   └── wavlm_large_finetune.pth
└── ...
MMAudio
├── ext_weights
│   ├── best_netG.pt
│   ├── synchformer_state_dict.pth
│   ├── v1-16.pth
│   └── v1-44.pth
├── weights
│   ├── mmaudio_small_16k.pth
│   ├── mmaudio_small_44k.pth
│   ├── mmaudio_medium_44k.pth
│   ├── mmaudio_large_44k.pth
│   └── mmaudio_large_44k_v2.pth
└── ...