Pretrained models
Model | File | File size |
---|---|---|
Speech synthesis model, based on MMAudio small 16kHz | v2c_s16.pt | 1.3G |
Speech synthesis model, based on MMAudio small 44.1kHz | v2c_s44.pt | 1.3G |
Speech synthesis model, based on MMAudio medium 44.1kHz | v2c_m44.pt | 1.3G |
Speech synthesis model, based on MMAudio large 44.1kHz | v2c_l44.pt | 1.3G |
MMAudio, small 16kHz | mmaudio_small_16k.pth | 601M |
MMAudio, small 44.1kHz | mmaudio_small_44k.pth | 601M |
MMAudio, medium 44.1kHz | mmaudio_medium_44k.pth | 2.4G |
MMAudio, large 44.1kHz | mmaudio_large_44k.pth | 3.9G |
MMAudio, large 44.1kHz, v2 | mmaudio_large_44k_v2.pth | 3.9G |
16kHz VAE | v1-16.pth | 655M |
16kHz BigVGAN vocoder (from Make-An-Audio 2) | best_netG.pt | 429M |
44.1kHz VAE | v1-44.pth | 1.2G |
Synchformer visual encoder | synchformer_state_dict.pth | 907M |
Whisper model for WER evaluation | faster-whisper-large-v3 | 2.9G |
WavLM model for SIM-O evaluation | wavlm_large_finetune.pth | 1.2G |
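Large checkpoints are easy to truncate during download. The sketch below is one way to sanity-check downloaded files against the sizes in the table. It assumes the table's `G`/`M` suffixes are binary units (2**30 and 2**20 bytes), which the table does not state, and it skips `faster-whisper-large-v3` because that entry is a model directory rather than a single file. The names `EXPECTED_SIZES`, `parse_size`, and `looks_truncated` are hypothetical helpers, not part of either repository.

```python
import os

# Approximate sizes copied from the pretrained-models table.
# Assumption: "G" and "M" are read as binary units; the table does not say.
EXPECTED_SIZES = {
    "v2c_s16.pt": "1.3G",
    "v2c_s44.pt": "1.3G",
    "v2c_m44.pt": "1.3G",
    "v2c_l44.pt": "1.3G",
    "mmaudio_small_16k.pth": "601M",
    "mmaudio_small_44k.pth": "601M",
    "mmaudio_medium_44k.pth": "2.4G",
    "mmaudio_large_44k.pth": "3.9G",
    "mmaudio_large_44k_v2.pth": "3.9G",
    "v1-16.pth": "655M",
    "best_netG.pt": "429M",
    "v1-44.pth": "1.2G",
    "synchformer_state_dict.pth": "907M",
    "wavlm_large_finetune.pth": "1.2G",
}

_UNITS = {"M": 2**20, "G": 2**30}


def parse_size(text: str) -> int:
    """Convert a table entry such as '1.3G' or '601M' to an approximate byte count."""
    return int(float(text[:-1]) * _UNITS[text[-1]])


def looks_truncated(path: str, expected: str, tolerance: float = 0.2) -> bool:
    """Return True if the file on disk is more than `tolerance` smaller than the listed size.

    The tolerance is deliberately loose because the table rounds sizes.
    """
    actual = os.path.getsize(path)
    return actual < parse_size(expected) * (1 - tolerance)
```

For example, `looks_truncated("ckpts/v2c/v2c_s16.pt", EXPECTED_SIZES["v2c_s16.pt"])` would flag a partial download of the 16kHz V2C checkpoint. A size check only catches truncation; it does not detect a corrupted file of the right length.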
The expected directory structure:
F5-TTS
├── ckpts
│   ├── v2c
│   │   ├── v2c_s16.pt
│   │   ├── v2c_s44.pt
│   │   ├── v2c_m44.pt
│   │   └── v2c_l44.pt
│   ├── faster-whisper-large-v3
│   └── wavlm_large_finetune.pth
└── ...
MMAudio
├── ext_weights
│   ├── best_netG.pt
│   ├── synchformer_state_dict.pth
│   ├── v1-16.pth
│   └── v1-44.pth
├── weights
│   ├── mmaudio_small_16k.pth
│   ├── mmaudio_small_44k.pth
│   ├── mmaudio_medium_44k.pth
│   ├── mmaudio_large_44k.pth
│   └── mmaudio_large_44k_v2.pth
└── ...
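Before launching inference or evaluation, it can help to confirm that every checkpoint sits where the expected directory structure says it should. The sketch below transcribes that structure into two lists of relative paths and reports anything missing. It assumes `F5-TTS` and `MMAudio` are two separate checkout roots whose locations you pass in; the names `F5_TTS_FILES`, `MMAUDIO_FILES`, and `missing_paths` are hypothetical and not part of either codebase.

```python
from pathlib import Path

# Relative paths transcribed from the expected directory structure.
# Assumption: F5-TTS and MMAudio are separate directories supplied by the caller.
F5_TTS_FILES = [
    "ckpts/v2c/v2c_s16.pt",
    "ckpts/v2c/v2c_s44.pt",
    "ckpts/v2c/v2c_m44.pt",
    "ckpts/v2c/v2c_l44.pt",
    "ckpts/faster-whisper-large-v3",  # a model directory, not a single file
    "ckpts/wavlm_large_finetune.pth",
]

MMAUDIO_FILES = [
    "ext_weights/best_netG.pt",
    "ext_weights/synchformer_state_dict.pth",
    "ext_weights/v1-16.pth",
    "ext_weights/v1-44.pth",
    "weights/mmaudio_small_16k.pth",
    "weights/mmaudio_small_44k.pth",
    "weights/mmaudio_medium_44k.pth",
    "weights/mmaudio_large_44k.pth",
    "weights/mmaudio_large_44k_v2.pth",
]


def missing_paths(root, relative_paths):
    """Return the entries of `relative_paths` that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in relative_paths if not (base / rel).exists()]
```

A typical call is `missing_paths("/path/to/F5-TTS", F5_TTS_FILES)`, which returns an empty list once everything is in place. Checking `exists()` rather than `is_file()` keeps the `faster-whisper-large-v3` directory entry valid.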