# Pretrained models

| Model | Download link | File size |
| -------- | ------- | ------- |
| Speech synthesis model, based on MMAudio small 16kHz | v2c_s16.pt | 1.3G |
| Speech synthesis model, based on MMAudio small 44.1kHz | v2c_s44.pt | 1.3G |
| Speech synthesis model, based on MMAudio medium 44.1kHz | v2c_m44.pt | 1.3G |
| Speech synthesis model, based on MMAudio large 44.1kHz | v2c_l44.pt | 1.3G |
| MMAudio, small 16kHz | mmaudio_small_16k.pth | 601M |
| MMAudio, small 44.1kHz | mmaudio_small_44k.pth | 601M |
| MMAudio, medium 44.1kHz | mmaudio_medium_44k.pth | 2.4G |
| MMAudio, large 44.1kHz | mmaudio_large_44k.pth | 3.9G |
| MMAudio, large 44.1kHz, v2 | mmaudio_large_44k_v2.pth | 3.9G |
| 16kHz VAE | v1-16.pth | 655M |
| 16kHz BigVGAN vocoder (from Make-An-Audio 2) | best_netG.pt | 429M |
| 44.1kHz VAE | v1-44.pth | 1.2G |
| Synchformer visual encoder | synchformer_state_dict.pth | 907M |
| Whisper model for WER evaluation | faster-whisper-large-v3 | 2.9G |
| WavLM model for SIM-O evaluation | wavlm_large_finetune.pth | 1.2G |

The expected directory structure:

```bash
F5-TTS
├── ckpts
│   ├── v2c
│   │   ├── v2c_s16.pt
│   │   ├── v2c_s44.pt
│   │   ├── v2c_m44.pt
│   │   └── v2c_l44.pt
│   ├── faster-whisper-large-v3
│   └── wavlm_large_finetune.pth
└── ...
MMAudio
├── ext_weights
│   ├── best_netG.pt
│   ├── synchformer_state_dict.pth
│   ├── v1-16.pth
│   └── v1-44.pth
├── weights
│   ├── mmaudio_small_16k.pth
│   ├── mmaudio_small_44k.pth
│   ├── mmaudio_medium_44k.pth
│   ├── mmaudio_large_44k.pth
│   └── mmaudio_large_44k_v2.pth
└── ...
```
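
To confirm that every download landed in the right place, a small check along the following lines may help. It is a minimal sketch, not part of this repository: the `find_missing` helper and the `EXPECTED` mapping are hypothetical names, and it assumes the `F5-TTS` and `MMAudio` directories sit side by side under one parent directory. The paths themselves are copied from the directory structure above.

```python
from pathlib import Path

# Paths copied from the expected directory structure, relative to each top-level directory.
EXPECTED = {
    "F5-TTS": [
        "ckpts/v2c/v2c_s16.pt",
        "ckpts/v2c/v2c_s44.pt",
        "ckpts/v2c/v2c_m44.pt",
        "ckpts/v2c/v2c_l44.pt",
        "ckpts/faster-whisper-large-v3",
        "ckpts/wavlm_large_finetune.pth",
    ],
    "MMAudio": [
        "ext_weights/best_netG.pt",
        "ext_weights/synchformer_state_dict.pth",
        "ext_weights/v1-16.pth",
        "ext_weights/v1-44.pth",
        "weights/mmaudio_small_16k.pth",
        "weights/mmaudio_small_44k.pth",
        "weights/mmaudio_medium_44k.pth",
        "weights/mmaudio_large_44k.pth",
        "weights/mmaudio_large_44k_v2.pth",
    ],
}


def find_missing(base_dir="."):
    """Return the expected paths that do not exist under base_dir.

    Assumption: base_dir is the parent that contains both F5-TTS and MMAudio.
    """
    base = Path(base_dir)
    return [
        f"{top}/{rel}"
        for top, rel_paths in EXPECTED.items()
        for rel in rel_paths
        if not (base / top / rel).exists()
    ]


if __name__ == "__main__":
    missing = find_missing(".")
    if missing:
        print("Missing files or directories:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected checkpoints are in place.")
```

The check only tests for existence, so a truncated download would still pass; comparing file sizes against the table is a reasonable manual follow-up.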