# Pretrained models

| Model    | Download link | File size |
| -------- | ------- | ------- |
| Speech synthesis model, based on MMAudio small 16kHz | <a href="https://huggingface.co" download="v2c_s16.pt">v2c_s16.pt</a> | 1.3G |
| Speech synthesis model, based on MMAudio small 44.1kHz | <a href="https://huggingface.co" download="v2c_s44.pt">v2c_s44.pt</a> | 1.3G |
| Speech synthesis model, based on MMAudio medium 44.1kHz | <a href="https://huggingface.co" download="v2c_m44.pt">v2c_m44.pt</a> | 1.3G |
| Speech synthesis model, based on MMAudio large 44.1kHz | <a href="https://huggingface.co" download="v2c_l44.pt">v2c_l44.pt</a> | 1.3G |
| MMAudio, small 16kHz | <a href="https://huggingface.co/hkchengrex/MMAudio/resolve/main/weights/mmaudio_small_16k.pth" download="mmaudio_small_16k.pth">mmaudio_small_16k.pth</a> | 601M |
| MMAudio, small 44.1kHz | <a href="https://huggingface.co/hkchengrex/MMAudio/resolve/main/weights/mmaudio_small_44k.pth" download="mmaudio_small_44k.pth">mmaudio_small_44k.pth</a> | 601M |
| MMAudio, medium 44.1kHz | <a href="https://huggingface.co/hkchengrex/MMAudio/resolve/main/weights/mmaudio_medium_44k.pth" download="mmaudio_medium_44k.pth">mmaudio_medium_44k.pth</a> | 2.4G |
| MMAudio, large 44.1kHz | <a href="https://huggingface.co/hkchengrex/MMAudio/resolve/main/weights/mmaudio_large_44k.pth" download="mmaudio_large_44k.pth">mmaudio_large_44k.pth</a> | 3.9G |
| MMAudio, large 44.1kHz, v2 | <a href="https://huggingface.co/hkchengrex/MMAudio/resolve/main/weights/mmaudio_large_44k_v2.pth" download="mmaudio_large_44k_v2.pth">mmaudio_large_44k_v2.pth</a> | 3.9G |
| 16kHz VAE | <a href="https://github.com/hkchengrex/MMAudio/releases/download/v0.1/v1-16.pth">v1-16.pth</a> | 655M |
| 16kHz BigVGAN vocoder (from Make-An-Audio 2) | <a href="https://github.com/hkchengrex/MMAudio/releases/download/v0.1/best_netG.pt">best_netG.pt</a> | 429M |
| 44.1kHz VAE | <a href="https://github.com/hkchengrex/MMAudio/releases/download/v0.1/v1-44.pth">v1-44.pth</a> | 1.2G |
| Synchformer visual encoder | <a href="https://github.com/hkchengrex/MMAudio/releases/download/v0.1/synchformer_state_dict.pth">synchformer_state_dict.pth</a> | 907M |
| Whisper model for WER evaluation | <a href="https://huggingface.co/Systran/faster-whisper-large-v3" download="faster-whisper-large-v3">faster-whisper-large-v3</a> | 2.9G |
| WavLM model for SIM-O evaluation | <a href="https://drive.google.com/file/d/1-aE1NfzpRCLxA4GUxX9ITI3F9LlbtEGP/view" download="wavlm_large_finetune.pth">wavlm_large_finetune.pth</a> | 1.2G |
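The MMAudio checkpoints and the external weights in the table have direct download links, so they can be fetched with a short script. The following is a minimal sketch using only the Python standard library. It is not part of F5-TTS or MMAudio, the helper names `plan_downloads` and `download_missing` are hypothetical, and it assumes the MMAudio repository is cloned locally so that the files land in `MMAudio/weights` and `MMAudio/ext_weights`. The speech synthesis checkpoints, the Whisper model directory, and the WavLM file hosted on Google Drive are left out because their links are not direct file URLs.

```python
"""Sketch: fetch the directly linked MMAudio checkpoints (hypothetical helper, not part of either repo)."""
import urllib.request
from pathlib import Path

# Base URLs copied from the download links in the table.
_HF = "https://huggingface.co/hkchengrex/MMAudio/resolve/main/weights/"
_GH = "https://github.com/hkchengrex/MMAudio/releases/download/v0.1/"

# Destination subfolder (inside the MMAudio checkout) -> checkpoint file names.
MMAUDIO_FILES = {
    "weights": [
        "mmaudio_small_16k.pth",
        "mmaudio_small_44k.pth",
        "mmaudio_medium_44k.pth",
        "mmaudio_large_44k.pth",
        "mmaudio_large_44k_v2.pth",
    ],
    "ext_weights": [
        "best_netG.pt",
        "synchformer_state_dict.pth",
        "v1-16.pth",
        "v1-44.pth",
    ],
}


def plan_downloads(mmaudio_root):
    """Return (url, destination) pairs for every checkpoint with a direct link."""
    root = Path(mmaudio_root)
    plan = []
    for subdir, names in MMAUDIO_FILES.items():
        # Main weights come from Hugging Face, external weights from GitHub releases.
        base = _HF if subdir == "weights" else _GH
        for name in names:
            plan.append((base + name, root / subdir / name))
    return plan


def download_missing(mmaudio_root, dry_run=False):
    """Download checkpoints not yet on disk; return the paths fetched (or that would be)."""
    fetched = []
    for url, dest in plan_downloads(mmaudio_root):
        if dest.exists():
            continue  # already present, skip it
        fetched.append(dest)
        if not dry_run:
            dest.parent.mkdir(parents=True, exist_ok=True)
            # Write to a temporary name first so an interrupted download
            # never leaves a truncated file under the final name.
            tmp = dest.with_name(dest.name + ".part")
            urllib.request.urlretrieve(url, tmp)
            tmp.replace(dest)
    return fetched
```

Calling `download_missing("MMAudio", dry_run=True)` lists what would be fetched without touching the network; dropping `dry_run` performs the downloads. Because each file is renamed only after `urlretrieve` returns, a failed run is simply retried on the next call.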


The expected directory structure:

```bash
F5-TTS
├── ckpts
│   ├── v2c
│   │   ├── v2c_s16.pt
│   │   ├── v2c_s44.pt
│   │   ├── v2c_m44.pt
│   │   └── v2c_l44.pt
│   ├── faster-whisper-large-v3
│   └── wavlm_large_finetune.pth
└── ...

MMAudio
├── ext_weights
│   ├── best_netG.pt
│   ├── synchformer_state_dict.pth
│   ├── v1-16.pth
│   └── v1-44.pth
├── weights
│   ├── mmaudio_small_16k.pth
│   ├── mmaudio_small_44k.pth
│   ├── mmaudio_medium_44k.pth
│   ├── mmaudio_large_44k.pth
│   └── mmaudio_large_44k_v2.pth
└── ...
```
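Before running training or evaluation, it can help to confirm that every checkpoint sits where the tree above expects it. The following is a minimal sketch, assuming `F5-TTS` and `MMAudio` are cloned side by side in the working directory; `missing_checkpoints` is a hypothetical helper, not an API of either repository.

```python
"""Sketch: report which expected checkpoints are missing (hypothetical helper)."""
from pathlib import Path

# Relative paths taken from the expected directory structure.
F5_TTS_EXPECTED = [
    "ckpts/v2c/v2c_s16.pt",
    "ckpts/v2c/v2c_s44.pt",
    "ckpts/v2c/v2c_m44.pt",
    "ckpts/v2c/v2c_l44.pt",
    "ckpts/faster-whisper-large-v3",  # a model directory, not a single file
    "ckpts/wavlm_large_finetune.pth",
]
MMAUDIO_EXPECTED = [
    "ext_weights/best_netG.pt",
    "ext_weights/synchformer_state_dict.pth",
    "ext_weights/v1-16.pth",
    "ext_weights/v1-44.pth",
    "weights/mmaudio_small_16k.pth",
    "weights/mmaudio_small_44k.pth",
    "weights/mmaudio_medium_44k.pth",
    "weights/mmaudio_large_44k.pth",
    "weights/mmaudio_large_44k_v2.pth",
]


def missing_checkpoints(f5_tts_root, mmaudio_root):
    """Return the expected paths (files or directories) that do not exist yet."""
    missing = []
    for root, expected in ((f5_tts_root, F5_TTS_EXPECTED), (mmaudio_root, MMAUDIO_EXPECTED)):
        for rel in expected:
            path = Path(root) / rel
            if not path.exists():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Assumes both repositories are cloned side by side in the current directory.
    gaps = missing_checkpoints("F5-TTS", "MMAudio")
    for path in gaps:
        print(f"missing: {path}")
    print("all checkpoints in place" if not gaps else f"{len(gaps)} item(s) missing")
```

The check only tests for existence; it does not verify file sizes or checksums, so a partially downloaded file would still count as present.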