---
license: cc-by-nc-4.0
datasets:
- mesolitica/Malaysian-Emilia
language:
- ms
- en
base_model:
- SWivid/F5-TTS
---

# Full Parameter Finetuning Malaysian Emilia F5-TTS v3

Continued training from the [SWivid/F5-TTS](https://huggingface.co/SWivid/F5-TTS) `F5TTS_v1_Base` checkpoint on [Malaysian-Emilia](https://huggingface.co/datasets/mesolitica/Malaysian-Emilia), totaling 15,631 hours, including 600 hours of Mandarin sampled from [amphion/Emilia-Dataset](https://huggingface.co/datasets/amphion/Emilia-Dataset).

## Checkpoints

We uploaded full checkpoints with optimizer states at [checkpoints](checkpoints).

## How to use

You can use the Gradio app from the official F5-TTS repository:

```bash
git clone https://github.com/SWivid/F5-TTS
cd F5-TTS
GRADIO_SERVER_NAME="0.0.0.0" python3 src/f5_tts/infer/infer_gradio.py
```

After that, set `hf://mesolitica/Malaysian-F5-TTS-v3/checkpoints/model_220000.pt` as the custom model path:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/5e73316106936008a9ee6523/FawchV-L4e9PZjAlJxtKP.png)

- The model is able to generate fillers such as `erm` and `uhm` if the reference speaker also uses them.
- The model is able to reproduce an emotion if the reference speaker expresses the same emotion.
- The model is able to generate multilingual speech (Malay, local English, and mainland Mandarin) with code switching, even when the reference speaker is monolingual.

## Dataset

We train on a post-filtered version of [Malaysian-Emilia](https://huggingface.co/datasets/mesolitica/Malaysian-Emilia) called [Malaysian-Voice-Conversion](https://huggingface.co/datasets/mesolitica/Malaysian-Voice-Conversion).

## Source code

All source code is at https://github.com/mesolitica/malaya-speech/tree/master/session/f5-tts
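If you want the checkpoint file locally (e.g. to pass a plain file path instead of the `hf://` URI), the URI above maps to a repo id plus a filename inside that repo, which `huggingface_hub` can download. A minimal sketch, assuming `huggingface_hub` is installed; the `parse_hf_uri` helper is our own illustration, not part of F5-TTS:

```python
def parse_hf_uri(uri: str) -> tuple[str, str]:
    """Split an hf://<org>/<repo>/<path...> URI into (repo_id, filename)."""
    assert uri.startswith("hf://"), "expected an hf:// URI"
    org, repo, *path = uri[len("hf://"):].split("/")
    return f"{org}/{repo}", "/".join(path)


if __name__ == "__main__":
    repo_id, filename = parse_hf_uri(
        "hf://mesolitica/Malaysian-F5-TTS-v3/checkpoints/model_220000.pt"
    )
    # Downloads into the local Hugging Face cache and returns the file path;
    # note the checkpoint is large, since it includes optimizer states.
    from huggingface_hub import hf_hub_download

    ckpt_path = hf_hub_download(repo_id=repo_id, filename=filename)
    print(ckpt_path)
```

The resulting `ckpt_path` can then be used wherever the Gradio app or CLI expects a checkpoint file on disk.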