---
license: cc-by-nc-4.0
datasets:
- mesolitica/Malaysian-Emilia
language:
- ms
- en
base_model:
- SWivid/F5-TTS
---

# Full Parameter Finetuning Malaysian Emilia F5-TTS v3

Continued training from the [SWivid/F5-TTS](https://huggingface.co/SWivid/F5-TTS) `F5TTS_v1_Base` checkpoint on [Malaysian-Emilia](https://huggingface.co/datasets/mesolitica/Malaysian-Emilia), totaling 15,631 hours, including 600 hours of Mandarin sampled from [amphion/Emilia-Dataset](https://huggingface.co/datasets/amphion/Emilia-Dataset).

## Checkpoints

We uploaded full checkpoints with optimizer states at [checkpoints](checkpoints).

## How to use

You can use the Gradio app from the official F5-TTS repository:

```bash
git clone https://github.com/SWivid/F5-TTS
cd F5-TTS
GRADIO_SERVER_NAME="0.0.0.0" python3 src/f5_tts/infer/infer_gradio.py
```

After that, set `hf://mesolitica/Malaysian-F5-TTS-v3/checkpoints/model_220000.pt` as the custom model path:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/5e73316106936008a9ee6523/FawchV-L4e9PZjAlJxtKP.png)

- The model is able to generate fillers such as `erm` and `uhm` if the reference speaker also uses them.
- The model is able to reproduce an emotion if the reference speaker expresses the same emotion.
- The model is able to generate multilingual speech (Malay, local English, and mainland Mandarin) with code switching, even when the reference speaker is monolingual.

## Dataset

We train on a post-filtered version of [Malaysian-Emilia](https://huggingface.co/datasets/mesolitica/Malaysian-Emilia) called [Malaysian-Voice-Conversion](https://huggingface.co/datasets/mesolitica/Malaysian-Voice-Conversion).

## Source code

All source code is at https://github.com/mesolitica/malaya-speech/tree/master/session/f5-tts
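If you want the checkpoint file locally (e.g. to pass a plain file path instead of the `hf://` URI), the URI above maps to a repo id plus a filename inside that repo, which `huggingface_hub` can download. A minimal sketch, assuming `huggingface_hub` is installed; the `parse_hf_uri` helper is our own illustration, not part of F5-TTS:

```python
def parse_hf_uri(uri: str) -> tuple[str, str]:
    """Split an hf://<org>/<repo>/<path...> URI into (repo_id, filename)."""
    assert uri.startswith("hf://"), "expected an hf:// URI"
    org, repo, *path = uri[len("hf://"):].split("/")
    return f"{org}/{repo}", "/".join(path)


if __name__ == "__main__":
    repo_id, filename = parse_hf_uri(
        "hf://mesolitica/Malaysian-F5-TTS-v3/checkpoints/model_220000.pt"
    )
    # Downloads into the local Hugging Face cache and returns the file path;
    # note the checkpoint is large, since it includes optimizer states.
    from huggingface_hub import hf_hub_download

    ckpt_path = hf_hub_download(repo_id=repo_id, filename=filename)
    print(ckpt_path)
```

The resulting `ckpt_path` can then be used wherever the Gradio app or CLI expects a checkpoint file on disk.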