---
license: cc-by-nc-4.0
language:
- zh
- en
base_model:
- meta-llama/Llama-3.2-3B-Instruct
tags:
- Text-to-Speech
pipeline_tag: text-to-speech
---

[![arXiv](https://img.shields.io/badge/arXiv-Paper-.svg)](https://arxiv.org/abs/2502.04128)

**Update (2025-02-13):** Added [Llasa fine-tuning instructions](https://github.com/zhenye234/LLaSA_training/tree/main/finetune). These models are not mentioned in the original paper; they are essentially the same as LLaSA 1B and LLaSA 3B, except that they have been fine-tuned on a mixed speech-and-text SFT dataset, which enables them to retain text-based conversational abilities.

LLaSA: Scaling Train-Time and Inference-Time Compute for LLaMA-based Speech Synthesis
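
Because these SFT checkpoints retain text-based conversational abilities on top of the LLaMA-3.2-3B-Instruct base, they can be queried as an ordinary chat model through Hugging Face `transformers`. The sketch below is a minimal example of that text-only use; the repository id is a placeholder, not the actual checkpoint name, so substitute the model id from this card.

```python
# Minimal sketch: plain text conversation with one of the mixed speech/text
# SFT checkpoints. The repo id below is a placeholder -- replace it with the
# actual model id from this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/llasa-3b-sft"  # placeholder repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Briefly explain what a speech codec does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate a text reply; speech synthesis uses the same causal LM interface
# but requires the codec pipeline described in the paper and training repo.
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True))
```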