---
license: cc-by-nc-nd-4.0
pipeline_tag: text-to-speech
---

# Spark TTS Vietnamese

Spark-TTS is an advanced text-to-speech system that uses large language models (LLMs) for highly accurate and natural-sounding voice synthesis. It is designed to be efficient, flexible, and powerful for both research and production use.

This model was trained on the [viVoice](https://huggingface.co/datasets/thinhlpg/viVoice) Vietnamese dataset.

# Usage

First, install the required packages. The example below also imports `soundfile` to write the output WAV file and assumes PyTorch is already installed:

```
pip install --upgrade transformers accelerate
pip install soundfile
```

## Text-to-Speech

We have customized the code so that you can run inference with the Hugging Face Transformers library, without installing anything else beyond the packages above.

```python
from transformers import AutoProcessor, AutoModel
import soundfile as sf
import torch

# Use the GPU when available; the model and the inputs must be on the same device
device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "DragonLineageAI/Vi-SparkTTS-0.5B"

# trust_remote_code=True is required because the repository ships custom code
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval().to(device)
processor.model = model

prompt_audio_path = "path_to_audio_path"  # CHANGE TO YOUR ACTUAL PATH
prompt_transcript = "text corresponding to prompt audio"  # Optional
text_input = "xin chào mọi người chúng tôi là Nguyễn Công Tú Anh và Chu Văn An đến từ dragonlineageai"

# The input text is lowercased before it is passed to the processor
inputs = processor(
    text=text_input.lower(),
    prompt_speech_path=prompt_audio_path,
    prompt_text=prompt_transcript,
    return_tensors="pt"
).to(device)

# Remove the prompt's global tokens from the model inputs; they are passed to decode() below
global_tokens_prompt = inputs.pop("global_token_ids_prompt", None)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=3000,
        do_sample=True,
        temperature=0.8,
        top_k=50,
        top_p=0.95,
        eos_token_id=processor.tokenizer.eos_token_id,
        pad_token_id=processor.tokenizer.pad_token_id
    )

# Convert the generated token ids back into a waveform
output_clone = processor.decode(
    generated_ids=output_ids,
    global_token_ids_prompt=global_tokens_prompt,
    input_ids_len=inputs["input_ids"].shape[-1]
)

sf.write("output_cloned.wav", output_clone["audio"], output_clone["sampling_rate"])
```

## Fine-tune

You can fine-tune this model on your own dataset to improve quality or to train on a new language, using the [training code](https://github.com/tuanh123789/Spark-TTS-finetune).
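
The Text-to-Speech example lowercases its input and caps generation at `max_new_tokens=3000`. When you synthesize longer passages, or prepare transcripts for fine-tuning, it may help to apply the same lowercasing consistently and to split long text into sentence-sized pieces. The sketch below is one possible way to do that with the Python standard library only. The helper names `normalize_text` and `split_into_chunks` and the 200-character default are illustrative assumptions, not part of this model's code or of the training repository.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Apply NFC normalization, collapse whitespace, and lowercase.

    Lowercasing mirrors `text_input.lower()` in the example above; the
    other two steps are assumptions about what makes reasonable input.
    """
    text = unicodedata.normalize("NFC", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Pack whole sentences into chunks of at most `max_chars` characters.

    Sentences are split after '.', '!' or '?'. A single sentence longer
    than `max_chars` is kept intact as its own chunk rather than cut.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", normalize_text(text)) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed as `text` to `processor(...)` in place of `text_input.lower()`, with the resulting audio arrays joined afterwards. Whether chunking improves output quality for this model is untested here.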