Description
Website: https://moira-ai.com/
Email: [email protected]
Report: https://moiraai2024.github.io/GreekTTS-demo/
Welcome to Moira.AI GreekTTS, a text-to-speech model fine-tuned specifically for Greek speech synthesis. The model is built on the sesame/csm-1b architecture and has been fine-tuned on Greek speech data to produce high-quality, natural-sounding speech.
Moira.AI aims for lifelike, expressive speech, which makes it suitable for a wide range of applications, including virtual assistants, audiobooks, and accessibility tools. Because it builds on a large transformer-based model, it targets fluid prosody and accurate pronunciation of Greek text.
Key Features:
- Fine-tuned specifically for Greek TTS.
- Built on the robust sesame/csm-1b model, ensuring high-quality performance.
- Capable of generating natural-sounding, expressive Greek speech.
- Ideal for integration into applications requiring high-quality, human-like text-to-speech synthesis in Greek.
Explore the model and see how it can enhance your Greek TTS applications!
How to use it
Install Unsloth in a conda environment, following the Unsloth guide at https://docs.unsloth.ai/get-started/install-and-update/conda-install:
conda create --name unsloth_env \
    python=3.11 \
    pytorch-cuda=12.1 \
    pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
    -y
conda activate unsloth_env
pip install unsloth
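Before running the inference script, it can help to confirm that the packages it imports are available in the new environment. The following is a minimal sketch using only the standard library; the module list is read off the imports in this README, and the helper name `missing_modules` is hypothetical. Packages such as `soundfile` and `IPython` may need a separate `pip install` if they are reported missing.

```python
from importlib.util import find_spec

# Top-level modules imported by the inference script in this README.
REQUIRED_MODULES = ["unsloth", "transformers", "torch", "peft", "soundfile", "IPython"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```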
from unsloth import FastModel
from transformers import CsmForConditionalGeneration
import torch
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")
# FastModel is already imported above; it is the loader used with auto_model below.
from peft import PeftModel
# --- 1. Load the Base Unsloth Model and Processor ---
# This setup must be identical to your training script.
print("Loading the base model and processor...")
model, processor = FastModel.from_pretrained(
    model_name = "unsloth/csm-1b",
    max_seq_length = 2048,
    dtype = None,
    auto_model = CsmForConditionalGeneration,
    load_in_4bit = False,
)
# --- 2. Identify and Load Your Best LoRA Checkpoint ---
# !!! IMPORTANT: Change this path to your best checkpoint folder !!!
# (The one you found in trainer_state.json)
int_check = 30_000  # an earlier checkpoint step (not used below)
final_int = 94_764  # step of the checkpoint that is loaded
best_checkpoint_path = f"./training_outputs_second_run/checkpoint-{final_int}"
print(f"\nLoading the LoRA adapter from: {best_checkpoint_path}")
# PeftModel.from_pretrained attaches the trained adapter weights to the base model.
# It does not merge them; call model.merge_and_unload() afterwards if you need merged weights.
model = PeftModel.from_pretrained(model, best_checkpoint_path)
print("\nFine-tuned model is ready for inference!")
# Unsloth automatically handles moving the model to the GPU
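The script refers to finding the best checkpoint in trainer_state.json. As a rough sketch, assuming training was run with the Hugging Face Trainer and that best-model tracking was enabled (otherwise the field is empty), the recorded path can be read from the `best_model_checkpoint` key. The helper names and the example file location are hypothetical.

```python
import json
from pathlib import Path


def best_checkpoint(state):
    """Return the best checkpoint path recorded in a parsed trainer state, or None."""
    return state.get("best_model_checkpoint")


def load_trainer_state(state_path):
    """Parse a trainer_state.json file into a dict."""
    return json.loads(Path(state_path).read_text(encoding="utf-8"))


# Example usage (hypothetical location; adjust to your own output folder):
# state = load_trainer_state("./training_outputs_second_run/checkpoint-94764/trainer_state.json")
# print(best_checkpoint(state))
```

If the value is `None`, best-model tracking was probably not enabled during training, and the checkpoint has to be chosen by inspecting the logged metrics instead.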
from transformers import AutoProcessor
processor = AutoProcessor.from_pretrained("unsloth/csm-1b")
greek_sentences = [
    "Σου μιλάααανε!",
    "Γεια σας, είμαι η Μίρα και σήμερα θα κάνουμε μάθημα Ελληνικων.",
    "Ημουν εξω με φιλους και τα επινα. Μου αρεσει πολυ η μπυρα αλφα!",
    "Όταν ξανά άνοιξα τα μάτια διαπίστωσα ότι ήμουν ξαπλωμένος σε ένα μαλακό στρώμα από κουβέρτες",
]
from IPython.display import Audio, display
import soundfile as sf
# --- Configure the Generation ---
int_ = 1
text_to_synthesize = greek_sentences[int_]
print(f"\nSynthesizing text: '{text_to_synthesize}'")
speaker_id = 0
inputs = processor(f"[{speaker_id}]{text_to_synthesize}", add_special_tokens=True).to("cuda")
audio_values = model.generate(
    **inputs,
    max_new_tokens=125,  # 125 tokens is about 10 seconds of audio; increase this for longer speech
    # Uncomment and adjust these sampling parameters to tweak the results:
    # depth_decoder_top_k=0,
    # depth_decoder_top_p=0.9,
    # depth_decoder_do_sample=True,
    # depth_decoder_temperature=0.9,
    # top_k=0,
    # top_p=1.0,
    # temperature=0.9,
    # do_sample=True,
    output_audio=True,
)
audio = audio_values[0].to(torch.float32).cpu().numpy()
sf.write("example_without_context.wav", audio, 24000)
display(Audio(audio, rate=24000))
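The script notes that 125 tokens correspond to 10 seconds of audio and builds its prompt as a speaker tag followed by the text. The sketch below turns both into small helpers. It assumes the 125-tokens-per-10-seconds ratio holds for other durations; the helper names `max_new_tokens_for` and `format_prompt` are hypothetical and not part of any library.

```python
import math

# Assumption taken from the comment in the script: 125 tokens is 10 seconds of audio.
TOKENS_PER_SECOND = 125 / 10  # 12.5 tokens per second


def max_new_tokens_for(seconds):
    """Estimate max_new_tokens for a target duration in seconds, rounding up."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return math.ceil(seconds * TOKENS_PER_SECOND)


def format_prompt(speaker_id, text):
    """Prefix the text with the speaker tag, in the form the script passes to the processor."""
    return f"[{speaker_id}]{text}"
```

For example, to leave room for a longer sentence, one could pass `max_new_tokens=max_new_tokens_for(20)` to `model.generate` and build the processor input with `format_prompt(speaker_id, text_to_synthesize)`.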