---
base_model: unsloth/csm-1b
pipeline_tag: text-to-speech
tags:
- base_model:adapter:unsloth/csm-1b
- lora
- transformers
- unsloth
license: apache-2.0
language:
- el
new_version: moiraai2024/GreekTTS-1.5
---


# Description
Website: https://moira-ai.com/

Email: [email protected]

Report: https://moiraai2024.github.io/GreekTTS-demo/

Welcome to Moira.AI GreekTTS, a text-to-speech model fine-tuned specifically for Greek speech synthesis. It builds on the sesame/csm-1b architecture (loaded here through the unsloth/csm-1b checkpoint) and is a LoRA adapter fine-tuned on Greek speech data to produce high-quality, natural-sounding speech.

Moira.AI GreekTTS aims for lifelike, expressive speech, which makes it suitable for applications such as virtual assistants, audiobooks, and accessibility tools. Because it builds on a large transformer-based model, it targets fluid prosody and accurate pronunciation of Greek text.

Key Features:

- Fine-tuned specifically for Greek TTS.
- Built on the robust sesame/csm-1b model, ensuring high-quality performance.
- Capable of generating natural-sounding, expressive Greek speech.
- Ideal for integration into applications requiring high-quality, human-like text-to-speech synthesis in Greek.

**Explore the model and see how it can enhance your Greek TTS applications!**


# How to use it
Install Unsloth in a conda environment by following the official guide: https://docs.unsloth.ai/get-started/install-and-update/conda-install


```bash
conda create --name unsloth_env \
    python=3.11 \
    pytorch-cuda=12.1 \
    pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
    -y
```

```
conda activate unsloth_env
```
```
pip install unsloth
```
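Before loading the model, it can help to confirm that the packages imported later in this README are actually installed in the active environment. The following is a minimal sketch; the helper name `missing_packages` and the package list are assumptions drawn from the imports used below, not part of Unsloth.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages imported by the snippets in this README (assumed list).
required = ["torch", "transformers", "peft", "unsloth", "soundfile"]
missing = missing_packages(required)
print("Missing packages:", missing if missing else "none")
```

If anything is reported as missing, install it with `pip install <name>` inside `unsloth_env` before continuing.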

```python
from unsloth import FastModel
from transformers import CsmForConditionalGeneration
from peft import PeftModel
import torch

# Report the GPU and how much memory is already reserved (requires a CUDA device).
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

# --- 1. Load the Base Unsloth Model and Processor ---
# This setup must be identical to the one used in the training script.
print("Loading the base model and processor...")
model, processor = FastModel.from_pretrained(
    model_name = "unsloth/csm-1b",
    max_seq_length = 2048,
    dtype = None,  # None lets Unsloth choose the dtype automatically
    auto_model = CsmForConditionalGeneration,
    load_in_4bit = False,
)

# --- 2. Identify and Load Your Best LoRA Checkpoint ---
# !!! IMPORTANT: Change this path to your best checkpoint folder !!!
# (The one you found in trainer_state.json)
final_int = 94_764  # training step of the checkpoint to load
best_checkpoint_path = "./training_outputs_second_run/checkpoint-" + str(final_int)

print(f"\nLoading the LoRA adapter from: {best_checkpoint_path}")

# Attach the trained adapter to the base model. The adapter weights are applied
# at inference time; they are not merged into the base weights here.
model = PeftModel.from_pretrained(model, best_checkpoint_path)

print("\nFine-tuned model is ready for inference!")
# Unsloth automatically handles moving the model to the GPU
```
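The loading code refers to the best checkpoint "found in trainer_state.json". As a rough sketch of how that lookup might be automated: the Hugging Face `Trainer` writes a `trainer_state.json` into each `checkpoint-*` folder, and that file contains a `best_model_checkpoint` field when best-model tracking was enabled during training. The helper name `find_best_checkpoint` is hypothetical, and the field may be `null` if no evaluation metric was tracked.

```python
import json
from pathlib import Path


def find_best_checkpoint(output_dir):
    """Return `best_model_checkpoint` from the latest trainer_state.json, or None."""
    output_dir = Path(output_dir)
    # Keep only folders named checkpoint-<step> and order them by step.
    checkpoints = sorted(
        (p for p in output_dir.glob("checkpoint-*") if p.name.split("-")[-1].isdigit()),
        key=lambda p: int(p.name.split("-")[-1]),
    )
    # The most recent checkpoint carries the most up-to-date trainer state.
    for checkpoint in reversed(checkpoints):
        state_file = checkpoint / "trainer_state.json"
        if state_file.exists():
            state = json.loads(state_file.read_text(encoding="utf-8"))
            return state.get("best_model_checkpoint")
    return None
```

Calling `find_best_checkpoint("./training_outputs_second_run")` would then give a candidate value for `best_checkpoint_path`; if it returns `None`, fall back to choosing a checkpoint manually.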

```python
from transformers import AutoProcessor
processor = AutoProcessor.from_pretrained("unsloth/csm-1b")
```

```python
greek_sentences = [
    "Σου μιλάααανε!",
    "Γεια σας, είμαι η Μίρα και σήμερα θα κάνουμε μάθημα Ελληνικων.",
    "Ημουν εξω με φιλους και τα επινα. Μου αρεσει πολυ η μπυρα αλφα!",
    "Όταν ξανά άνοιξα τα μάτια διαπίστωσα ότι ήμουν ξαπλωμένος σε ένα μαλακό στρώμα από κουβέρτες",
]
```
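The generation code in this README passes each sentence to the processor with a speaker tag in front, in the form `[speaker_id]text`. A small sketch of building those prompts for a whole list follows; `build_prompt` is a hypothetical helper, and stripping surrounding whitespace is an added assumption rather than something the model requires.

```python
def build_prompt(text, speaker_id=0):
    """Prefix `text` with the `[speaker_id]` tag used by the generation code."""
    if not isinstance(speaker_id, int) or speaker_id < 0:
        raise ValueError("speaker_id must be a non-negative integer")
    # Whitespace stripping is a convenience added here, not a model requirement.
    return f"[{speaker_id}]{text.strip()}"


example_sentences = ["Γεια σας!", "  Καλημέρα.  "]
prompts = [build_prompt(sentence) for sentence in example_sentences]
print(prompts)
```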

```python
from IPython.display import Audio, display
import soundfile as sf
```

```python
# --- Configure the Generation ---

sentence_index = 1  # which entry of greek_sentences to synthesize
text_to_synthesize = greek_sentences[sentence_index]

print(f"\nSynthesizing text: '{text_to_synthesize}'")

speaker_id = 0
inputs = processor(f"[{speaker_id}]{text_to_synthesize}", add_special_tokens=True).to("cuda")

audio_values = model.generate(
    **inputs,
    max_new_tokens=125,  # 125 tokens is 10 seconds of audio; increase this for longer speech
    # Optional sampling parameters; uncomment and adjust to tweak the results:
    # depth_decoder_top_k=0,
    # depth_decoder_top_p=0.9,
    # depth_decoder_do_sample=True,
    # depth_decoder_temperature=0.9,
    # top_k=0,
    # top_p=1.0,
    # temperature=0.9,
    # do_sample=True,
    output_audio=True
)
```
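The comment on `max_new_tokens` states that 125 tokens correspond to 10 seconds of audio, which works out to 12.5 tokens per second. The sketch below turns a target duration into a token budget using that ratio; the helper name `max_new_tokens_for` is hypothetical, and the rate is taken from the comment above rather than measured.

```python
import math

# 125 tokens / 10 seconds, as stated in the generation snippet's comment.
TOKENS_PER_SECOND = 12.5


def max_new_tokens_for(seconds):
    """Return the number of tokens needed to cover `seconds` of audio."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Round up so the budget never falls short of the requested duration.
    return math.ceil(seconds * TOKENS_PER_SECOND)


print(max_new_tokens_for(10))  # 125
print(max_new_tokens_for(30))  # 375
```

Long sentences may need more than the default budget, so it can be worth estimating the spoken duration generously and passing the result as `max_new_tokens`.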

```python
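# Convert the first generated waveform to a float32 NumPy array on the CPU.
audio = audio_values[0].to(torch.float32).cpu().numpy()
# Save and play the audio at the 24 kHz sample rate used throughout this example.
sf.write("example_without_context.wav", audio, 24000)
display(Audio(audio, rate=24000))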
```
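After saving the file, a quick sanity check is to compare the clip length with the token budget: if the audio is about as long as `max_new_tokens` allows, the sentence was probably cut off. This is a heuristic sketch only. It assumes the 24 kHz sample rate used above and the 12.5 tokens-per-second ratio implied by the "125 tokens is 10 seconds" comment; both function names are hypothetical.

```python
def audio_duration_seconds(num_samples, sample_rate=24000):
    """Return the clip length in seconds for a mono waveform."""
    return num_samples / sample_rate


def probably_truncated(num_samples, max_new_tokens, sample_rate=24000, tokens_per_second=12.5):
    """Heuristic: True when the clip is within 1% of the maximum possible length."""
    limit_seconds = max_new_tokens / tokens_per_second
    return audio_duration_seconds(num_samples, sample_rate) >= 0.99 * limit_seconds


print(audio_duration_seconds(48000))    # 2.0
print(probably_truncated(240000, 125))  # True: 10 s of audio against a 10 s budget
```

With the variables from the snippet above, this would be called as `probably_truncated(len(audio), 125)`; a `True` result suggests raising `max_new_tokens` and generating again.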