---
base_model: unsloth/csm-1b
pipeline_tag: text-to-speech
tags:
- base_model:adapter:unsloth/csm-1b
- lora
- transformers
- unsloth
license: apache-2.0
language:
- el
new_version: moiraai2024/GreekTTS-1.5
---


# Description
Website: https://moira-ai.com/

Email: [email protected]

Report: https://moiraai2024.github.io/GreekTTS-demo/

Welcome to Moira.AI GreekTTS, a text-to-speech model fine-tuned specifically for Greek speech synthesis. It builds on the sesame/csm-1b model, adapted with Greek speech data to produce high-quality, natural-sounding speech.

Moira.AI is designed to deliver lifelike, expressive speech for applications such as virtual assistants, audiobooks, and accessibility tools. Because it builds on a large transformer-based model, it aims for fluid prosody and accurate pronunciation of Greek text.

Key Features:

- Fine-tuned specifically for Greek TTS.
- Built on the robust sesame/csm-1b model, ensuring high-quality performance.
- Capable of generating natural-sounding, expressive Greek speech.
- Ideal for integration into applications requiring high-quality, human-like text-to-speech synthesis in Greek.

**Explore the model and see how it can enhance your Greek TTS applications!**


# How to use it
First install Unsloth in a conda environment, following the official guide: https://docs.unsloth.ai/get-started/install-and-update/conda-install


```bash
conda create --name unsloth_env \
    python=3.11 \
    pytorch-cuda=12.1 \
    pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
    -y
```

```bash
conda activate unsloth_env
```

```bash
pip install unsloth
```

```python
from unsloth import FastModel
from transformers import CsmForConditionalGeneration
from peft import PeftModel
import torch

# Report the GPU in use and how much memory is already reserved.
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

# --- 1. Load the Base Unsloth Model and Processor ---
# This setup must be identical to your training script.
print("Loading the base model and processor...")
model, processor = FastModel.from_pretrained(
    model_name = "unsloth/csm-1b",
    max_seq_length = 2048,
    dtype = None,
    auto_model = CsmForConditionalGeneration,
    load_in_4bit = False,
)

# --- 2. Identify and Load Your Best LoRA Checkpoint ---
# !!! IMPORTANT: Change this path to your best checkpoint folder !!!
# (The one you found in trainer_state.json)
checkpoint_step = 94_764
best_checkpoint_path = f"./training_outputs_second_run/checkpoint-{checkpoint_step}"

print(f"\nLoading the LoRA adapter from: {best_checkpoint_path}")

# This wraps the base model in a PeftModel with the trained adapter attached.
# The adapter is applied at inference time; it is not merged into the base weights.
model = PeftModel.from_pretrained(model, best_checkpoint_path)

print("\nFine-tuned model is ready for inference!")
# Unsloth automatically handles moving the model to the GPU
```
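
The loading code above points to `trainer_state.json` as the place to find the best checkpoint. Below is a minimal sketch of that lookup, assuming the checkpoint folder was written by the Hugging Face `Trainer`, which stores a `best_model_checkpoint` field in that file when a best-model metric is tracked. The helper name `find_best_checkpoint` is hypothetical and not part of any library.

```python
import json
from pathlib import Path
from typing import Optional


def find_best_checkpoint(checkpoint_dir: str) -> Optional[str]:
    """Return the `best_model_checkpoint` recorded in trainer_state.json, if any.

    Assumption: `checkpoint_dir` is a checkpoint folder written by the
    Hugging Face Trainer. Returns None when the file is missing or when no
    best checkpoint was recorded (for example, no evaluation metric tracked).
    """
    state_file = Path(checkpoint_dir) / "trainer_state.json"
    if not state_file.is_file():
        return None
    with state_file.open(encoding="utf-8") as f:
        state = json.load(f)
    return state.get("best_model_checkpoint")
```

If the function returns a path, it can be used as `best_checkpoint_path`; if it returns `None`, the checkpoint has to be chosen by hand, for example from the logged evaluation losses.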

```python
from transformers import AutoProcessor
processor = AutoProcessor.from_pretrained("unsloth/csm-1b")
```

```python
greek_sentences = [
    "Σου μιλάααανε!",
    "Γεια σας, είμαι η Μίρα και σήμερα θα κάνουμε μάθημα Ελληνικων.",
    "Ημουν εξω με φιλους και τα επινα. Μου αρεσει πολυ η μπυρα αλφα!",
    "Όταν ξανά άνοιξα τα μάτια διαπίστωσα ότι ήμουν ξαπλωμένος σε ένα μαλακό στρώμα από κουβέρτες",
]
```

```python
from IPython.display import Audio, display
import soundfile as sf
```

```python
# --- Configure the Generation ---

sentence_index = 1  # which entry of greek_sentences to synthesize
text_to_synthesize = greek_sentences[sentence_index]

print(f"\nSynthesizing text: '{text_to_synthesize}'")

speaker_id = 0
inputs = processor(f"[{speaker_id}]{text_to_synthesize}", add_special_tokens=True).to("cuda")

audio_values = model.generate(
    **inputs,
    max_new_tokens=125, # 125 tokens is 10 seconds of audio; increase this for longer speech
    # Uncomment and adjust these sampling parameters to tweak the results
    # depth_decoder_top_k=0,
    # depth_decoder_top_p=0.9,
    # depth_decoder_do_sample=True,
    # depth_decoder_temperature=0.9,
    # top_k=0,
    # top_p=1.0,
    # temperature=0.9,
    # do_sample=True,
    #########################################################
    output_audio=True
)
```
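
The comment on `max_new_tokens` says that 125 tokens give 10 seconds of audio, which works out to 12.5 tokens per second. A small sketch of how a target duration could be converted into a token budget under that ratio; the helper name `tokens_for_duration` is hypothetical.

```python
import math

# Ratio taken from the comment in the generation call: 125 tokens for 10 seconds.
TOKENS_PER_SECOND = 125 / 10


def tokens_for_duration(seconds: float) -> int:
    """Return a `max_new_tokens` value large enough for `seconds` of audio."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Round up so the budget never falls short of the requested duration.
    return math.ceil(seconds * TOKENS_PER_SECOND)
```

For example, `tokens_for_duration(30)` could be passed as `max_new_tokens` for a sentence expected to take up to 30 seconds. The model may still stop earlier if it finishes the sentence before the budget is used up.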

```python
audio = audio_values[0].to(torch.float32).cpu().numpy()
sf.write("example_without_context.wav", audio, 24000)
display(Audio(audio, rate=24000))
```
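
The example above synthesizes a single sentence. To process the whole `greek_sentences` list, one option is to prepare the prompt and output filename for every sentence first. This is a minimal sketch: the `[speaker_id]text` prompt format comes from the code above, while the helper name `plan_batch` and the filename pattern are hypothetical choices.

```python
from typing import List, Tuple


def plan_batch(
    sentences: List[str],
    speaker_id: int = 0,
    prefix: str = "greek_tts",
) -> List[Tuple[str, str]]:
    """Pair each sentence's prompt string with an output WAV filename.

    The prompt format "[speaker_id]text" matches the single-sentence example;
    the filename pattern "<prefix>_<index>.wav" is an illustrative assumption.
    """
    return [
        (f"[{speaker_id}]{text}", f"{prefix}_{index:02d}.wav")
        for index, text in enumerate(sentences)
    ]
```

Each prompt can then go through the same `processor(...)` and `model.generate(...)` calls shown earlier, with the resulting audio written to the paired filename using `sf.write(filename, audio, 24000)`.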