---
base_model: unsloth/csm-1b
pipeline_tag: text-to-speech
tags:
- base_model:adapter:unsloth/csm-1b
- lora
- transformers
- unsloth
license: apache-2.0
language:
- el
new_version: moiraai2024/GreekTTS-1.5
---
# Description
Website: https://moira-ai.com/
Email: [email protected]
Report: https://moiraai2024.github.io/GreekTTS-demo/
Welcome to Moira.AI GreekTTS, a text-to-speech model fine-tuned specifically for Greek speech synthesis! It is a LoRA adapter built on the sesame/csm-1b architecture (loaded below through unsloth/csm-1b) and fine-tuned on Greek speech data to provide high-quality, natural-sounding speech generation.
Moira.AI excels in delivering lifelike, expressive speech, making it ideal for a wide range of applications, including virtual assistants, audiobooks, accessibility tools, and more. By leveraging the power of large-scale transformer-based models, Moira.AI ensures fluid prosody and accurate pronunciation of Greek text.
Key Features:
- Fine-tuned specifically for Greek TTS.
- Built on the robust sesame/csm-1b model, ensuring high-quality performance.
- Capable of generating natural-sounding, expressive Greek speech.
- Ideal for integration into applications requiring high-quality, human-like text-to-speech synthesis in Greek.
**Explore the model and see how it can enhance your Greek TTS applications!**
# How to use it
First install Unsloth in a conda environment, following https://docs.unsloth.ai/get-started/install-and-update/conda-install
```
conda create --name unsloth_env \
    python=3.11 \
    pytorch-cuda=12.1 \
    pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
    -y
```
```
conda activate unsloth_env
```
```
pip install unsloth
```
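Before running the inference code, it may help to confirm that the modules imported later in this card can be found in the environment. The following is a minimal sketch using only the standard library; the module list is taken from the imports used below, and `missing_packages` is a hypothetical helper name, not part of Unsloth.

```python
from importlib.util import find_spec

# Top-level modules imported by the snippets in this card.
REQUIRED = ["torch", "unsloth", "transformers", "peft", "soundfile", "IPython"]


def missing_packages(names):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Anything reported as missing can be installed with `pip install` inside `unsloth_env` before continuing.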
```python
from unsloth import FastModel
from transformers import CsmForConditionalGeneration
import torch
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")
from peft import PeftModel
from IPython.display import Audio
# --- 1. Load the Base Unsloth Model and Processor ---
# This setup must be identical to your training script.
print("Loading the base model and processor...")
model, processor = FastModel.from_pretrained(
    model_name = "unsloth/csm-1b",
    max_seq_length = 2048,
    dtype = None,
    auto_model = CsmForConditionalGeneration,
    load_in_4bit = False,
)
# --- 2. Identify and Load Your Best LoRA Checkpoint ---
# !!! IMPORTANT: Change this path to your best checkpoint folder !!!
# (The one you found in trainer_state.json)
final_int = 94_764  # training step of the checkpoint to load
best_checkpoint_path = f"./training_outputs_second_run/checkpoint-{final_int}"
print(f"\nLoading the LoRA adapter from: {best_checkpoint_path}")
# PeftModel.from_pretrained attaches the trained adapter weights to the base model.
# It does not merge them; call model.merge_and_unload() if you need a merged model.
model = PeftModel.from_pretrained(model, best_checkpoint_path)
print("\nFine-tuned model is ready for inference!")
# Unsloth automatically handles moving the model to the GPU
```
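The checkpoint step above is hard-coded. As a hedged alternative, the sketch below looks the folder up instead: it reads the `best_model_checkpoint` field that the Transformers `Trainer` writes to `trainer_state.json`, and falls back to the highest-numbered `checkpoint-*` folder when that field is empty. `find_checkpoint` is a hypothetical helper, and the layout of the output directory is an assumption based on the path used above.

```python
import json
import re
from pathlib import Path


def find_checkpoint(output_dir):
    """Return the best checkpoint recorded by the Trainer, else the latest one."""
    output_dir = Path(output_dir)
    candidates = []
    for path in output_dir.glob("checkpoint-*"):
        match = re.fullmatch(r"checkpoint-(\d+)", path.name)
        if match and path.is_dir():
            candidates.append((int(match.group(1)), path))
    if not candidates:
        raise FileNotFoundError(f"No checkpoint-* folders in {output_dir}")
    candidates.sort()
    latest = candidates[-1][1]

    # The most recent checkpoint carries the most complete trainer state.
    state_file = latest / "trainer_state.json"
    if state_file.exists():
        state = json.loads(state_file.read_text(encoding="utf-8"))
        best = state.get("best_model_checkpoint")
        if best:  # None when no best-model metric was tracked during training
            return Path(best)
    return latest
```

With this helper, `best_checkpoint_path = str(find_checkpoint("./training_outputs_second_run"))` could replace the hard-coded step. Note that the recorded path is relative to the directory the training run was launched from.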
```python
from transformers import AutoProcessor
processor = AutoProcessor.from_pretrained("unsloth/csm-1b")
```
```python
greek_sentences = [
    "Σου μιλάααανε!",
    "Γεια σας, είμαι η Μίρα και σήμερα θα κάνουμε μάθημα Ελληνικων.",
    "Ημουν εξω με φιλους και τα επινα. Μου αρεσει πολυ η μπυρα αλφα!",
    "Όταν ξανά άνοιξα τα μάτια διαπίστωσα ότι ήμουν ξαπλωμένος σε ένα μαλακό στρώμα από κουβέρτες",
]
```
```python
from IPython.display import Audio, display
import soundfile as sf
```
```python
# --- Configure the Generation ---
sentence_index = 1  # which entry of greek_sentences to synthesize
text_to_synthesize = greek_sentences[sentence_index]
print(f"\nSynthesizing text: '{text_to_synthesize}'")
speaker_id = 0
inputs = processor(f"[{speaker_id}]{text_to_synthesize}", add_special_tokens=True).to("cuda")
audio_values = model.generate(
    **inputs,
    max_new_tokens=125,  # 125 tokens is 10 seconds of audio; increase this for longer speech
    # Play with these parameters to tweak results:
    # depth_decoder_top_k=0,
    # depth_decoder_top_p=0.9,
    # depth_decoder_do_sample=True,
    # depth_decoder_temperature=0.9,
    # top_k=0,
    # top_p=1.0,
    # temperature=0.9,
    # do_sample=True,
    #########################################################
    output_audio=True,
)
```
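The comment on `max_new_tokens` implies a rate of 12.5 tokens per second of audio (125 tokens for 10 seconds). A small sketch for turning a target duration into a token budget, assuming that rate holds for longer outputs; `tokens_for_duration` is a hypothetical helper name:

```python
import math

# Rate implied by the comment above: 125 tokens for 10 seconds of audio.
TOKENS_PER_SECOND = 125 / 10


def tokens_for_duration(seconds):
    """Return a max_new_tokens budget for roughly `seconds` of audio."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return math.ceil(seconds * TOKENS_PER_SECOND)


print(tokens_for_duration(10))  # 125
print(tokens_for_duration(25))  # 313
```

The result can be passed as `max_new_tokens` to `model.generate`. It is an upper bound, so generation may still stop earlier.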
```python
audio = audio_values[0].to(torch.float32).cpu().numpy()
sf.write("example_without_context.wav", audio, 24000)
display(Audio(audio, rate=24000))
```
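To synthesize every sentence rather than a single one, the prompts and output file names can be prepared up front. The sketch below only builds the `[speaker_id]text` prompt strings used above and reports clip length at the 24000 Hz rate used when saving; `build_jobs` and `duration_seconds` are hypothetical helpers, and the file-naming scheme is an assumption.

```python
def build_jobs(sentences, speaker_id=0, prefix="example"):
    """Pair each sentence with its prompt string and an output file name."""
    jobs = []
    for index, text in enumerate(sentences):
        prompt = f"[{speaker_id}]{text}"  # same prompt format as the single-sentence example
        jobs.append((prompt, f"{prefix}_{index}.wav"))
    return jobs


def duration_seconds(num_samples, sample_rate=24000):
    """Length of a clip in seconds, given its sample count."""
    return num_samples / sample_rate


for prompt, filename in build_jobs(["Καλημέρα.", "Τι κάνεις;"]):
    print(filename, prompt)
```

Each prompt can then go through the same `processor` and `model.generate` calls shown earlier, with the resulting audio written to the paired file name.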