CloneTTS - Text-to-Speech Model

CloneTTS is a text-to-speech (TTS) model trained on the Clone dataset, a collection of audio recordings paired with their text transcriptions. It converts text input into natural-sounding speech and is intended for speech synthesis tasks.

License

This model is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to share, adapt, and use the model for any purpose, including commercial use, provided that appropriate credit is given.

Model Overview

Input: Text data.
Output: .wav audio files (speech).
Task: Text-to-speech (TTS) conversion.
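
The output side of this pipeline can be illustrated with a minimal sketch of writing speech samples to a .wav file using Python's standard library. The model's inference API is not documented in this section, so the sketch does not call the model: placeholder_tone is a hypothetical stand-in that produces a sine tone, and the mono, 16-bit PCM, 22050 Hz settings are assumptions rather than confirmed properties of CloneTTS output.

```python
import math
import struct
import wave


def write_wav(path, samples, sample_rate=22050):
    """Write float samples in [-1.0, 1.0] to a mono 16-bit PCM .wav file.

    The sample rate and bit depth are assumptions, not CloneTTS settings.
    """
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip to the valid range
        frames += struct.pack("<h", int(s * 32767))  # little-endian int16
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


def placeholder_tone(seconds=0.5, freq=440.0, sample_rate=22050):
    """Hypothetical stand-in for model output: a plain sine tone."""
    n = int(seconds * sample_rate)
    return [0.5 * math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    # In real use, the samples would come from the trained model instead.
    write_wav("output.wav", placeholder_tone())
```

Replacing placeholder_tone with the model's actual synthesis call, once its interface is known, would leave write_wav unchanged as long as the model returns float samples.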

Features

Converts text to natural-sounding speech.
Trained on the Clone dataset, which is designed to improve the quality of generated speech.

Dataset Overview

This model is trained on the Clone dataset, which consists of:

Audio files: .wav format.
Transcriptions: Corresponding text transcriptions for each audio file.
Format: A CSV file that pairs audio file paths with their corresponding text.
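
As a rough sketch of how the CSV pairing described above might be read, the following loads each row into an (audio path, text) tuple. The column names audio_path and text are assumptions for illustration; the actual header names in transcriptions.csv are not stated here and may differ.

```python
import csv
from pathlib import Path


def load_pairs(csv_path, audio_col="audio_path", text_col="text"):
    """Return a list of (audio_path, transcription) tuples from the CSV.

    The default column names are assumed, not taken from the dataset.
    """
    pairs = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            pairs.append((Path(row[audio_col]), row[text_col].strip()))
    return pairs


if __name__ == "__main__":
    for audio_path, text in load_pairs("data/transcriptions.csv"):
        print(audio_path, "->", text)
```

Passing the column names as parameters keeps the function usable if the real header differs from the assumed one.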

File Structure

data/: Contains the audio files and the transcriptions.csv file used to train the model.
model/: Contains the trained model files, including model_weights.h5 and model_config.json.
notebooks/: Contains Jupyter notebooks for experimenting with the model and performing inference.
requirements.txt: A list of required libraries and dependencies for running the model.
train.py: Script to train the model on your dataset.
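
A small sanity check along these lines could confirm that a checkout matches the layout listed above before training or inference. This is only a sketch: the paths come from the list above, while missing_entries is a hypothetical helper and not part of the repository.

```python
from pathlib import Path

# Entries taken from the file structure described above.
EXPECTED = [
    "data/transcriptions.csv",
    "model/model_weights.h5",
    "model/model_config.json",
    "notebooks",
    "requirements.txt",
    "train.py",
]


def missing_entries(root="."):
    """Return the expected files or directories that are absent under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Repository layout looks complete.")
```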

Installation

To use this model, follow the instructions below to clone the repository and install dependencies.

Clone the repository:

git clone https://github.com/your_username/CloneTTS.git
cd CloneTTS