File size: 1,774 Bytes
c6d620a b078155 c6d620a dab5cce e5dfe48 dab5cce 5e3be79 dab5cce b6e8a6b |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 |
---
title: Emotional TTS Comparison
emoji: π£οΈ
colorFrom: blue
colorTo: pink
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
---
# Emotional TTS Comparison
This project explores ways to incorporate emotion into Text-to-Speech (TTS) using OpenAI's GPT-4o-mini for text modification and TTS-1 for speech synthesis.

## Background
While some TTS systems like Bark can include descriptive elements in speech (e.g., "(ν° μ리λ‘) μνν΄μ!"), they may have quality issues with noise. This project aims to find a method to convey emotion using OpenAI's TTS while maintaining high audio quality.
## How It Works
1. The user inputs a text.
2. The system generates three versions of the text:
- Original: The input text as-is
- Emotional: A slightly more emotional version
- Exaggerated: A highly emotional, exaggerated version
3. Each version is then converted to speech using OpenAI's TTS-1 model.
## Example
Original: "μνν΄μ"
Emotional: "μνν΄μ!!"
Exaggerated: "μ κΉλ§μ! μλΌ, μνν΄μ!!"
## Features
- Uses GPT-4o-mini for text modification
- Employs OpenAI's TTS-1 for high-quality speech synthesis
- Provides a Gradio interface for easy interaction
- Allows comparison of different emotional intensities in speech
## Usage
1. Enter your text in the input box.
2. Click "Generate Versions and Speech".
3. Listen to and compare the three versions of the speech.
## Deployment
This project is deployed on Hugging Face Spaces, allowing easy access and usage without local setup.
## Note
This approach aims to strike a balance between conveying emotion and maintaining speech quality. It demonstrates how text modification can influence the perceived emotion in TTS output. |