--- title: Emotional TTS Comparison emoji: πŸ—£οΈ colorFrom: blue colorTo: pink sdk: gradio sdk_version: 4.44.0 app_file: app.py pinned: false --- # Emotional TTS Comparison This project explores ways to incorporate emotion into Text-to-Speech (TTS) using OpenAI's GPT-4o-mini for text modification and TTS-1 for speech synthesis. ![Capture](./images/capture.png) ## Background While some TTS systems like Bark can include descriptive elements in speech (e.g., "(큰 μ†Œλ¦¬λ‘œ) μœ„ν—˜ν•΄μš”!"), they may have quality issues with noise. This project aims to find a method to convey emotion using OpenAI's TTS while maintaining high audio quality. ## How It Works 1. The user inputs a text. 2. The system generates three versions of the text: - Original: The input text as-is - Emotional: A slightly more emotional version - Exaggerated: A highly emotional, exaggerated version 3. Each version is then converted to speech using OpenAI's TTS-1 model. ## Example Original: "μœ„ν—˜ν•΄μš”" Emotional: "μœ„ν—˜ν•΄μš”!!" Exaggerated: "μž κΉλ§Œμš”! μ•ˆλΌ, μœ„ν—˜ν•΄μš”!!" ## Features - Uses GPT-4o-mini for text modification - Employs OpenAI's TTS-1 for high-quality speech synthesis - Provides a Gradio interface for easy interaction - Allows comparison of different emotional intensities in speech ## Usage 1. Enter your text in the input box. 2. Click "Generate Versions and Speech". 3. Listen to and compare the three versions of the speech. ## Deployment This project is deployed on Hugging Face Spaces, allowing easy access and usage without local setup. ## Note This approach aims to strike a balance between conveying emotion and maintaining speech quality. It demonstrates how text modification can influence the perceived emotion in TTS output.