---
title: Kokoro ONNX TTS (SSE)
emoji: 🗣️
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
---

# Kokoro-ONNX TTS — FastAPI + SSE (Docker)

A minimal **streaming TTS API** using **FastAPI** and **kokoro-onnx**, with a **Server-Sent Events (SSE)** endpoint.

Includes:

- `/v1/tts.sse` → streams base64-encoded PCM16 frames (24 kHz mono)
- `/v1/voices` → list voices
- `/healthz` → health check
- A simple HTML client (buffers, then plays after generation) and a Python client that **streams to ffplay** for real-time playback
- Dockerfile using **uv** (Astral) as the package manager

> Note: Browsers can't easily play raw PCM from SSE without an AudioWorklet and resampling. For reliable live playback, use the Python client + `ffplay` pipeline below.

## 1) Get model + voices

Download the Kokoro ONNX model and voices file and place them under `models/`:

```bash
mkdir -p models
# Example (update URLs to the latest release if needed):
curl -L -o models/kokoro-v1.0.int8.onnx \
  "https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.int8.onnx"
curl -L -o models/voices-v1.0.bin \
  "https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin"
```

Set env vars if you use different paths:

- `KOKORO_MODEL` (default: `models/kokoro-v1.0.int8.onnx`)
- `KOKORO_VOICES` (default: `models/voices-v1.0.bin`)

## 2) Run locally (without Docker)

```bash
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt

# Start
uv run uvicorn app:app --host 0.0.0.0 --port 8000

# Test SSE → ffplay (streaming)
# (py_play_ffplay.py writes raw PCM16 to stdout; pipe it into ffplay)
curl -G --data-urlencode 'text=Hello from Kokoro SSE!' \
  "http://localhost:8000/v1/tts.sse?voice=af_sarah&speed=1.0&lang=en-us" \
  | python3 clients/py_play_ffplay.py \
  | ffplay -nodisp -autoexit -f s16le -ar 24000 -ac 1 -
```

## 3) Build & run with Docker

```bash
# Build (ensure models/ exists in the build context if you want to COPY them)
docker build -t kokoro-sse .

# Run
docker run --rm -p 8000:8000 \
  -e KOKORO_MODEL=/app/models/kokoro-v1.0.int8.onnx \
  -e KOKORO_VOICES=/app/models/voices-v1.0.bin \
  kokoro-sse
```

If you don't want to bake the models into the image, mount a host folder containing the model files to `/app/models` with `-v`.

## 4) Endpoints

- `GET /v1/tts.sse?text=...&voice=af_sarah&speed=1.0&lang=en-us`
  Returns `text/event-stream` with events:

  ```json
  {"seq": 0, "sr": 24000, "ch": 1, "format": "s16le", "pcm16": "<base64-encoded samples>"}
  ```

  Final message: `event: done`, `data: {"total_chunks": N, "total_samples": M}`
- `GET /v1/voices` → `{"voices": [...]}`
- `GET /healthz` → `{"status": "ok", "model": "..."}`

## 5) Simple HTML client

Open `static/client.html` in a browser. It buffers SSE chunks and plays after the stream completes.

## 6) Python client for real-time playback (recommended)

```bash
# Linux / macOS:
curl -G --data-urlencode 'text=Live streaming via SSE' \
  "http://localhost:8000/v1/tts.sse" \
  | python3 clients/py_play_ffplay.py \
  | ffplay -nodisp -autoexit -f s16le -ar 24000 -ac 1 -
```

## 7) Notes

- Output sample rate is **24 kHz** (mono), PCM16 LE.
- If you need a **WebSocket** endpoint (binary frames) or an **OpenAI-compatible** API shim, you can extend `app.py` easily.
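## Appendix: SSE → PCM filter sketch

The event format above (JSON `data:` lines carrying base64 `pcm16`, terminated by `event: done`) lends itself to a small stdin-to-stdout filter. Below is a minimal sketch of such a filter — this is not the actual `clients/py_play_ffplay.py` (its internals aren't shown here), just an illustration of decoding the documented payload and emitting raw PCM16 bytes suitable for piping into `ffplay`:

```python
import base64
import json
import sys


def pcm_chunks(lines):
    """Yield raw PCM16 byte chunks from an iterable of SSE lines.

    Skips blank keep-alive lines, `event:` lines, and any data payload
    (such as the final done message) that carries no `pcm16` field.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        try:
            msg = json.loads(payload)
        except json.JSONDecodeError:
            continue  # not a JSON payload; ignore
        b64 = msg.get("pcm16")
        if b64:
            yield base64.b64decode(b64)


if __name__ == "__main__":
    # Read SSE text from stdin (e.g. piped from curl), write raw PCM16 to stdout:
    #   curl -G --data-urlencode 'text=Hi' "http://localhost:8000/v1/tts.sse" \
    #     | python3 sse_to_pcm.py \
    #     | ffplay -nodisp -autoexit -f s16le -ar 24000 -ac 1 -
    for chunk in pcm_chunks(sys.stdin):
        sys.stdout.buffer.write(chunk)
        sys.stdout.buffer.flush()
```

Writing and flushing chunk-by-chunk keeps latency low: `ffplay` starts playing as soon as the first frame arrives rather than after the whole utterance is synthesized.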