---
title: Kokoro ONNX TTS (SSE)
emoji: 🗣️
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
---

# Kokoro-ONNX TTS — FastAPI + SSE (Docker)

A minimal **streaming TTS API** using **FastAPI** and **kokoro-onnx**, with a **Server-Sent Events (SSE)** endpoint.

Includes:

- `/v1/tts.sse` → streams base64-encoded PCM16 frames (24 kHz mono)
- `/v1/voices` → list voices
- `/healthz` → health check
- A simple HTML client (buffers, then plays after generation) and a Python client that **streams to ffplay** for real-time playback
- Dockerfile using **uv** (Astral) as the package manager

> Note: Browsers can't easily play raw PCM from SSE without an AudioWorklet and resampling. For reliable live playback, use the Python client + `ffplay` pipeline below.

## 1) Get model + voices

Download the Kokoro ONNX model and voices file and place them under `models/`:

```bash
mkdir -p models
# Example (update URLs to the latest release if needed):
curl -L -o models/kokoro-v1.0.int8.onnx \
  "https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.int8.onnx"
curl -L -o models/voices-v1.0.bin \
  "https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin"
```

Set env vars if you use different paths:

- `KOKORO_MODEL` (default: `models/kokoro-v1.0.int8.onnx`)
- `KOKORO_VOICES` (default: `models/voices-v1.0.bin`)

## 2) Run locally (without Docker)

```bash
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt

# Start
uv run uvicorn app:app --host 0.0.0.0 --port 8000

# Test SSE → ffplay (streaming)
# (py_play_ffplay.py writes raw PCM16 to stdout; pipe it into ffplay)
curl -G --data-urlencode 'text=Hello from Kokoro SSE!' \
  "http://localhost:8000/v1/tts.sse?voice=af_sarah&speed=1.0&lang=en-us" \
  | python3 clients/py_play_ffplay.py \
  | ffplay -nodisp -autoexit -f s16le -ar 24000 -ac 1 -
```

## 3) Build & run with Docker

```bash
# Build (ensure models/ exists in the build context if you want to COPY them)
docker build -t kokoro-sse .

# Run
docker run --rm -p 8000:8000 \
  -e KOKORO_MODEL=/app/models/kokoro-v1.0.int8.onnx \
  -e KOKORO_VOICES=/app/models/voices-v1.0.bin \
  kokoro-sse
```

If you don't want to bake the models into the image, mount a host folder containing the model files to `/app/models` with `-v`.

## 4) Endpoints

- `GET /v1/tts.sse?text=...&voice=af_sarah&speed=1.0&lang=en-us`
  Returns `text/event-stream` with events:

  ```json
  {"seq": 0, "sr": 24000, "ch": 1, "format": "s16le", "pcm16": "<base64-encoded samples>"}
  ```

  Final message: `event: done`, `data: {"total_chunks": N, "total_samples": M}`
- `GET /v1/voices` → `{"voices": [...]}`
- `GET /healthz` → `{"status": "ok", "model": "..."}`

## 5) Simple HTML client

Open `static/client.html` in a browser. It buffers SSE chunks and plays after the stream completes.

## 6) Python client for real-time playback (recommended)

```bash
# Linux / macOS:
curl -G --data-urlencode 'text=Live streaming via SSE' \
  "http://localhost:8000/v1/tts.sse" \
  | python3 clients/py_play_ffplay.py \
  | ffplay -nodisp -autoexit -f s16le -ar 24000 -ac 1 -
```

## 7) Notes

- Output sample rate is **24 kHz** (mono), PCM16 LE.
- If you need a **WebSocket** endpoint (binary frames) or an **OpenAI-compatible** API shim, you can extend `app.py` easily.
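## Appendix: SSE → PCM filter sketch

The event format above (JSON `data:` lines carrying base64 `pcm16`, terminated by `event: done`) lends itself to a small stdin-to-stdout filter. Below is a minimal sketch of such a filter — this is not the actual `clients/py_play_ffplay.py` (its internals aren't shown here), just an illustration of decoding the documented payload and emitting raw PCM16 bytes suitable for piping into `ffplay`:

```python
import base64
import json
import sys


def pcm_chunks(lines):
    """Yield raw PCM16 byte chunks from an iterable of SSE lines.

    Skips blank keep-alive lines, `event:` lines, and any data payload
    (such as the final done message) that carries no `pcm16` field.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        try:
            msg = json.loads(payload)
        except json.JSONDecodeError:
            continue  # not a JSON payload; ignore
        b64 = msg.get("pcm16")
        if b64:
            yield base64.b64decode(b64)


if __name__ == "__main__":
    # Read SSE text from stdin (e.g. piped from curl), write raw PCM16 to stdout:
    #   curl -G --data-urlencode 'text=Hi' "http://localhost:8000/v1/tts.sse" \
    #     | python3 sse_to_pcm.py \
    #     | ffplay -nodisp -autoexit -f s16le -ar 24000 -ac 1 -
    for chunk in pcm_chunks(sys.stdin):
        sys.stdout.buffer.write(chunk)
        sys.stdout.buffer.flush()
```

Writing and flushing chunk-by-chunk keeps latency low: `ffplay` starts playing as soon as the first frame arrives rather than after the whole utterance is synthesized.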