---
title: Kokoro ONNX TTS (SSE)
emoji: 🗣️
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
---
# Kokoro-ONNX TTS – FastAPI + SSE (Docker)
A minimal streaming TTS API using FastAPI and kokoro-onnx, with a Server-Sent Events (SSE) endpoint. Includes:

- `/v1/tts.sse` – streams base64-encoded PCM16 frames (24 kHz mono)
- `/v1/voices` – lists voices
- `/healthz` – health check
- A simple HTML client (buffers, then plays after generation) and a Python client that streams to ffplay for real-time playback
- A Dockerfile using uv (Astral) as the package manager
Note: Browsers can't easily play raw PCM from SSE without an AudioWorklet and resampling. For reliable live playback, use the Python client + `ffplay` pipeline below.
## 1) Get model + voices

Download the Kokoro ONNX model and voices file and place them under `models/`:
```bash
mkdir -p models
# Example (update URLs to the latest release if needed):
curl -L -o models/kokoro-v1.0.int8.onnx "https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.int8.onnx"
curl -L -o models/voices-v1.0.bin "https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin"
```
Set env vars if you use different paths:

- `KOKORO_MODEL` (default: `models/kokoro-v1.0.int8.onnx`)
- `KOKORO_VOICES` (default: `models/voices-v1.0.bin`)
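To sanity-check the downloaded files, you can exercise kokoro-onnx directly. A minimal sketch, assuming the `Kokoro(model, voices)` / `create(...)` API of the kokoro-onnx package:

```python
import os

from kokoro_onnx import Kokoro

# Paths fall back to the defaults above; override via the env vars.
model_path = os.getenv("KOKORO_MODEL", "models/kokoro-v1.0.int8.onnx")
voices_path = os.getenv("KOKORO_VOICES", "models/voices-v1.0.bin")

kokoro = Kokoro(model_path, voices_path)

# create() returns float samples plus the sample rate (24000 for Kokoro).
samples, sample_rate = kokoro.create(
    "Hello from Kokoro!", voice="af_sarah", speed=1.0, lang="en-us"
)
print(f"{len(samples)} samples at {sample_rate} Hz")
```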
## 2) Run locally (without Docker)
```bash
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt

# Start
uv run uvicorn app:app --host 0.0.0.0 --port 8000

# Test SSE → ffplay (streaming)
curl -G --data-urlencode 'text=Hello from Kokoro SSE!' "http://localhost:8000/v1/tts.sse?voice=af_sarah&speed=1.0&lang=en-us" | python3 clients/py_play_ffplay.py
# (This script writes raw PCM16 to stdout; pipe to ffplay)
# Example:
# curl -G --data-urlencode 'text=Hello!' "http://localhost:8000/v1/tts.sse" | python3 clients/py_play_ffplay.py | ffplay -nodisp -autoexit -f s16le -ar 24000 -ac 1 -
```
## 3) Build & run with Docker
```bash
# Build (ensure models/ exists in the build context if you want to COPY them)
docker build -t kokoro-sse .

# Run
docker run --rm -p 8000:8000 \
  -e KOKORO_MODEL=/app/models/kokoro-v1.0.int8.onnx \
  -e KOKORO_VOICES=/app/models/voices-v1.0.bin \
  kokoro-sse
```
If you don't want to bake the models into the image, you can mount a host folder containing the model files to `/app/models` with `-v`, e.g. `docker run --rm -p 8000:8000 -v "$PWD/models:/app/models" kokoro-sse`.
## 4) Endpoints

- `GET /v1/tts.sse?text=...&voice=af_sarah&speed=1.0&lang=en-us`
  Returns `text/event-stream` with events:
  `{"seq": 0, "sr": 24000, "ch": 1, "format": "s16le", "pcm16": "<base64>"}`
  Final message: `event: done`, `data: {"total_chunks": N, "total_samples": M}`
- `GET /v1/voices` → `{"voices": [...]}`
- `GET /healthz` → `{"status": "ok", "model": "..."}`
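For orientation, the SSE producer can be implemented with FastAPI's `StreamingResponse`. This is a sketch of the approach, not the exact contents of `app.py`; the ~100 ms chunk size is illustrative:

```python
import base64
import json
import os

import numpy as np
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from kokoro_onnx import Kokoro

app = FastAPI()
kokoro = Kokoro(os.getenv("KOKORO_MODEL", "models/kokoro-v1.0.int8.onnx"),
                os.getenv("KOKORO_VOICES", "models/voices-v1.0.bin"))

@app.get("/v1/tts.sse")
def tts_sse(text: str, voice: str = "af_sarah", speed: float = 1.0, lang: str = "en-us"):
    def events():
        # Generate float samples, convert to little-endian PCM16.
        samples, sr = kokoro.create(text, voice=voice, speed=speed, lang=lang)
        pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype("<i2").tobytes()
        chunk = sr // 10 * 2  # ~100 ms of s16le mono per event
        total = 0
        for off in range(0, len(pcm), chunk):
            payload = {"seq": total, "sr": sr, "ch": 1, "format": "s16le",
                       "pcm16": base64.b64encode(pcm[off:off + chunk]).decode()}
            yield f"data: {json.dumps(payload)}\n\n"
            total += 1
        done = {"total_chunks": total, "total_samples": len(pcm) // 2}
        yield f"event: done\ndata: {json.dumps(done)}\n\n"

    return StreamingResponse(events(), media_type="text/event-stream")
```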
## 5) Simple HTML client

Open `static/client.html` in a browser. It buffers SSE chunks and plays the audio after the stream completes.
## 6) Python client for real-time playback (recommended)

```bash
# Linux / macOS:
curl -G --data-urlencode 'text=Live streaming via SSE' "http://localhost:8000/v1/tts.sse" | python3 clients/py_play_ffplay.py | ffplay -nodisp -autoexit -f s16le -ar 24000 -ac 1 -
```
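The core of such a client is just a few lines: parse `data:` lines from the SSE stream on stdin, base64-decode the PCM payloads, and write raw bytes to stdout for ffplay. A sketch of the approach (not necessarily the exact contents of `clients/py_play_ffplay.py`):

```python
import base64
import json
import sys

# Read SSE lines from stdin (piped from curl), emit raw s16le bytes on stdout.
for raw in sys.stdin:
    line = raw.strip()
    if not line.startswith("data:"):
        continue  # skip "event:" lines, comments, and blank keep-alives
    msg = json.loads(line[len("data:"):])
    if "pcm16" in msg:  # the final "done" payload carries no audio
        sys.stdout.buffer.write(base64.b64decode(msg["pcm16"]))
        sys.stdout.buffer.flush()
```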
## 7) Notes

- Output sample rate is 24 kHz (mono), PCM16 LE.
- If you need a WebSocket endpoint (binary frames) or an OpenAI-compatible API shim, you can extend `app.py` easily; see the sketch below.
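For instance, a binary-frame WebSocket endpoint could look roughly like this. A sketch under the same assumptions as above; the route name `/v1/tts.ws`, the one-utterance-per-message protocol, and the ~100 ms framing are illustrative choices, not part of the current app:

```python
import os

import numpy as np
from fastapi import FastAPI, WebSocket
from kokoro_onnx import Kokoro

app = FastAPI()
kokoro = Kokoro(os.getenv("KOKORO_MODEL", "models/kokoro-v1.0.int8.onnx"),
                os.getenv("KOKORO_VOICES", "models/voices-v1.0.bin"))

@app.websocket("/v1/tts.ws")
async def tts_ws(ws: WebSocket):
    await ws.accept()
    text = await ws.receive_text()  # one utterance per text message
    samples, sr = kokoro.create(text, voice="af_sarah", speed=1.0, lang="en-us")
    pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype("<i2").tobytes()
    chunk = sr // 10 * 2  # ~100 ms of s16le mono per binary frame
    for off in range(0, len(pcm), chunk):
        await ws.send_bytes(pcm[off:off + chunk])
    await ws.close()
```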