# 🔌 OmniAvatar API Documentation

POST /generate - Avatar Generation

Request Format

URL: https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate

Method: POST

Content-Type: application/json

Request Body (JSON)

{
  "prompt": "string",
  "text_to_speech": "string (optional)",
  "elevenlabs_audio_url": "string (optional)",
  "voice_id": "string (optional, default: '21m00Tcm4TlvDq8ikWAM')",
  "image_url": "string (optional)",
  "guidance_scale": "float (default: 5.0)",
  "audio_scale": "float (default: 3.0)",
  "num_steps": "int (default: 30)",
  "sp_size": "int (default: 1)",
  "tea_cache_l1_thresh": "float (optional)"
}

Request Parameters

| Field | Type | Required | Description |
|---|---|---|---|
| prompt | string | ✅ | Character behavior description |
| text_to_speech | string | ❌ | Text to convert to speech via ElevenLabs |
| elevenlabs_audio_url | string | ❌ | Direct URL to audio file |
| voice_id | string | ❌ | ElevenLabs voice ID (default: Rachel) |
| image_url | string | ❌ | Reference image URL |
| guidance_scale | float | ❌ | Prompt following strength (4-6 recommended) |
| audio_scale | float | ❌ | Lip-sync accuracy (3-5 recommended) |
| num_steps | int | ❌ | Generation steps (20-50 recommended) |
| sp_size | int | ❌ | Parallel processing size |
| tea_cache_l1_thresh | float | ❌ | Cache threshold optimization |

Note: Either text_to_speech OR elevenlabs_audio_url must be provided.
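The rule in the note above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_payload` and `OPTIONAL_FIELDS` are hypothetical names, and the field list is copied from the parameter table.

```python
from typing import Any, Dict, Optional

# Optional fields taken from the request parameter table.
OPTIONAL_FIELDS = {
    "voice_id",
    "image_url",
    "guidance_scale",
    "audio_scale",
    "num_steps",
    "sp_size",
    "tea_cache_l1_thresh",
}


def build_payload(
    prompt: str,
    text_to_speech: Optional[str] = None,
    elevenlabs_audio_url: Optional[str] = None,
    **options: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a /generate request body."""
    if not prompt:
        raise ValueError("prompt is required")
    if not text_to_speech and not elevenlabs_audio_url:
        # Same condition the API reports as a 400 error.
        raise ValueError(
            "Either text_to_speech or elevenlabs_audio_url must be provided"
        )
    unknown = set(options) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"Unknown fields: {sorted(unknown)}")

    payload: Dict[str, Any] = {"prompt": prompt}
    if text_to_speech:
        payload["text_to_speech"] = text_to_speech
    if elevenlabs_audio_url:
        payload["elevenlabs_audio_url"] = elevenlabs_audio_url
    # Only fields the caller set are sent; the server applies its defaults.
    payload.update(options)
    return payload
```

Omitted optional fields are left out of the body on the assumption that the server fills in the documented defaults.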

Example Request

{
  "prompt": "A professional teacher explaining a mathematical concept with clear gestures",
  "text_to_speech": "Hello students! Today we're going to learn about calculus and how derivatives work in real life.",
  "voice_id": "21m00Tcm4TlvDq8ikWAM",
  "image_url": "https://example.com/teacher.jpg",
  "guidance_scale": 5.0,
  "audio_scale": 3.5,
  "num_steps": 30
}

Response Format

Success Response (200 OK):

{
  "message": "string",
  "output_path": "string",
  "processing_time": "float",
  "audio_generated": "boolean"
}

Response Fields

| Field | Type | Description |
|---|---|---|
| message | string | Success/status message |
| output_path | string | Path to generated video file |
| processing_time | float | Processing time in seconds |
| audio_generated | boolean | Whether audio was generated from text |

Example Response

{
  "message": "Avatar generation completed successfully",
  "output_path": "./outputs/avatar_20240807_130512.mp4",
  "processing_time": 45.67,
  "audio_generated": true
}
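A client may prefer a typed object over a raw dictionary. This sketch parses a success response into a dataclass; `GenerateResult` and `parse_generate_response` are hypothetical names, and the fields mirror the response table above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerateResult:
    """Hypothetical container for a 200 OK response from /generate."""

    message: str
    output_path: str
    processing_time: float  # seconds
    audio_generated: bool


def parse_generate_response(body: str) -> GenerateResult:
    """Parse the JSON body of a success response.

    Raises KeyError if a documented field is missing.
    """
    data = json.loads(body)
    return GenerateResult(
        message=str(data["message"]),
        output_path=str(data["output_path"]),
        processing_time=float(data["processing_time"]),
        audio_generated=bool(data["audio_generated"]),
    )
```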

Error Responses

400 Bad Request:

{
  "detail": "Either text_to_speech or elevenlabs_audio_url must be provided"
}

500 Internal Server Error:

{
  "detail": "Model not loaded"
}

503 Service Unavailable:

{
  "detail": "Model not loaded"
}
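The error bodies above share a single `detail` field, so a client can handle them uniformly. The sketch below extracts that field and flags whether a retry seems worthwhile. Treating only 503 as retryable is an assumption, not documented behavior, and `describe_error` is a hypothetical helper.

```python
import json
from typing import Tuple

# Assumption: a 503 means the model may still be loading, so a later
# retry could succeed. The documentation does not state this.
RETRYABLE_STATUSES = {503}


def describe_error(status: int, body: str) -> Tuple[str, bool]:
    """Return (detail, should_retry) for a non-200 response."""
    try:
        detail = json.loads(body).get("detail", "")
    except (ValueError, AttributeError):
        # Body was not JSON, or not a JSON object: fall back to raw text.
        detail = body
    return detail, status in RETRYABLE_STATUSES
```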

Available ElevenLabs Voices

| Voice ID | Name | Description |
|---|---|---|
| 21m00Tcm4TlvDq8ikWAM | Rachel | Default, clear female voice |
| pNInz6obpgDQGcFmaJgB | Adam | Professional male voice |
| EXAVITQu4vr4xnSDxMaL | Bella | Expressive female voice |
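
Voice IDs are opaque strings, so client code may be easier to read if it refers to voices by name. A small sketch, assuming only the three voices listed above; `VOICES` and `voice_id_for` are hypothetical names.

```python
from typing import Optional

# Name -> voice ID, copied from the table above.
VOICES = {
    "rachel": "21m00Tcm4TlvDq8ikWAM",
    "adam": "pNInz6obpgDQGcFmaJgB",
    "bella": "EXAVITQu4vr4xnSDxMaL",
}
DEFAULT_VOICE = "rachel"  # documented default


def voice_id_for(name: Optional[str] = None) -> str:
    """Look up a voice ID by name, ignoring case and surrounding spaces."""
    key = (name or DEFAULT_VOICE).strip().lower()
    if key not in VOICES:
        raise ValueError(
            f"Unknown voice {name!r}; expected one of {sorted(VOICES)}"
        )
    return VOICES[key]
```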

Usage Examples

With Text-to-Speech

curl -X POST "https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A friendly presenter speaking confidently",
    "text_to_speech": "Welcome to our AI avatar demonstration!",
    "voice_id": "21m00Tcm4TlvDq8ikWAM",
    "guidance_scale": 5.5,
    "audio_scale": 4.0
  }'

With Audio URL

curl -X POST "https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A news anchor delivering headlines",
    "elevenlabs_audio_url": "https://example.com/audio.mp3",
    "image_url": "https://example.com/anchor.jpg",
    "num_steps": 40
  }'
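
The curl calls above can be reproduced from Python with the standard library alone. This is a sketch, not an official client: `make_generate_request` and `generate` are hypothetical helpers. The 180-second timeout is an assumption, chosen to sit above the 30-120 second processing time listed under Rate Limits & Performance.

```python
import json
import urllib.request

GENERATE_URL = (
    "https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate"
)


def make_generate_request(payload: dict) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(payload: dict, timeout: float = 180.0) -> dict:
    """Send the request and return the decoded JSON response.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = make_generate_request(payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it lets the body and headers be inspected or logged before a long-running call is made.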

Other Endpoints

GET /health - Health Check

{
  "status": "healthy",
  "model_loaded": true,
  "device": "cuda",
  "supports_elevenlabs": true,
  "supports_image_urls": true,
  "supports_text_to_speech": true,
  "elevenlabs_api_configured": true
}
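
A client can check this payload before submitting a long-running generation job. A minimal sketch, assuming the field names shown above; `is_ready` is a hypothetical helper, and which flags to require is a judgment call rather than documented behavior.

```python
def is_ready(health: dict, need_tts: bool = False) -> bool:
    """Decide from a /health payload whether /generate is worth calling."""
    if health.get("status") != "healthy":
        return False
    if not health.get("model_loaded", False):
        return False
    if need_tts and not health.get("supports_text_to_speech", False):
        # Assumption: this flag gates the text_to_speech request field.
        return False
    return True
```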

GET /docs - FastAPI Documentation

Interactive API documentation available at /docs endpoint.

Rate Limits & Performance

  • Processing Time: 30-120 seconds depending on complexity
  • Max Video Length: Determined by audio length
  • Supported Formats: MP4 output, MP3/WAV audio input
  • GPU Acceleration: Enabled on T4+ hardware

Live API Base URL: https://huggingface.co/spaces/bravedims/AI_Avatar_Chat