
# 🔌 OmniAvatar API Documentation

## POST /generate - Avatar Generation

### Request Format

- URL: https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate
- Method: POST
- Content-Type: application/json

### Request Body (JSON)

{
  "prompt": "string",
  "text_to_speech": "string (optional)",
  "elevenlabs_audio_url": "string (optional)",
  "voice_id": "string (optional, default: '21m00Tcm4TlvDq8ikWAM')",
  "image_url": "string (optional)",
  "guidance_scale": "float (default: 5.0)",
  "audio_scale": "float (default: 3.0)",
  "num_steps": "int (default: 30)",
  "sp_size": "int (default: 1)",
  "tea_cache_l1_thresh": "float (optional)"
}

### Request Parameters

| Field | Type | Required | Description |
|---|---|---|---|
| `prompt` | string | ✅ | Character behavior description |
| `text_to_speech` | string | ❌ | Text to convert to speech via ElevenLabs |
| `elevenlabs_audio_url` | string | ❌ | Direct URL to audio file |
| `voice_id` | string | ❌ | ElevenLabs voice ID (default: Rachel) |
| `image_url` | string | ❌ | Reference image URL |
| `guidance_scale` | float | ❌ | Prompt following strength (4-6 recommended) |
| `audio_scale` | float | ❌ | Lip-sync accuracy (3-5 recommended) |
| `num_steps` | int | ❌ | Generation steps (20-50 recommended) |
| `sp_size` | int | ❌ | Parallel processing size |
| `tea_cache_l1_thresh` | float | ❌ | Cache threshold optimization |

Note: Either `text_to_speech` or `elevenlabs_audio_url` must be provided; a request with neither is rejected with 400 Bad Request.
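
The audio-source rule above can also be enforced on the client before any request is sent. The following is a minimal sketch, not part of the API itself: the helper name `build_generate_payload` is hypothetical, and because this document does not say what the server does when both fields are supplied, the sketch simply passes both through.

```python
from typing import Any, Optional


def build_generate_payload(
    prompt: str,
    text_to_speech: Optional[str] = None,
    elevenlabs_audio_url: Optional[str] = None,
    **options: Any,
) -> dict:
    """Build a /generate request body, enforcing the audio-source rule."""
    if not prompt:
        raise ValueError("prompt is required")
    if not text_to_speech and not elevenlabs_audio_url:
        # Same message the API returns with 400 Bad Request
        raise ValueError(
            "Either text_to_speech or elevenlabs_audio_url must be provided"
        )
    payload = {"prompt": prompt}
    if text_to_speech:
        payload["text_to_speech"] = text_to_speech
    if elevenlabs_audio_url:
        payload["elevenlabs_audio_url"] = elevenlabs_audio_url
    # Pass optional fields (voice_id, num_steps, ...) through only when set,
    # so the server-side defaults apply to everything else.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload
```

Leaving unset options out of the body keeps the request small and defers to the defaults listed in the request body schema.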

### Example Request

{
  "prompt": "A professional teacher explaining a mathematical concept with clear gestures",
  "text_to_speech": "Hello students! Today we're going to learn about calculus and how derivatives work in real life.",
  "voice_id": "21m00Tcm4TlvDq8ikWAM",
  "image_url": "https://example.com/teacher.jpg",
  "guidance_scale": 5.0,
  "audio_scale": 3.5,
  "num_steps": 30
}
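
The recommended ranges from the parameter table can be checked before sending a request like the one above. This is a sketch under the assumption that the ranges are advisory rather than enforced by the server; `RECOMMENDED_RANGES` and `out_of_range_fields` are hypothetical names introduced for illustration.

```python
# Recommended ranges copied from the Request Parameters table
RECOMMENDED_RANGES = {
    "guidance_scale": (4.0, 6.0),
    "audio_scale": (3.0, 5.0),
    "num_steps": (20, 50),
}


def out_of_range_fields(payload: dict) -> list:
    """Return one warning string per field outside its recommended range."""
    warnings = []
    for field, (low, high) in RECOMMENDED_RANGES.items():
        value = payload.get(field)
        # Fields that are absent fall back to server defaults, so skip them
        if value is not None and not (low <= value <= high):
            warnings.append(
                f"{field}={value} is outside the recommended range {low}-{high}"
            )
    return warnings
```

The function returns warnings instead of raising, since values outside the recommended range may still be accepted.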

### Response Format

Success Response (200 OK):

{
  "message": "string",
  "output_path": "string",
  "processing_time": "float",
  "audio_generated": "boolean"
}

### Response Fields

| Field | Type | Description |
|---|---|---|
| `message` | string | Success/status message |
| `output_path` | string | Path to generated video file |
| `processing_time` | float | Processing time in seconds |
| `audio_generated` | boolean | Whether audio was generated from text |

### Example Response

{
  "message": "Avatar generation completed successfully",
  "output_path": "./outputs/avatar_20240807_130512.mp4",
  "processing_time": 45.67,
  "audio_generated": true
}
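
A client may want to turn a success response like the one above into a typed object. The sketch below assumes the body contains exactly the four documented fields; the `GenerateResult` class and `parse_generate_response` function are hypothetical names, not part of the API.

```python
import json
from dataclasses import dataclass


@dataclass
class GenerateResult:
    message: str
    output_path: str
    processing_time: float  # seconds
    audio_generated: bool


def parse_generate_response(body: str) -> GenerateResult:
    """Parse a 200 OK /generate body; raises KeyError if a field is missing."""
    data = json.loads(body)
    return GenerateResult(
        message=str(data["message"]),
        output_path=str(data["output_path"]),
        processing_time=float(data["processing_time"]),
        audio_generated=bool(data["audio_generated"]),
    )
```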

### Error Responses

400 Bad Request:

{
  "detail": "Either text_to_speech or elevenlabs_audio_url must be provided"
}

500 Internal Server Error:

{
  "detail": "Model not loaded"
}

503 Service Unavailable:

{
  "detail": "Model not loaded"
}
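
All three error bodies above share the same `{"detail": ...}` shape, so a client can map them to a single exception type. This is a minimal sketch: `AvatarAPIError` and `error_from_response` are hypothetical names, and treating 503 as retryable is an assumption of this example, not something the API documents.

```python
import json


class AvatarAPIError(Exception):
    """Carries the HTTP status and the 'detail' string from an error body."""

    def __init__(self, status: int, detail: str):
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail
        # Assumption: 503 means the model is still loading, so retrying may help
        self.retryable = status == 503


def error_from_response(status: int, body: str) -> AvatarAPIError:
    """Build an AvatarAPIError, falling back to the raw body if it is not JSON."""
    try:
        detail = json.loads(body).get("detail", body)
    except (ValueError, AttributeError):
        # Not valid JSON, or valid JSON that is not an object
        detail = body
    return AvatarAPIError(status, str(detail))
```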

## Available ElevenLabs Voices

| Voice ID | Name | Description |
|---|---|---|
| `21m00Tcm4TlvDq8ikWAM` | Rachel | Default, clear female voice |
| `pNInz6obpgDQGcFmaJgB` | Adam | Professional male voice |
| `EXAVITQu4vr4xnSDxMaL` | Bella | Expressive female voice |
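
Since `voice_id` values are opaque strings, a small lookup by name can make client code easier to read. The sketch below only covers the three voices listed above; `VOICES` and `voice_id_for` are hypothetical names introduced for this example.

```python
# Voice names and IDs copied from the table above
VOICES = {
    "rachel": "21m00Tcm4TlvDq8ikWAM",  # default
    "adam": "pNInz6obpgDQGcFmaJgB",
    "bella": "EXAVITQu4vr4xnSDxMaL",
}


def voice_id_for(name: str) -> str:
    """Return the ElevenLabs voice ID for a listed voice name (case-insensitive)."""
    try:
        return VOICES[name.strip().lower()]
    except KeyError:
        raise ValueError(
            f"Unknown voice {name!r}; choose from {sorted(VOICES)}"
        ) from None
```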

## Usage Examples

### With Text-to-Speech

curl -X POST "https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A friendly presenter speaking confidently",
    "text_to_speech": "Welcome to our AI avatar demonstration!",
    "voice_id": "21m00Tcm4TlvDq8ikWAM",
    "guidance_scale": 5.5,
    "audio_scale": 4.0
  }'

### With Audio URL

curl -X POST "https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A news anchor delivering headlines",
    "elevenlabs_audio_url": "https://example.com/audio.mp3",
    "image_url": "https://example.com/anchor.jpg",
    "num_steps": 40
  }'
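
The curl calls above can be reproduced from Python with only the standard library. This is a sketch rather than an official client: `make_generate_request` and `generate` are hypothetical names, and the 180-second timeout is an assumption chosen to sit above the 30-120 second processing time stated in this document.

```python
import json
import urllib.request

API_URL = "https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate"


def make_generate_request(payload: dict, url: str = API_URL) -> urllib.request.Request:
    """Build the POST request (same method, header, and body as the curl examples)."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(payload: dict, timeout: float = 180.0) -> dict:
    """Send the request and return the decoded JSON body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = make_generate_request(payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Building the request separately from sending it makes the method, header, and body easy to inspect without touching the network.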

## Other Endpoints

### GET /health - Health Check

{
  "status": "healthy",
  "model_loaded": true,
  "device": "cuda",
  "supports_elevenlabs": true,
  "supports_image_urls": true,
  "supports_text_to_speech": true,
  "elevenlabs_api_configured": true
}
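
A client can inspect the health payload above before submitting a generation job. The sketch below is one possible interpretation of the fields, not documented behavior: the `is_ready` helper is hypothetical, and requiring both `supports_text_to_speech` and `elevenlabs_api_configured` for text-to-speech requests is an assumption.

```python
def is_ready(health: dict, need_tts: bool = False) -> bool:
    """Return True if the /health payload suggests the service can take a job."""
    if health.get("status") != "healthy" or not health.get("model_loaded"):
        return False
    if need_tts:
        # Assumption: text_to_speech requests need ElevenLabs to be configured
        return bool(
            health.get("supports_text_to_speech")
            and health.get("elevenlabs_api_configured")
        )
    return True
```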

### GET /docs - FastAPI Documentation

Interactive API documentation is available at the `/docs` endpoint.

## Rate Limits & Performance

- Processing Time: 30-120 seconds depending on complexity
- Max Video Length: Determined by audio length
- Supported Formats: MP4 output, MP3/WAV audio input
- GPU Acceleration: Enabled on T4+ hardware

Live API Base URL: https://huggingface.co/spaces/bravedims/AI_Avatar_Chat