# OmniAvatar API Documentation
POST /generate - Avatar Generation
Request Format
URL: https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate
Method: POST
Content-Type: application/json
Request Body (JSON)
{
"prompt": "string",
"text_to_speech": "string (optional)",
"elevenlabs_audio_url": "string (optional)",
"voice_id": "string (optional, default: '21m00Tcm4TlvDq8ikWAM')",
"image_url": "string (optional)",
"guidance_scale": "float (default: 5.0)",
"audio_scale": "float (default: 3.0)",
"num_steps": "int (default: 30)",
"sp_size": "int (default: 1)",
"tea_cache_l1_thresh": "float (optional)"
}
Request Parameters
Field | Type | Required | Description |
---|---|---|---|
prompt | string | Yes | Character behavior description |
text_to_speech | string | No | Text to convert to speech via ElevenLabs |
elevenlabs_audio_url | string | No | Direct URL to audio file |
voice_id | string | No | ElevenLabs voice ID (default: Rachel) |
image_url | string | No | Reference image URL |
guidance_scale | float | No | Prompt following strength (4-6 recommended) |
audio_scale | float | No | Lip-sync accuracy (3-5 recommended) |
num_steps | int | No | Generation steps (20-50 recommended) |
sp_size | int | No | Parallel processing size |
tea_cache_l1_thresh | float | No | Cache threshold optimization |
Note: Either text_to_speech OR elevenlabs_audio_url must be provided.
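The either/or rule above and the recommended ranges from the parameter table can be checked on the client before a request is sent. The following is a minimal sketch, not the server's own validation: `validate_payload` and `RECOMMENDED_RANGES` are hypothetical names, `prompt` is assumed to be required because the request body does not mark it optional, and the ranges are treated as advisory only.

```python
from typing import Any, Dict, List

# Recommended ranges copied from the parameter table; advisory, not enforced.
RECOMMENDED_RANGES = {
    "guidance_scale": (4.0, 6.0),
    "audio_scale": (3.0, 5.0),
    "num_steps": (20, 50),
}


def validate_payload(payload: Dict[str, Any]) -> List[str]:
    """Raise ValueError for a missing required input; return advisory warnings."""
    # Assumption: prompt is required (it is not marked optional in the body).
    if not payload.get("prompt"):
        raise ValueError("prompt is required")
    # The documented rule: one of the two audio sources must be present.
    if not (payload.get("text_to_speech") or payload.get("elevenlabs_audio_url")):
        raise ValueError(
            "Either text_to_speech or elevenlabs_audio_url must be provided"
        )
    warnings = []
    for field, (low, high) in RECOMMENDED_RANGES.items():
        value = payload.get(field)
        if value is not None and not (low <= value <= high):
            warnings.append(
                f"{field}={value} is outside the recommended range {low}-{high}"
            )
    return warnings
```

Returning warnings instead of raising keeps out-of-range values usable, since the table only describes them as recommendations.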
Example Request
{
"prompt": "A professional teacher explaining a mathematical concept with clear gestures",
"text_to_speech": "Hello students! Today we're going to learn about calculus and how derivatives work in real life.",
"voice_id": "21m00Tcm4TlvDq8ikWAM",
"image_url": "https://example.com/teacher.jpg",
"guidance_scale": 5.0,
"audio_scale": 3.5,
"num_steps": 30
}
Response Format
Success Response (200 OK):
{
"message": "string",
"output_path": "string",
"processing_time": "float",
"audio_generated": "boolean"
}
Response Fields
Field | Type | Description |
---|---|---|
message | string | Success/status message |
output_path | string | Path to generated video file |
processing_time | float | Processing time in seconds |
audio_generated | boolean | Whether audio was generated from text |
Example Response
{
"message": "Avatar generation completed successfully",
"output_path": "./outputs/avatar_20240807_130512.mp4",
"processing_time": 45.67,
"audio_generated": true
}
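A client may want the response as a typed object instead of a raw dictionary. The sketch below parses a body shaped like the example above; `GenerateResponse` and `parse_generate_response` are hypothetical names introduced for illustration, and the field set is assumed to match the documented response exactly.

```python
import json
from dataclasses import dataclass


@dataclass
class GenerateResponse:
    message: str
    output_path: str
    processing_time: float  # seconds
    audio_generated: bool


def parse_generate_response(body: str) -> GenerateResponse:
    """Decode a JSON success body; raises KeyError if a documented field is missing."""
    data = json.loads(body)
    return GenerateResponse(
        message=str(data["message"]),
        output_path=str(data["output_path"]),
        processing_time=float(data["processing_time"]),
        audio_generated=bool(data["audio_generated"]),
    )
```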
Error Responses
400 Bad Request:
{
"detail": "Either text_to_speech or elevenlabs_audio_url must be provided"
}
500 Internal Server Error:
{
"detail": "Model not loaded"
}
503 Service Unavailable:
{
"detail": "Model not loaded"
}
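All three error bodies above carry a `detail` field, so a client can extract it uniformly. The sketch below does that and flags which statuses might be worth retrying. `describe_error` and `RETRYABLE_STATUSES` are hypothetical names, and treating 503 as retryable is an assumption (the model may still be loading), not documented behavior.

```python
import json
from typing import Tuple

# Assumption: a 503 may clear once the model finishes loading.
RETRYABLE_STATUSES = {503}


def describe_error(status: int, body: str) -> Tuple[str, bool]:
    """Return (detail message, whether a retry may be worthwhile)."""
    try:
        detail = json.loads(body).get("detail", body)
    except (json.JSONDecodeError, AttributeError):
        # Body was not JSON, or not a JSON object; fall back to the raw text.
        detail = body
    return str(detail), status in RETRYABLE_STATUSES
```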
Available ElevenLabs Voices
Voice ID | Name | Description |
---|---|---|
21m00Tcm4TlvDq8ikWAM | Rachel | Default, clear female voice |
pNInz6obpgDQGcFmaJgB | Adam | Professional male voice |
EXAVITQu4vr4xnSDxMaL | Bella | Expressive female voice |
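Because `voice_id` takes an opaque ID, a small lookup by name can make client code easier to read. The sketch below covers only the three voices listed in the table; `ELEVENLABS_VOICES` and `voice_id_for` are hypothetical helper names, not part of the API.

```python
from typing import Optional

# IDs copied from the voice table above.
ELEVENLABS_VOICES = {
    "rachel": "21m00Tcm4TlvDq8ikWAM",
    "adam": "pNInz6obpgDQGcFmaJgB",
    "bella": "EXAVITQu4vr4xnSDxMaL",
}
DEFAULT_VOICE_ID = ELEVENLABS_VOICES["rachel"]


def voice_id_for(name: Optional[str] = None) -> str:
    """Map a voice name (case-insensitive) to its ID; None selects the default."""
    if name is None:
        return DEFAULT_VOICE_ID
    try:
        return ELEVENLABS_VOICES[name.strip().lower()]
    except KeyError:
        known = ", ".join(sorted(ELEVENLABS_VOICES))
        raise ValueError(f"Unknown voice {name!r}; known voices: {known}") from None
```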
Usage Examples
With Text-to-Speech
curl -X POST "https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate" \
-H "Content-Type: application/json" \
-d '{
"prompt": "A friendly presenter speaking confidently",
"text_to_speech": "Welcome to our AI avatar demonstration!",
"voice_id": "21m00Tcm4TlvDq8ikWAM",
"guidance_scale": 5.5,
"audio_scale": 4.0
}'
With Audio URL
curl -X POST "https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate" \
-H "Content-Type: application/json" \
-d '{
"prompt": "A news anchor delivering headlines",
"elevenlabs_audio_url": "https://example.com/audio.mp3",
"image_url": "https://example.com/anchor.jpg",
"num_steps": 40
}'
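The curl calls above can be mirrored in Python with only the standard library. This is a minimal sketch under stated assumptions: `build_generate_request` and `generate` are hypothetical helper names, and the 180-second default timeout is an arbitrary choice meant to leave headroom for long generations, not a documented value.

```python
import json
import urllib.request

API_URL = "https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate"


def build_generate_request(payload: dict) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(payload: dict, timeout: float = 180.0) -> dict:
    """Send the request and decode the JSON response (performs network I/O)."""
    request = build_generate_request(payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload and headers inspectable without touching the network. Non-2xx responses surface as `urllib.error.HTTPError` from `urlopen`.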
Other Endpoints
GET /health - Health Check
{
"status": "healthy",
"model_loaded": true,
"device": "cuda",
"supports_elevenlabs": true,
"supports_image_urls": true,
"supports_text_to_speech": true,
"elevenlabs_api_configured": true
}
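A client can inspect the health payload before submitting a generation job. The sketch below is one possible readiness check; `is_ready` is a hypothetical helper, and the meaning of each flag is inferred from its name, since the fields are not described further here.

```python
def is_ready(health: dict, need_tts: bool = False) -> bool:
    """Return True if the service looks able to handle a /generate call."""
    ready = health.get("status") == "healthy" and health.get("model_loaded") is True
    if need_tts:
        # Assumption: text-to-speech needs both flags to be true.
        ready = (
            ready
            and health.get("supports_text_to_speech") is True
            and health.get("elevenlabs_api_configured") is True
        )
    return ready
```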
GET /docs - FastAPI Documentation
Interactive API documentation is available at the /docs endpoint.
Rate Limits & Performance
- Processing Time: 30-120 seconds depending on complexity
- Max Video Length: Determined by audio length
- Supported Formats: MP4 output, MP3/WAV audio input
- GPU Acceleration: Enabled on T4+ hardware
Live API Base URL: https://huggingface.co/spaces/bravedims/AI_Avatar_Chat