# 🔌 OmniAvatar API Documentation

## POST /generate - Avatar Generation

### Request Format

**URL:** `https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate`

**Method:** `POST`

**Content-Type:** `application/json`

### Request Body (JSON)

```json
{
  "prompt": "string",
  "text_to_speech": "string (optional)",
  "elevenlabs_audio_url": "string (optional)",
  "voice_id": "string (optional, default: '21m00Tcm4TlvDq8ikWAM')",
  "image_url": "string (optional)",
  "guidance_scale": "float (default: 5.0)",
  "audio_scale": "float (default: 3.0)",
  "num_steps": "int (default: 30)",
  "sp_size": "int (default: 1)",
  "tea_cache_l1_thresh": "float (optional)"
}
```

### Request Parameters

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `prompt` | string | ✅ | Description of the character's behavior |
| `text_to_speech` | string | ❌ | Text to convert to speech via ElevenLabs |
| `elevenlabs_audio_url` | string | ❌ | Direct URL to an audio file (MP3/WAV) |
| `voice_id` | string | ❌ | ElevenLabs voice ID (default: `21m00Tcm4TlvDq8ikWAM`, Rachel) |
| `image_url` | string | ❌ | Reference image URL |
| `guidance_scale` | float | ❌ | Prompt-following strength (default: 5.0; 4-6 recommended) |
| `audio_scale` | float | ❌ | Lip-sync accuracy (default: 3.0; 3-5 recommended) |
| `num_steps` | int | ❌ | Number of generation steps (default: 30; 20-50 recommended) |
| `sp_size` | int | ❌ | Parallel processing size (default: 1) |
| `tea_cache_l1_thresh` | float | ❌ | TeaCache L1 threshold (cache-based speed optimization) |

**Note:** Either `text_to_speech` or `elevenlabs_audio_url` must be provided; otherwise the API returns `400 Bad Request`.

### Example Request

```json
{
  "prompt": "A professional teacher explaining a mathematical concept with clear gestures",
  "text_to_speech": "Hello students! Today we're going to learn about calculus and how derivatives work in real life.",
  "voice_id": "21m00Tcm4TlvDq8ikWAM",
  "image_url": "https://example.com/teacher.jpg",
  "guidance_scale": 5.0,
  "audio_scale": 3.5,
  "num_steps": 30
}
```

### Response Format

**Success Response (200 OK):**

```json
{
  "message": "string",
  "output_path": "string",
  "processing_time": "float",
  "audio_generated": "boolean"
}
```

### Response Fields

| Field | Type | Description |
|-------|------|-------------|
| `message` | string | Success or status message |
| `output_path` | string | Path to the generated video file |
| `processing_time` | float | Processing time in seconds |
| `audio_generated` | boolean | Whether the audio was generated from `text_to_speech` |

### Example Response

```json
{
  "message": "Avatar generation completed successfully",
  "output_path": "./outputs/avatar_20240807_130512.mp4",
  "processing_time": 45.67,
  "audio_generated": true
}
```

### Error Responses

**400 Bad Request:**

```json
{
  "detail": "Either text_to_speech or elevenlabs_audio_url must be provided"
}
```

**500 Internal Server Error:**

```json
{
  "detail": "Model not loaded"
}
```

**503 Service Unavailable:**

```json
{
  "detail": "Model not loaded"
}
```

### Available ElevenLabs Voices

| Voice ID | Name | Description |
|----------|------|-------------|
| `21m00Tcm4TlvDq8ikWAM` | Rachel | Default, clear female voice |
| `pNInz6obpgDQGcFmaJgB` | Adam | Professional male voice |
| `EXAVITQu4vr4xnSDxMaL` | Bella | Expressive female voice |

### Usage Examples

#### With Text-to-Speech

```bash
curl -X POST "https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A friendly presenter speaking confidently",
    "text_to_speech": "Welcome to our AI avatar demonstration!",
    "voice_id": "21m00Tcm4TlvDq8ikWAM",
    "guidance_scale": 5.5,
    "audio_scale": 4.0
  }'
```

#### With Audio URL

```bash
curl -X POST "https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A news anchor delivering headlines",
    "elevenlabs_audio_url": "https://example.com/audio.mp3",
    "image_url": "https://example.com/anchor.jpg",
    "num_steps": 40
  }'
```

### Other Endpoints

#### GET /health - Health Check

Returns the service status and supported features, for example:

```json
{
  "status": "healthy",
  "model_loaded": true,
  "device": "cuda",
  "supports_elevenlabs": true,
  "supports_image_urls": true,
  "supports_text_to_speech": true,
  "elevenlabs_api_configured": true
}
```

#### GET /docs - FastAPI Documentation

Interactive API documentation is available at the `/docs` endpoint.

### Rate Limits & Performance

- **Processing Time:** 30-120 seconds, depending on complexity
- **Max Video Length:** Determined by the audio length
- **Supported Formats:** MP4 output; MP3/WAV audio input
- **GPU Acceleration:** Enabled on T4 or better hardware

---

**Live API Base URL:** `https://huggingface.co/spaces/bravedims/AI_Avatar_Chat`
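The `POST /generate` request described above can also be sent from Python. What follows is a minimal sketch that uses only the standard library. It assumes the endpoint accepts the documented JSON body at the base URL listed on this page (`/api/generate` appended). The helper names `build_generate_payload` and `generate_avatar` are hypothetical and are not part of the API. The sketch checks the "either `text_to_speech` or `elevenlabs_audio_url`" rule on the client side, which avoids a round trip that would only return `400`.

```python
import json
import urllib.request
from typing import Any, Optional

# Base URL as documented on this page (assumption: /api/generate is appended to it).
BASE_URL = "https://huggingface.co/spaces/bravedims/AI_Avatar_Chat"


def build_generate_payload(
    prompt: str,
    text_to_speech: Optional[str] = None,
    elevenlabs_audio_url: Optional[str] = None,
    voice_id: Optional[str] = None,
    image_url: Optional[str] = None,
    guidance_scale: float = 5.0,
    audio_scale: float = 3.0,
    num_steps: int = 30,
) -> dict[str, Any]:
    """Hypothetical helper: build a JSON body for POST /generate from the documented fields."""
    if not prompt:
        raise ValueError("prompt is required")
    # The API rejects requests without an audio source (400), so check before sending.
    if not (text_to_speech or elevenlabs_audio_url):
        raise ValueError("Either text_to_speech or elevenlabs_audio_url must be provided")
    payload: dict[str, Any] = {
        "prompt": prompt,
        "guidance_scale": guidance_scale,
        "audio_scale": audio_scale,
        "num_steps": num_steps,
    }
    # Include optional fields only when set, so server-side defaults (e.g. voice_id) apply.
    optional = {
        "text_to_speech": text_to_speech,
        "elevenlabs_audio_url": elevenlabs_audio_url,
        "voice_id": voice_id,
        "image_url": image_url,
    }
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


def generate_avatar(
    payload: dict[str, Any], base_url: str = BASE_URL, timeout: float = 180.0
) -> dict[str, Any]:
    """Hypothetical helper: POST the payload and return the decoded JSON response.

    Non-2xx responses (400/500/503) raise urllib.error.HTTPError; its body
    carries the JSON "detail" message.
    """
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generation is documented to take 30-120 s, so use a generous timeout.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `generate_avatar(build_generate_payload("A news anchor delivering headlines", elevenlabs_audio_url="https://example.com/audio.mp3", num_steps=40))` mirrors the "With Audio URL" curl example. The returned dictionary includes `output_path` and `processing_time`. The code omits unset optional fields instead of sending `null`, which leaves the defaults to the server.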