
# 🔌 OmniAvatar API Documentation

## POST /generate - Avatar Generation

### Request Format

- **URL:** `https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate`
- **Method:** `POST`
- **Content-Type:** `application/json`

### Request Body (JSON)

```json
{
  "prompt": "string",
  "text_to_speech": "string (optional)",
  "elevenlabs_audio_url": "string (optional)",
  "voice_id": "string (optional, default: '21m00Tcm4TlvDq8ikWAM')",
  "image_url": "string (optional)",
  "guidance_scale": "float (default: 5.0)",
  "audio_scale": "float (default: 3.0)",
  "num_steps": "int (default: 30)",
  "sp_size": "int (default: 1)",
  "tea_cache_l1_thresh": "float (optional)"
}
```

### Request Parameters

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `prompt` | string | ✅ | Description of the character and how it should behave |
| `text_to_speech` | string | ❌ | Text to convert to speech with ElevenLabs |
| `elevenlabs_audio_url` | string | ❌ | Direct URL to an audio file (MP3 or WAV) |
| `voice_id` | string | ❌ | ElevenLabs voice ID used with `text_to_speech` (default: `21m00Tcm4TlvDq8ikWAM`, Rachel) |
| `image_url` | string | ❌ | URL of a reference image for the avatar |
| `guidance_scale` | float | ❌ | How strongly the output follows the prompt (default 5.0; 4-6 recommended) |
| `audio_scale` | float | ❌ | How strongly the audio drives lip-sync (default 3.0; 3-5 recommended) |
| `num_steps` | int | ❌ | Number of generation steps; more steps take longer (default 30; 20-50 recommended) |
| `sp_size` | int | ❌ | Parallel processing size (default 1) |
| `tea_cache_l1_thresh` | float | ❌ | Threshold for TeaCache caching, which speeds up generation (optional) |
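
The recommended ranges above are advice, but a script can warn before starting a slow run. A minimal bash sketch, assuming `awk` is available; `check_range` is a hypothetical helper, and the server is not known to reject out-of-range values.

```bash
#!/usr/bin/env bash
# Warn when a tuning parameter falls outside its recommended range.
# check_range is a hypothetical helper, not part of the API.

# check_range NAME VALUE LOW HIGH
# Returns 0 if LOW <= VALUE <= HIGH; otherwise prints a warning and returns 1.
check_range() {
  local name=$1 value=$2 low=$3 high=$4
  # awk does the comparison because bash arithmetic cannot handle floats.
  if awk -v v="$value" -v lo="$low" -v hi="$high" \
      'BEGIN { exit !(v >= lo && v <= hi) }'; then
    return 0
  fi
  echo "warning: $name=$value is outside the recommended range $low-$high" >&2
  return 1
}

# Example usage:
#   check_range guidance_scale 5.5 4 6   # returns 0
#   check_range num_steps 80 20 50       # warns, returns 1
```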

**Note:** Provide either `text_to_speech` or `elevenlabs_audio_url`. A request with neither is rejected with `400 Bad Request`.
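
A minimal bash sketch that enforces this rule client-side while building the request body. `json_escape` and `build_generate_payload` are hypothetical helpers; only the JSON field names come from this API, and other fields can be appended the same way.

```bash
#!/usr/bin/env bash
# Build a /generate request body, refusing to build one that has neither
# text_to_speech nor elevenlabs_audio_url (the server answers 400 for that).
# json_escape and build_generate_payload are hypothetical helper names.

# Escape a string for use inside a JSON string literal.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}    # backslashes first, so later escapes are not doubled
  s=${s//\"/\\\"}    # double quotes
  s=${s//$'\n'/\\n}  # newlines
  s=${s//$'\r'/\\r}  # carriage returns
  s=${s//$'\t'/\\t}  # tabs
  printf '%s' "$s"
}

# build_generate_payload PROMPT TEXT AUDIO_URL
# Pass "" for whichever audio source you are not using.
build_generate_payload() {
  local prompt=$1 text=$2 audio_url=$3 body
  if [[ -z $text && -z $audio_url ]]; then
    echo "error: set text_to_speech or elevenlabs_audio_url" >&2
    return 1
  fi
  body="{\"prompt\": \"$(json_escape "$prompt")\""
  if [[ -n $text ]]; then
    body+=", \"text_to_speech\": \"$(json_escape "$text")\""
  fi
  if [[ -n $audio_url ]]; then
    body+=", \"elevenlabs_audio_url\": \"$(json_escape "$audio_url")\""
  fi
  printf '%s}\n' "$body"
}
```

The output can be passed straight to curl, for example `-d "$(build_generate_payload "$PROMPT" "$TEXT" "")"`.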

### Example Request

```json
{
  "prompt": "A professional teacher explaining a mathematical concept with clear gestures",
  "text_to_speech": "Hello students! Today we're going to learn about calculus and how derivatives work in real life.",
  "voice_id": "21m00Tcm4TlvDq8ikWAM",
  "image_url": "https://example.com/teacher.jpg",
  "guidance_scale": 5.0,
  "audio_scale": 3.5,
  "num_steps": 30
}
```

### Response Format

**Success Response (200 OK):**

```json
{
  "message": "string",
  "output_path": "string",
  "processing_time": "float",
  "audio_generated": "boolean"
}
```

### Response Fields

| Field | Type | Description |
|-------|------|-------------|
| `message` | string | Success/status message |
| `output_path` | string | Server-side path to the generated video file |
| `processing_time` | float | Processing time in seconds |
| `audio_generated` | boolean | Whether audio was generated from text |

### Example Response

```json
{
  "message": "Avatar generation completed successfully",
  "output_path": "./outputs/avatar_20240807_130512.mp4",
  "processing_time": 45.67,
  "audio_generated": true
}
```
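
To use the response in a script, its fields can be read with standard tools. A minimal sketch, assuming flat JSON shaped like the example above; `json_field` is a hypothetical helper, and a real JSON parser such as `jq` is more robust.

```bash
#!/usr/bin/env bash
# Extract a top-level string, number, or boolean field from a flat JSON
# response using only tr and sed. json_field is a hypothetical helper; it
# does not handle nested objects or escaped quotes inside values.

# json_field JSON NAME
json_field() {
  # Join the response onto one line, then capture the value after "NAME":
  # (with or without an opening quote) up to the next quote, comma, or brace.
  printf '%s\n' "$1" | tr -d '\n' |
    sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\"\{0,1\}\([^\",}]*\).*/\1/p"
}

# Example usage:
#   json_field "$response" output_path       # ./outputs/avatar_...mp4
#   json_field "$response" processing_time   # 45.67
```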

### Error Responses

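**400 Bad Request:** returned, for example, when neither `text_to_speech` nor `elevenlabs_audio_url` is provided.
```json
{
  "detail": "Either text_to_speech or elevenlabs_audio_url must be provided"
}
```

**500 Internal Server Error:** generation failed on the server; `detail` describes the error.
```json
{
  "detail": "string (error message)"
}
```

**503 Service Unavailable:** the model is not loaded yet.
```json
{
  "detail": "Model not loaded"
}
```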
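
A bash sketch of a client that retries while the service answers `503` and stops on `400` or `500`. The retry count, delay, timeout, and `API_URL` default are assumptions rather than documented behavior; `post_generate` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# POST a JSON body to /generate, retrying while the service answers 503
# ("Model not loaded"). Any other non-200 status is treated as final.
# The retry policy, timeout, and API_URL default are assumptions.

API_URL=${API_URL:-"https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate"}

# post_generate BODY [ATTEMPTS] [DELAY_SECONDS]
# Prints the response body and returns 0 on HTTP 200; returns 1 otherwise.
post_generate() {
  local body=$1 attempts=${2:-5} delay=${3:-30} i out status
  for ((i = 1; i <= attempts; i++)); do
    # --max-time leaves headroom over the documented 30-120 s processing time;
    # -w appends the HTTP status code on a final line after the body.
    out=$(curl -sS --max-time 300 -X POST "$API_URL" \
      -H "Content-Type: application/json" -d "$body" \
      -w '\n%{http_code}') || return 1
    status=${out##*$'\n'}   # last line: the status code
    out=${out%$'\n'*}       # everything before it: the response body
    case $status in
      200) printf '%s\n' "$out"; return 0 ;;
      503) echo "503 Model not loaded; retry $i/$attempts in ${delay}s" >&2
           sleep "$delay" ;;
      *)   echo "HTTP $status: $out" >&2; return 1 ;;
    esac
  done
  return 1
}
```

For example, `post_generate "$(cat request.json)"` sends a body saved in `request.json`.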

### Available ElevenLabs Voices

| Voice ID | Name | Description |
|----------|------|-------------|
| `21m00Tcm4TlvDq8ikWAM` | Rachel | Default, clear female voice |
| `pNInz6obpgDQGcFmaJgB` | Adam | Professional male voice |
| `EXAVITQu4vr4xnSDxMaL` | Bella | Expressive female voice |
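
Scripts can refer to these voices by name with a small lookup. A bash sketch; `voice_id_for` is a hypothetical helper that only knows the three voices in this table.

```bash
#!/usr/bin/env bash
# Look up an ElevenLabs voice ID by name (case-insensitive).
# voice_id_for is a hypothetical helper covering only the voices listed here.
voice_id_for() {
  local name
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $name in
    rachel) echo "21m00Tcm4TlvDq8ikWAM" ;;   # default voice
    adam)   echo "pNInz6obpgDQGcFmaJgB" ;;
    bella)  echo "EXAVITQu4vr4xnSDxMaL" ;;
    *)      echo "unknown voice: $1" >&2; return 1 ;;
  esac
}
```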

### Usage Examples

#### With Text-to-Speech
```bash
curl -X POST "https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A friendly presenter speaking confidently",
    "text_to_speech": "Welcome to our AI avatar demonstration!",
    "voice_id": "21m00Tcm4TlvDq8ikWAM",
    "guidance_scale": 5.5,
    "audio_scale": 4.0
  }'
```

#### With Audio URL
```bash
curl -X POST "https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A news anchor delivering headlines",
    "elevenlabs_audio_url": "https://example.com/audio.mp3",
    "image_url": "https://example.com/anchor.jpg",
    "num_steps": 40
  }'
```

### Other Endpoints

#### GET /health - Health Check
Reports whether the model is loaded and which features are available. Example response:
```json
{
  "status": "healthy",
  "model_loaded": true,
  "device": "cuda",
  "supports_elevenlabs": true,
  "supports_image_urls": true,
  "supports_text_to_speech": true,
  "elevenlabs_api_configured": true
}
```
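
Since `/generate` answers `503` while the model is not loaded, a client can poll this endpoint first. A minimal bash sketch, assuming `HEALTH_URL` points at this Space's `/health` endpoint; `wait_for_model`, the attempt count, and the delay are illustrative choices.

```bash
#!/usr/bin/env bash
# Poll GET /health until it reports "model_loaded": true.
# HEALTH_URL is a placeholder; wait_for_model is a hypothetical helper.

HEALTH_URL=${HEALTH_URL:-"https://huggingface.co/spaces/bravedims/AI_Avatar_Chat/health"}

# wait_for_model [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 once the model is loaded, 1 if it never becomes ready.
wait_for_model() {
  local attempts=${1:-20} delay=${2:-15} i body
  for ((i = 1; i <= attempts; i++)); do
    body=$(curl -sS --max-time 10 "$HEALTH_URL") || body=""
    # Accept the flag with or without whitespace around the colon.
    if printf '%s' "$body" | grep -Eq '"model_loaded"[[:space:]]*:[[:space:]]*true'; then
      return 0
    fi
    echo "model not ready yet ($i/$attempts)" >&2
    sleep "$delay"
  done
  return 1
}
```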

#### GET /docs - FastAPI Documentation
FastAPI serves interactive API documentation (Swagger UI) at the `/docs` endpoint.

### Performance & Limits

- **Processing Time:** 30-120 seconds per request, depending on settings such as `num_steps` and the length of the audio
- **Max Video Length:** Determined by the length of the input audio
- **Supported Formats:** MP4 video output; MP3 or WAV audio input
- **GPU Acceleration:** Used when the Space runs on T4-class or better GPU hardware

---

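**Live API Base URL:** `https://huggingface.co/spaces/bravedims/AI_Avatar_Chat`

Hugging Face serves a Space's running app from its `*.hf.space` subdomain, so direct API calls may need that host (for this Space, likely `https://bravedims-ai-avatar-chat.hf.space`) rather than the `huggingface.co/spaces/...` page URL.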