# Alternative OmniAvatar Model Download Guide

🎯 Why You're Getting Only Audio Output

Your app is working correctly, but it is running in TTS-only mode because the video models (Wan2.1-T2V-14B, OmniAvatar-14B and wav2vec2-base-960h) are missing from pretrained_models/. When the video models aren't available, the app gracefully falls back to audio-only generation.

## 🚀 Solutions to Enable Video Generation

### Option 1: Use Git to Download Models (requires Git LFS)

Create model directories:

mkdir pretrained_models\Wan2.1-T2V-14B
mkdir pretrained_models\OmniAvatar-14B
mkdir pretrained_models\wav2vec2-base-960h

Clone models (requires Git LFS):

git lfs clone https://huggingface.co/Wan-AI/Wan2.1-T2V-14B pretrained_models/Wan2.1-T2V-14B
git lfs clone https://huggingface.co/OmniAvatar/OmniAvatar-14B pretrained_models/OmniAvatar-14B
git lfs clone https://huggingface.co/facebook/wav2vec2-base-960h pretrained_models/wav2vec2-base-960h

### Option 2: Install Python and Run Setup Script

  1. Install Python, if it is not already installed.

  2. Run the setup script: python setup_omniavatar.py

### Option 3: Manual Download from HuggingFace

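Visit these URLs (the same repositories used in Option 1) and download the files manually:

  • https://huggingface.co/Wan-AI/Wan2.1-T2V-14B
  • https://huggingface.co/OmniAvatar/OmniAvatar-14B
  • https://huggingface.co/facebook/wav2vec2-base-960h

Place the downloaded files in: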

  • pretrained_models/Wan2.1-T2V-14B/
  • pretrained_models/OmniAvatar-14B/
  • pretrained_models/wav2vec2-base-960h/
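
If clicking through the file lists by hand is impractical, the same three repositories can in principle be fetched with a short script. The sketch below is only one possible approach and is not necessarily what `setup_omniavatar.py` does. It assumes the `huggingface_hub` package is installed (`pip install huggingface_hub`), and the helper names `plan_downloads` and `download_all` are made up for illustration. The repo IDs are taken from the URLs in this guide.

```python
from pathlib import Path

# Repo IDs come from the Hugging Face URLs in this guide;
# the values are the folder names expected under pretrained_models/.
MODEL_REPOS = {
    "Wan-AI/Wan2.1-T2V-14B": "Wan2.1-T2V-14B",
    "OmniAvatar/OmniAvatar-14B": "OmniAvatar-14B",
    "facebook/wav2vec2-base-960h": "wav2vec2-base-960h",
}


def plan_downloads(root="pretrained_models"):
    """Return (repo_id, target_dir) pairs without touching the network."""
    return [(repo_id, Path(root) / name) for repo_id, name in MODEL_REPOS.items()]


def download_all(root="pretrained_models"):
    """Download every repo into its target directory (about 30 GB in total)."""
    # Imported here so plan_downloads() works even without the package.
    from huggingface_hub import snapshot_download

    for repo_id, target in plan_downloads(root):
        target.mkdir(parents=True, exist_ok=True)
        snapshot_download(repo_id=repo_id, local_dir=str(target))


if __name__ == "__main__":
    # Show what would be downloaded and where.
    for repo_id, target in plan_downloads():
        print(f"{repo_id} -> {target}")
    # Uncomment to start the actual download:
    # download_all()
```

Running the script as-is only prints the plan; the download call is left commented out so that nothing large is fetched by accident.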

### Option 4: Use Windows Subsystem for Linux (WSL)

If you have WSL installed:

wsl
cd /mnt/c/path/to/your/project
python setup_omniavatar.py
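
The `/mnt/c/path/to/your/project` placeholder above stands for your Windows project folder as seen from inside WSL. As a rough illustration of the mapping, the sketch below converts a drive-letter Windows path to its WSL form. It assumes WSL's default behaviour of mounting Windows drives under `/mnt/<letter>`; the function name `windows_to_wsl` and the example path are hypothetical. Inside WSL, the bundled `wslpath` command performs the same conversion.

```python
from pathlib import PureWindowsPath


def windows_to_wsl(path):
    """Convert an absolute drive-letter Windows path to its default WSL mount path."""
    p = PureWindowsPath(path)
    # Accept only paths like C:\..., not relative or UNC paths.
    if len(p.drive) != 2 or not p.is_absolute():
        raise ValueError(f"expected an absolute drive-letter path, got {path!r}")
    drive_letter = p.drive[0].lower()
    # p.parts[0] is the anchor (for example "C:\\"); the rest are folder names.
    return "/".join(["/mnt", drive_letter, *p.parts[1:]])


if __name__ == "__main__":
    # Example path is a placeholder; substitute your own project folder.
    print(windows_to_wsl(r"C:\Users\me\project"))  # /mnt/c/Users/me/project
```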

## 📊 Model Requirements

Total download size: ~30.36GB

  • Wan2.1-T2V-14B: ~28GB (base text-to-video model)
  • OmniAvatar-14B: ~2GB (avatar animation weights)
  • wav2vec2-base-960h: ~360MB (audio encoder)
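
Since the download is large, it may be worth confirming there is enough free disk space before starting. The following is a minimal sketch: the sizes are the approximate figures listed above, while the 5 GB safety margin and the helper names are assumptions added for illustration.

```python
import shutil

# Approximate sizes in GB, copied from the list above.
MODEL_SIZES_GB = {
    "Wan2.1-T2V-14B": 28.0,
    "OmniAvatar-14B": 2.0,
    "wav2vec2-base-960h": 0.36,
}


def total_size_gb(sizes=MODEL_SIZES_GB):
    """Sum the per-model sizes, rounded to two decimals."""
    return round(sum(sizes.values()), 2)


def has_enough_space(path=".", margin_gb=5.0):
    """True if the drive holding `path` can fit all models plus a margin."""
    # disk_usage reports bytes; 1e9 bytes per GB (decimal) is used here.
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= total_size_gb() + margin_gb


if __name__ == "__main__":
    print(f"Models need about {total_size_gb()} GB")
    print("Enough free space:", has_enough_space("."))
```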

## 🔍 Verify Installation

After downloading, restart your app and check:

  • The app should show "full functionality enabled" in the logs
  • API responses should return video URLs instead of audio only
  • The Gradio interface should show a video output component
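
Before restarting, a quick check of the directory layout can catch an incomplete download. The sketch below only tests that the three folders named in this guide exist and are non-empty. It does not validate the files inside them, and it is not the app's own detection logic, which this guide does not show. The function name `missing_models` is hypothetical.

```python
from pathlib import Path

# Folder names expected under pretrained_models/, as listed in this guide.
REQUIRED_DIRS = ("Wan2.1-T2V-14B", "OmniAvatar-14B", "wav2vec2-base-960h")


def missing_models(root="pretrained_models"):
    """Return the required model directories that are absent or empty."""
    missing = []
    for name in REQUIRED_DIRS:
        model_dir = Path(root) / name
        if not model_dir.is_dir() or not any(model_dir.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Still missing (expect TTS-only mode):", ", ".join(missing))
    else:
        print("All three model directories are present and non-empty.")
```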

## 💡 Current Status

Your setup is working perfectly for TTS! Once the OmniAvatar models are downloaded, you'll get:

✅ Audio-driven avatar videos
✅ Adaptive body animation
✅ Lip-sync accuracy
✅ 480p video output