# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

LIA-X is a Portrait Animator application built with Gradio that enables image animation, image editing, and video editing using deep learning models. It's deployed as a Hugging Face Space with GPU acceleration.

## Architecture

### Core Components

1. **Main Application** (`app.py`): Gradio web interface that loads the model and serves three main tabs
2. **Generator Network** (`networks/generator.py`): Core neural network model that handles animation and editing
   - Uses encoder-decoder architecture
   - Implements motion encoding and style transfer
   - Pre-allocates tensors for performance optimization
3. **Gradio Tabs** (`gradio_tabs/`): UI modules for different functionalities
   - `animation.py`: Handles image-to-video animation
   - `img_edit.py`: Image editing interface
   - `vid_edit.py`: Video editing interface

### Model Architecture

- **Encoder** (`networks/encoder.py`): Encodes source images and motion
- **Decoder** (`networks/decoder.py`): Reconstructs edited/animated outputs
- **Custom Ops** (`networks/op/`): CUDA kernels for optimized operations (fused_act, upfirdn2d)

## Development Commands

### Running the Application

```bash
python app.py
```

The app launches a Gradio interface on a local server. Note: Requires a CUDA-capable GPU.
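
Because the app requires a CUDA-capable GPU, a quick pre-flight check before `python app.py` can save a failed launch. The following is a minimal sketch, not part of the repository: the helper name `cuda_ready` is hypothetical, and it only reports whether PyTorch is importable and sees a CUDA device.

```python
import importlib.util


def cuda_ready() -> bool:
    """Return True only if PyTorch is installed and reports a usable CUDA device.

    Hypothetical helper for illustration; it is not defined in this repository.
    """
    # Avoid an ImportError when PyTorch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("CUDA available:", cuda_ready())
```

If this prints `False`, install the dependencies or move to a GPU machine before launching the app.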
### Installing Dependencies

```bash
pip install -r requirements.txt
```

Key dependencies: PyTorch 2.5.1, torchvision, Gradio 5.42.0, einops, imageio, av

### Model Loading

The model checkpoint is automatically downloaded from Hugging Face Hub:

- Repository: `YaohuiW/LIA-X`
- File: `lia-x.pt`

## Important Notes

- This is a GPU-only application (uses `torch.device("cuda")`)
- Uses `@spaces` decorator for Hugging Face Spaces GPU allocation
- Model operates at 512x512 resolution with motion_dim=40
- Chunk size of 16 frames for video processing
- Custom CUDA kernels in `networks/op/` require compilation with ninja
- Git LFS is configured for large files (models, videos, images)

## File Processing

- Images: Loaded as RGB, resized to 512x512, normalized to [-1, 1]
- Videos: Processed with torchvision, maintains original FPS
- Supports cropping tools for better results (referenced in instruction.md)

## Testing

No explicit test suite found. Manual testing through Gradio interface.

## Data Structure

- `data/source/`: Source images for examples
- `data/driving/`: Driving videos for animation examples
- `assets/`: Documentation and UI text (instruction.md, title.md)
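
For reference, the [-1, 1] normalization and 16-frame chunking mentioned under Important Notes and File Processing can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the repository's code: `normalize_pixel` and `iter_chunks` are hypothetical names, and the exact normalization formula used in the app is assumed to be the common `value / 127.5 - 1` mapping from 8-bit values.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

CHUNK_SIZE = 16  # frames per chunk, as stated in Important Notes


def normalize_pixel(value: int) -> float:
    """Map an 8-bit channel value in [0, 255] to [-1.0, 1.0].

    Assumed formula; the app may implement the same mapping differently.
    """
    return value / 127.5 - 1.0


def iter_chunks(frames: Sequence[T], size: int = CHUNK_SIZE) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` frames; the last may be shorter."""
    for start in range(0, len(frames), size):
        yield frames[start:start + size]


if __name__ == "__main__":
    frames = list(range(40))  # stand-in for 40 decoded video frames
    print([len(chunk) for chunk in iter_chunks(frames)])  # [16, 16, 8]
    print(normalize_pixel(0), normalize_pixel(255))  # -1.0 1.0
```

Processing in fixed-size chunks keeps peak GPU memory bounded regardless of video length; the final chunk is simply shorter when the frame count is not a multiple of 16.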