# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

LIA-X is a Portrait Animator application built with Gradio that enables image animation, image editing, and video editing using deep learning models. It is deployed as a Hugging Face Space with GPU acceleration.

## Architecture

### Core Components

1. **Main Application** (`app.py`): Gradio web interface that loads the model and serves three main tabs
2. **Generator Network** (`networks/generator.py`): Core neural network model that handles animation and editing
   - Uses an encoder-decoder architecture
   - Implements motion encoding and style transfer
   - Pre-allocates tensors for performance optimization
3. **Gradio Tabs** (`gradio_tabs/`): UI modules for the different functionalities
   - `animation.py`: Image-to-video animation
   - `img_edit.py`: Image editing interface
   - `vid_edit.py`: Video editing interface

### Model Architecture

- **Encoder** (`networks/encoder.py`): Encodes source images and motion
- **Decoder** (`networks/decoder.py`): Reconstructs edited/animated outputs
- **Custom Ops** (`networks/op/`): CUDA kernels for optimized operations (fused_act, upfirdn2d)

## Development Commands

### Running the Application

```bash
python app.py
```

The app launches a Gradio interface on a local server. Note: this requires a CUDA-capable GPU.

### Installing Dependencies

```bash
pip install -r requirements.txt
```

Key dependencies: PyTorch 2.5.1, torchvision, Gradio 5.42.0, einops, imageio, av

### Model Loading

The model checkpoint is automatically downloaded from the Hugging Face Hub:

- Repository: `YaohuiW/LIA-X`
- File: `lia-x.pt`
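The download step can be sketched with `huggingface_hub` (a minimal sketch; the helper name `load_checkpoint` and the function-local imports are illustrative, not the actual `app.py` code):

```python
def load_checkpoint(repo_id="YaohuiW/LIA-X", filename="lia-x.pt", device="cuda"):
    """Fetch the checkpoint from the Hugging Face Hub (cached after the
    first call) and load it onto the given device."""
    # Imports are kept local so the sketch can be read without the
    # dependencies installed; both come from requirements.txt.
    from huggingface_hub import hf_hub_download
    import torch

    path = hf_hub_download(repo_id=repo_id, filename=filename)
    return torch.load(path, map_location=device)
```

`hf_hub_download` caches the file under the local Hub cache, so repeated launches skip the network round trip.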

## Important Notes

- This is a GPU-only application (it uses `torch.device("cuda")`)
- Uses the `@spaces` decorator for Hugging Face Spaces GPU allocation
- The model operates at 512x512 resolution with motion_dim=40
- Videos are processed in chunks of 16 frames
- The custom CUDA kernels in `networks/op/` require compilation with ninja
- Git LFS is configured for large files (models, videos, images)
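The 16-frame chunking mentioned above can be sketched as a plain generator (a hypothetical helper for illustration, not the repository's actual implementation):

```python
def chunk_frames(frames, chunk_size=16):
    """Yield successive fixed-size chunks of a frame sequence.

    The last chunk may be shorter when the frame count is not a
    multiple of chunk_size.
    """
    for start in range(0, len(frames), chunk_size):
        yield frames[start:start + chunk_size]
```

For a 40-frame clip this yields chunks of 16, 16, and 8 frames, bounding peak GPU memory to one chunk at a time.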

## File Processing

- Images: loaded as RGB, resized to 512x512, and normalized to [-1, 1]
- Videos: processed with torchvision, preserving the original FPS
- Cropping tools are supported for better results (referenced in instruction.md)
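The image pipeline above can be sketched with PIL and NumPy (a minimal sketch of the described preprocessing; the helper name `preprocess_image` is illustrative, and the final conversion to a GPU torch tensor is omitted):

```python
import numpy as np
from PIL import Image

def preprocess_image(path, size=512):
    """Load an image as RGB, resize to size x size, normalize to [-1, 1]."""
    img = Image.open(path).convert("RGB").resize((size, size))
    arr = np.asarray(img, dtype=np.float32) / 255.0  # scale to [0, 1]
    return arr * 2.0 - 1.0                           # shift to [-1, 1]
```

In the app the resulting array would then become a `(3, 512, 512)` torch tensor on the GPU before being fed to the generator.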

## Testing

No explicit test suite was found; test manually through the Gradio interface.

## Data Structure

- `data/source/`: Source images for examples
- `data/driving/`: Driving videos for animation examples
- `assets/`: Documentation and UI text (instruction.md, title.md)