# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

LIA-X is a Portrait Animator application built with Gradio that enables image animation, image editing, and video editing using deep learning models. It is deployed as a Hugging Face Space with GPU acceleration.

## Architecture

### Core Components

1. **Main Application** (`app.py`): Gradio web interface that loads the model and serves three main tabs
2. **Generator Network** (`networks/generator.py`): Core neural network model that handles animation and editing
   - Uses an encoder-decoder architecture
   - Implements motion encoding and style transfer
   - Pre-allocates tensors for performance optimization
3. **Gradio Tabs** (`gradio_tabs/`): UI modules for the different functionalities
   - `animation.py`: Image-to-video animation
   - `img_edit.py`: Image editing interface
   - `vid_edit.py`: Video editing interface

### Model Architecture

- **Encoder** (`networks/encoder.py`): Encodes source images and motion
- **Decoder** (`networks/decoder.py`): Reconstructs edited/animated outputs
- **Custom Ops** (`networks/op/`): CUDA kernels for optimized operations (fused_act, upfirdn2d)

## Development Commands

### Running the Application

```bash
python app.py
```

The app launches a Gradio interface on a local server. Note: this requires a CUDA-capable GPU.

### Installing Dependencies

```bash
pip install -r requirements.txt
```

Key dependencies: PyTorch 2.5.1, torchvision, Gradio 5.42.0, einops, imageio, av

### Model Loading

The model checkpoint is automatically downloaded from the Hugging Face Hub:

- Repository: `YaohuiW/LIA-X`
- File: `lia-x.pt`
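The download step can be sketched with `huggingface_hub` (a minimal sketch; the helper name `load_checkpoint` and the function-local imports are illustrative, not the actual `app.py` code):

```python
def load_checkpoint(repo_id="YaohuiW/LIA-X", filename="lia-x.pt", device="cuda"):
    """Fetch the checkpoint from the Hugging Face Hub (cached after the
    first call) and load it onto the given device."""
    # Imports are kept local so the sketch can be read without the
    # dependencies installed; both come from requirements.txt.
    from huggingface_hub import hf_hub_download
    import torch

    path = hf_hub_download(repo_id=repo_id, filename=filename)
    return torch.load(path, map_location=device)
```

`hf_hub_download` caches the file under the local Hub cache, so repeated launches skip the network round trip.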

## Important Notes

- This is a GPU-only application (it uses `torch.device("cuda")`)
- Uses the `@spaces` decorator for Hugging Face Spaces GPU allocation
- The model operates at 512x512 resolution with motion_dim=40
- Videos are processed in chunks of 16 frames
- The custom CUDA kernels in `networks/op/` require compilation with ninja
- Git LFS is configured for large files (models, videos, images)
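The 16-frame chunking mentioned above can be sketched as a plain generator (a hypothetical helper for illustration, not the repository's actual implementation):

```python
def chunk_frames(frames, chunk_size=16):
    """Yield successive fixed-size chunks of a frame sequence.

    The last chunk may be shorter when the frame count is not a
    multiple of chunk_size.
    """
    for start in range(0, len(frames), chunk_size):
        yield frames[start:start + chunk_size]
```

For a 40-frame clip this yields chunks of 16, 16, and 8 frames, bounding peak GPU memory to one chunk at a time.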

## File Processing

- Images: loaded as RGB, resized to 512x512, and normalized to [-1, 1]
- Videos: processed with torchvision, preserving the original FPS
- Cropping tools are supported for better results (referenced in instruction.md)
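The image pipeline above can be sketched with PIL and NumPy (a minimal sketch of the described preprocessing; the helper name `preprocess_image` is illustrative, and the final conversion to a GPU torch tensor is omitted):

```python
import numpy as np
from PIL import Image

def preprocess_image(path, size=512):
    """Load an image as RGB, resize to size x size, normalize to [-1, 1]."""
    img = Image.open(path).convert("RGB").resize((size, size))
    arr = np.asarray(img, dtype=np.float32) / 255.0  # scale to [0, 1]
    return arr * 2.0 - 1.0                           # shift to [-1, 1]
```

In the app the resulting array would then become a `(3, 512, 512)` torch tensor on the GPU before being fed to the generator.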

## Testing

No explicit test suite was found; test manually through the Gradio interface.

## Data Structure

- `data/source/`: Source images for examples
- `data/driving/`: Driving videos for animation examples
- `assets/`: Documentation and UI text (instruction.md, title.md)