# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with this HuggingFace Spaces demo repository.
## Repository Overview
This is the HuggingFace Spaces repository for HybridTransformer-MFIF, providing an interactive Gradio-based web demo for the multi-focus image fusion model. Users can upload near-focus and far-focus images to see the hybrid transformer model fuse them into a single all-in-focus image.
## Repository Structure
### Core Application Files
- `app.py`: Main Gradio application with complete model definition and inference pipeline
- `README.md`: HuggingFace Spaces configuration and demo documentation
- `requirements.txt`: Python dependencies for the Gradio application
- `pyproject.toml`: Additional project configuration
- `uv.lock`: Dependency lock file
### Assets
- `assets/`: Directory containing sample images for the demo
  - `lytro-01-A.jpg`: Near-focus example image
  - `lytro-01-B.jpg`: Far-focus example image
### Documentation
- `AGENTS.md`: Agent interaction documentation
- `LICENSE`: Project license
## Application Architecture (app.py)
### Model Components
The application includes the complete model definition:
- **FocalModulation**: Adaptive spatial attention mechanism
- **CrossAttention**: Cross-view attention between input images
- **CrossViTBlock**: Cross-attention transformer blocks
- **FocalTransformerBlock**: Focal modulation transformer blocks
- **PatchEmbed**: Image patch embedding layer
- **FocalCrossViTHybrid**: Main hybrid model architecture
### Model Configuration
- **Image Size**: 224×224 pixels
- **Patch Size**: 16×16
- **Embedding Dimension**: 768
- **CrossViT Depth**: 4 blocks
- **Focal Transformer Depth**: 6 blocks
- **Attention Heads**: 12
- **Focal Window**: 9×9
- **Focal Levels**: 3
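The configuration above can be summarized as a plain Python dict. This is an illustrative sketch only (the key names here are hypothetical; the authoritative values live in the model definitions inside `app.py`):

```python
# Hypothetical summary of the configuration listed above; the
# authoritative values are the ones in app.py's model classes.
MODEL_CONFIG = {
    "img_size": 224,        # inputs are resized to 224x224
    "patch_size": 16,       # 16x16 patches -> (224/16)^2 = 196 tokens
    "embed_dim": 768,       # transformer embedding dimension
    "crossvit_depth": 4,    # number of CrossViT blocks
    "focal_depth": 6,       # number of Focal Transformer blocks
    "num_heads": 12,        # attention heads per block
    "focal_window": 9,      # 9x9 focal modulation window
    "focal_levels": 3,      # hierarchical focal levels
}

# Number of patch tokens per image implied by this configuration.
num_patches = (MODEL_CONFIG["img_size"] // MODEL_CONFIG["patch_size"]) ** 2
```

Note that the embedding dimension (768) divides evenly by the head count (12), giving 64 dimensions per attention head.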
### Key Functions
- `load_model()`: Downloads the model checkpoint from the HuggingFace Hub and initializes it with error handling
- `get_transform()`: Builds the image preprocessing pipeline
- `denormalize()`: Converts model output back to a displayable image
- `fuse_images()`: Runs the main inference for image fusion
## Development Guidelines
### Local Development Setup
```bash
# Clone the repository
git clone https://huggingface.co/spaces/divitmittal/hybridtransformer-mfif
cd hybridtransformer-mfif
# Install dependencies
pip install -r requirements.txt
# OR with uv
uv sync
# Run the application
python app.py
# OR with uv
uv run app.py
```
### Model Loading Requirements
- Downloads model checkpoint `best_model.pth` from HuggingFace Hub repository `divitmittal/HybridTransformer-MFIF`
- Model weights are cached locally in `./model_cache` directory
- Model weights should be compatible with the defined architecture
- Supports both regular and DataParallel model states
- Automatic device detection (CUDA/CPU)
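Supporting both plain and `DataParallel` checkpoints usually comes down to stripping the `module.` prefix that `torch.nn.DataParallel` adds to every state-dict key. A minimal sketch of that normalization step (the function name is hypothetical; the real logic in `app.py` may differ):

```python
def normalize_state_dict(state_dict):
    """Strip the 'module.' prefix that torch.nn.DataParallel adds,
    so checkpoints saved from either a plain or a DataParallel-wrapped
    model load into the same architecture. (Sketch only; the actual
    handling in app.py may differ.)"""
    if state_dict and all(k.startswith("module.") for k in state_dict):
        return {k[len("module."):]: v for k, v in state_dict.items()}
    return state_dict
```

The normalized dict can then be passed to `model.load_state_dict(...)` after loading the checkpoint downloaded via `huggingface_hub`.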
### Image Processing Pipeline
1. **Input**: PIL images (any size)
2. **Preprocessing**: Resize to 224×224, normalize with ImageNet stats
3. **Inference**: Forward pass through hybrid transformer
4. **Postprocessing**: Denormalize and convert to PIL image
5. **Output**: Fused PIL image
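Step 4 inverts the ImageNet normalization applied in step 2. A sketch of that denormalization using NumPy (assumed available per `requirements.txt`); the app's own `denormalize()` may be implemented differently:

```python
import numpy as np

# ImageNet normalization statistics used by the preprocessing step.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def denormalize(chw):
    """Invert ImageNet normalization on a (3, H, W) float array and
    clip into [0, 1] so it can be converted to a PIL image.
    (Sketch of step 4; app.py's denormalize() may differ.)"""
    img = chw * IMAGENET_STD[:, None, None] + IMAGENET_MEAN[:, None, None]
    return np.clip(img, 0.0, 1.0)
```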
## Gradio Interface Components
### Input Components
- `near_img`: Image upload for near-focus input
- `far_img`: Image upload for far-focus input
- `submit_btn`: Button to trigger fusion process
### Output Components
- `fused_img`: Display for the resulting fused image
### Examples
- Predefined example pair using sample Lytro images
- Demonstrates expected input format and quality
## Error Handling
### Model Loading Errors
- Graceful handling of HuggingFace Hub download failures
- Device compatibility checking
- State dictionary format validation
- Network connectivity error handling
### Input Validation
- Checks for missing input images
- Handles various image formats via PIL
- Automatic error messages via Gradio interface
### Runtime Errors
- GPU memory management
- Inference error handling
- Graceful degradation to CPU if needed
## Performance Considerations
### Model Optimization
- Model is set to evaluation mode for inference
- No gradient computation during inference
- Efficient tensor operations with proper device placement
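These three points correspond to the standard PyTorch inference pattern: `eval()` mode, a `torch.no_grad()` context, and explicit device placement. A sketch with a tiny stand-in model (the real app loads `FocalCrossViTHybrid`):

```python
import torch
import torch.nn as nn

# Tiny stand-in model; the real app loads FocalCrossViTHybrid.
model = nn.Linear(4, 2)
model.eval()  # disable dropout / batch-norm updates

# Automatic device detection, as described under Model Loading Requirements.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

x = torch.randn(1, 4, device=device)
with torch.no_grad():  # skip autograd bookkeeping during inference
    y = model(x)
```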
### Memory Management
- Single model instance cached globally
- Proper tensor cleanup after inference
- Device-appropriate memory allocation
## HuggingFace Spaces Configuration (README.md)
### Spaces Metadata
- **Title**: Hybrid Transformer for Multi-Focus Image Fusion
- **SDK**: Gradio
- **App File**: app.py
- **Emoji**: 🖼️
- **Color Theme**: Blue to green gradient
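The metadata above corresponds to a YAML front-matter block at the top of `README.md`. A sketch of what that block looks like (exact field values should be taken from the repository's actual `README.md`):

```yaml
---
title: Hybrid Transformer for Multi-Focus Image Fusion
emoji: 🖼️
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
---
```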
### Demo Features
- Interactive image upload interface
- Real-time fusion processing
- Example images for testing
- Responsive web interface
## Dependencies (requirements.txt)
### Core Dependencies
- `torch`: PyTorch framework for model inference
- `torchvision`: Image transformations and utilities
- `gradio`: Web interface framework
- `numpy`: Numerical computations
- `Pillow`: Image processing library
- `huggingface_hub`: Download models from HuggingFace Hub
### Version Management
- Minimal version specifications for maximum compatibility
- Focused on essential dependencies only
- Compatible with HuggingFace Spaces environment
## Usage Examples
### Basic Usage
1. Upload a near-focus image (foreground in focus)
2. Upload a far-focus image (background in focus)
3. Click "Fuse Images" to generate the all-in-focus result
### Expected Input
- Image pairs with complementary focus regions
- RGB color images (any resolution, will be resized)
- Similar scene content with different focal points
### Output Quality
- Fused images at the model's 224×224 working resolution, combining in-focus detail from both inputs
- Focus transferred from the sharper regions of each source image
- Smooth blending between focus regions
## Development Tips
### Model Modifications
- Model architecture is defined directly in `app.py`
- Changes require updating the model class definitions
- Ensure compatibility with existing checkpoint format
### Interface Updates
- Gradio interface is highly customizable
- Can add new input/output components easily
- Supports additional preprocessing or postprocessing steps
### Deployment
- Optimized for HuggingFace Spaces deployment
- Automatic dependency installation
- Zero-configuration cloud deployment
This demo provides an accessible way for users to experience the multi-focus image fusion capabilities without requiring technical setup or model training.