# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with this HuggingFace Spaces demo repository.

## Repository Overview

This is the HuggingFace Spaces repository for HybridTransformer-MFIF, providing an interactive Gradio-based web demo for the multi-focus image fusion model. Users can upload near-focus and far-focus images to see the hybrid transformer model fuse them into a single all-in-focus image.

## Repository Structure

### Core Application Files

- `app.py`: Main Gradio application with complete model definition and inference pipeline
- `README.md`: HuggingFace Spaces configuration and demo documentation
- `requirements.txt`: Python dependencies for the Gradio application
- `pyproject.toml`: Additional project configuration
- `uv.lock`: Dependency lock file

### Assets

- `assets/`: Directory containing sample images for the demo
  - `lytro-01-A.jpg`: Near-focus example image
  - `lytro-01-B.jpg`: Far-focus example image

### Documentation

- `AGENTS.md`: Agent interaction documentation
- `LICENSE`: Project license

## Application Architecture (app.py)

### Model Components

The application includes the complete model definition:

- **FocalModulation**: Adaptive spatial attention mechanism
- **CrossAttention**: Cross-view attention between input images
- **CrossViTBlock**: Cross-attention transformer blocks
- **FocalTransformerBlock**: Focal modulation transformer blocks
- **PatchEmbed**: Image patch embedding layer
- **FocalCrossViTHybrid**: Main hybrid model architecture

### Model Configuration

- **Image Size**: 224×224 pixels
- **Patch Size**: 16×16
- **Embedding Dimension**: 768
- **CrossViT Depth**: 4 blocks
- **Focal Transformer Depth**: 6 blocks
- **Attention Heads**: 12
- **Focal Window**: 9×9
- **Focal Levels**: 3

### Key Functions

- `load_model()`: Downloads model from HuggingFace Hub and initializes with error handling
- `get_transform()`: Image preprocessing pipeline
- `denormalize()`: Convert model output back to displayable format
- `fuse_images()`: Main inference function for image fusion

## Development Guidelines

### Local Development Setup

```bash
# Clone the repository
git clone https://huggingface.co/spaces/divitmittal/hybridtransformer-mfif
cd hybridtransformer-mfif

# Install dependencies
pip install -r requirements.txt
# OR with uv
uv sync

# Run the application
python app.py
# OR with uv
uv run app.py
```

### Model Loading Requirements

- Downloads model checkpoint `best_model.pth` from HuggingFace Hub repository `divitmittal/HybridTransformer-MFIF`
- Model weights are cached locally in `./model_cache` directory
- Model weights should be compatible with the defined architecture
- Supports both regular and DataParallel model states
- Automatic device detection (CUDA/CPU)

### Image Processing Pipeline

1. **Input**: PIL images (any size)
2. **Preprocessing**: Resize to 224×224, normalize with ImageNet stats
3. **Inference**: Forward pass through hybrid transformer
4. **Postprocessing**: Denormalize and convert to PIL image
5. **Output**: Fused PIL image

## Gradio Interface Components

### Input Components

- `near_img`: Image upload for near-focus input
- `far_img`: Image upload for far-focus input
- `submit_btn`: Button to trigger fusion process

### Output Components

- `fused_img`: Display for the resulting fused image

### Examples

- Predefined example pair using sample Lytro images
- Demonstrates expected input format and quality

## Error Handling

### Model Loading Errors

- Graceful handling of HuggingFace Hub download failures
- Device compatibility checking
- State dictionary format validation
- Network connectivity error handling

### Input Validation

- Checks for missing input images
- Handles various image formats via PIL
- Automatic error messages via Gradio interface

### Runtime Errors

- GPU memory management
- Inference error handling
- Graceful degradation to CPU if needed

## Performance Considerations

### Model Optimization

- Model is set to evaluation mode for inference
- No gradient computation during inference
- Efficient tensor operations with proper device placement

### Memory Management

- Single model instance cached globally
- Proper tensor cleanup after inference
- Device-appropriate memory allocation

## HuggingFace Spaces Configuration (README.md)

### Spaces Metadata

- **Title**: Hybrid Transformer for Multi-Focus Image Fusion
- **SDK**: Gradio
- **App File**: app.py
- **Emoji**: 🖼️
- **Color Theme**: Blue to green gradient

### Demo Features

- Interactive image upload interface
- Real-time fusion processing
- Example images for testing
- Responsive web interface

## Dependencies (requirements.txt)

### Core Dependencies

- `torch`: PyTorch framework for model inference
- `torchvision`: Image transformations and utilities
- `gradio`: Web interface framework
- `numpy`: Numerical computations
- `Pillow`: Image processing library
- `huggingface_hub`: Download models from HuggingFace Hub

### Version Management

- Minimal version specifications for maximum compatibility
- Focused on essential dependencies only
- Compatible with HuggingFace Spaces environment

## Usage Examples

### Basic Usage

1. Upload a near-focus image (foreground in focus)
2. Upload a far-focus image (background in focus)
3. Click "Fuse Images" to generate the all-in-focus result

### Expected Input

- Image pairs with complementary focus regions
- RGB color images (any resolution, will be resized)
- Similar scene content with different focal points

### Output Quality

- High-resolution fused images maintaining detail from both inputs
- Optimal focus transfer from source images
- Seamless blending without artifacts

## Development Tips

### Model Modifications

- Model architecture is defined directly in `app.py`
- Changes require updating the model class definitions
- Ensure compatibility with existing checkpoint format

### Interface Updates

- Gradio interface is highly customizable
- Can add new input/output components easily
- Supports additional preprocessing or postprocessing steps

### Deployment

- Optimized for HuggingFace Spaces deployment
- Automatic dependency installation
- Zero-configuration cloud deployment

This demo provides an accessible way for users to experience the multi-focus image fusion capabilities without requiring technical setup or model training.
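
The Model Loading Requirements section notes that both regular and DataParallel model states are supported. A minimal sketch of how that can be handled follows. It is not taken from `app.py`: the helper name `strip_data_parallel_prefix` and the example parameter names are hypothetical, and it assumes the checkpoint is a flat mapping of parameter names to tensors. `torch.nn.DataParallel` stores the wrapped model under a `module` attribute, so its saved keys carry a `module.` prefix that the unwrapped model does not expect.

```python
from collections import OrderedDict
from typing import Any, Mapping


def strip_data_parallel_prefix(state_dict: Mapping[str, Any]) -> "OrderedDict[str, Any]":
    """Return a copy of state_dict with a leading 'module.' removed from each key.

    Hypothetical helper, not the function used in app.py. Keys without the
    prefix are copied unchanged, so a regular (non-DataParallel) state dict
    passes through as-is.
    """
    prefix = "module."
    cleaned: "OrderedDict[str, Any]" = OrderedDict()
    for key, value in state_dict.items():
        # Drop only a leading prefix; 'module.' elsewhere in the key is kept.
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Illustrative keys only; integers stand in for the real tensors.
saved = {"module.patch_embed.proj.weight": 1, "module.head.bias": 2}
print(list(strip_data_parallel_prefix(saved)))
```

In the application, a helper like this would presumably be applied to the dictionary returned by `torch.load(path, map_location=device)` before calling `model.load_state_dict(...)`, so the same checkpoint loads whether or not it was saved from a DataParallel wrapper.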