# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with this HuggingFace Spaces demo repository.

## Repository Overview

This is the HuggingFace Spaces repository for HybridTransformer-MFIF, providing an interactive Gradio-based web demo for the multi-focus image fusion model. Users can upload near-focus and far-focus images to see the hybrid transformer model fuse them into a single all-in-focus image.

## Repository Structure

### Core Application Files
- `app.py`: Main Gradio application with complete model definition and inference pipeline
- `README.md`: HuggingFace Spaces configuration and demo documentation
- `requirements.txt`: Python dependencies for the Gradio application
- `pyproject.toml`: Additional project configuration
- `uv.lock`: Dependency lock file

### Assets
- `assets/`: Directory containing sample images for the demo
  - `lytro-01-A.jpg`: Near-focus example image
  - `lytro-01-B.jpg`: Far-focus example image

### Documentation
- `AGENTS.md`: Agent interaction documentation
- `LICENSE`: Project license

## Application Architecture (app.py)

### Model Components
The application includes the complete model definition:
- **FocalModulation**: Adaptive spatial attention mechanism
- **CrossAttention**: Cross-view attention between input images
- **CrossViTBlock**: Cross-attention transformer blocks
- **FocalTransformerBlock**: Focal modulation transformer blocks
- **PatchEmbed**: Image patch embedding layer
- **FocalCrossViTHybrid**: Main hybrid model architecture

### Model Configuration
- **Image Size**: 224×224 pixels
- **Patch Size**: 16×16
- **Embedding Dimension**: 768
- **CrossViT Depth**: 4 blocks
- **Focal Transformer Depth**: 6 blocks
- **Attention Heads**: 12
- **Focal Window**: 9×9
- **Focal Levels**: 3
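
For reference, a hypothetical instantiation matching these values might look like the sketch below. The exact constructor signature lives in `app.py`; the keyword names here are illustrative assumptions.

```python
# Hypothetical instantiation of the hybrid model using the configuration
# above; parameter names are illustrative and may differ from app.py.
model = FocalCrossViTHybrid(
    img_size=224,      # inputs resized to 224x224
    patch_size=16,     # 16x16 patches -> 14x14 = 196 tokens
    embed_dim=768,
    cross_depth=4,     # CrossViT blocks
    focal_depth=6,     # Focal Transformer blocks
    num_heads=12,
    focal_window=9,    # 9x9 focal window
    focal_levels=3,
)
```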

### Key Functions
- `load_model()`: Downloads the model checkpoint from the HuggingFace Hub and initializes the model, with error handling
- `get_transform()`: Builds the image preprocessing pipeline
- `denormalize()`: Converts model output back to a displayable format
- `fuse_images()`: Runs the main inference path for image fusion (see the sketches below)
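
As an illustration, `get_transform()` and `denormalize()` plausibly wrap the standard torchvision/ImageNet conventions described later in this file; treat this as a sketch rather than the exact code in `app.py`.

```python
import torch
from torchvision import transforms

# ImageNet normalization constants used during preprocessing.
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def get_transform():
    # Resize to the model's 224x224 input and normalize with ImageNet stats.
    return transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
    ])

def denormalize(tensor: torch.Tensor) -> torch.Tensor:
    # Invert the normalization so the output can be displayed as an image.
    mean = torch.tensor(IMAGENET_MEAN).view(3, 1, 1).to(tensor.device)
    std = torch.tensor(IMAGENET_STD).view(3, 1, 1).to(tensor.device)
    return (tensor * std + mean).clamp(0, 1)
```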

## Development Guidelines

### Local Development Setup
```bash
# Clone the repository
git clone https://huggingface.co/spaces/divitmittal/hybridtransformer-mfif
cd hybridtransformer-mfif

# Install dependencies
pip install -r requirements.txt
# OR with uv
uv sync

# Run the application
python app.py
# OR with uv
uv run app.py
```

### Model Loading Requirements
- Downloads the model checkpoint `best_model.pth` from the HuggingFace Hub repository `divitmittal/HybridTransformer-MFIF`
- Model weights are cached locally in the `./model_cache` directory
- Model weights must be compatible with the architecture defined in `app.py`
- Supports both plain and `DataParallel`-wrapped state dictionaries (see the sketch after this list)
- Automatic device detection (CUDA/CPU)
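
A minimal sketch of this loading flow, assuming the checkpoint stores a plain `state_dict` (possibly saved from a `DataParallel` wrapper); the actual implementation in `app.py` adds fuller error handling:

```python
import torch
from huggingface_hub import hf_hub_download

def load_model() -> torch.nn.Module:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Download the checkpoint, or reuse the locally cached copy.
    ckpt_path = hf_hub_download(
        repo_id="divitmittal/HybridTransformer-MFIF",
        filename="best_model.pth",
        cache_dir="./model_cache",
    )
    state_dict = torch.load(ckpt_path, map_location=device)
    # Strip the "module." prefix that DataParallel adds to key names.
    state_dict = {k.removeprefix("module."): v for k, v in state_dict.items()}
    model = FocalCrossViTHybrid()  # architecture defined in app.py
    model.load_state_dict(state_dict)
    return model.to(device).eval()
```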

### Image Processing Pipeline
1. **Input**: PIL images (any size)
2. **Preprocessing**: Resize to 224×224, normalize with ImageNet stats
3. **Inference**: Forward pass through hybrid transformer
4. **Postprocessing**: Denormalize and convert to PIL image
5. **Output**: Fused PIL image
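
Put together, the end-to-end `fuse_images()` path plausibly looks like the following sketch. It reuses the hypothetical helpers above and assumes the model takes the two views as separate forward arguments:

```python
import torch
from PIL import Image
from torchvision import transforms

def fuse_images(near: Image.Image, far: Image.Image) -> Image.Image:
    # `model` is the globally cached instance from load_model();
    # get_transform()/denormalize() are the helpers sketched earlier.
    transform = get_transform()
    device = next(model.parameters()).device
    # Batch dimension of 1; both views move to the model's device.
    near_t = transform(near.convert("RGB")).unsqueeze(0).to(device)
    far_t = transform(far.convert("RGB")).unsqueeze(0).to(device)
    with torch.no_grad():  # inference only, no gradients
        fused = model(near_t, far_t)
    # Denormalize and convert the single output image back to PIL.
    fused = denormalize(fused.squeeze(0).cpu())
    return transforms.ToPILImage()(fused)
```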

## Gradio Interface Components

### Input Components
- `near_img`: Image upload for near-focus input
- `far_img`: Image upload for far-focus input
- `submit_btn`: Button to trigger fusion process

### Output Components
- `fused_img`: Display for the resulting fused image

### Examples
- Predefined example pair using sample Lytro images
- Demonstrates expected input format and quality
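
A stripped-down version of the interface wiring might look like this; component labels and layout are assumptions, and `fuse_images` is the inference function sketched earlier:

```python
import gradio as gr

with gr.Blocks(title="HybridTransformer-MFIF") as demo:
    with gr.Row():
        near_img = gr.Image(type="pil", label="Near-focus image")
        far_img = gr.Image(type="pil", label="Far-focus image")
    fused_img = gr.Image(type="pil", label="Fused (all-in-focus) image")
    submit_btn = gr.Button("Fuse Images")
    submit_btn.click(fuse_images, inputs=[near_img, far_img], outputs=fused_img)
    # Predefined example pair using the sample Lytro images in assets/.
    gr.Examples(
        examples=[["assets/lytro-01-A.jpg", "assets/lytro-01-B.jpg"]],
        inputs=[near_img, far_img],
    )

demo.launch()
```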

## Error Handling

### Model Loading Errors
- Graceful handling of HuggingFace Hub download failures
- Device compatibility checking
- State dictionary format validation
- Network connectivity error handling

### Input Validation
- Checks for missing input images
- Handles various image formats via PIL
- Automatic error messages via Gradio interface

### Runtime Errors
- GPU memory management
- Inference error handling
- Graceful degradation to CPU if needed
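
One common way to implement such degradation, shown here as an assumed pattern rather than the repository's exact code:

```python
import torch

def safe_fuse(near, far):
    # Assumed fallback pattern: retry the whole pipeline on CPU if the
    # GPU runs out of memory. `model` and `fuse_images` are the globals
    # from the earlier sketches.
    try:
        return fuse_images(near, far)
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()  # release cached GPU allocations
        model.to("cpu")
        return fuse_images(near, far)
```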

## Performance Considerations

### Model Optimization
- Model is set to evaluation mode for inference
- No gradient computation during inference
- Efficient tensor operations with proper device placement

### Memory Management
- Single model instance cached globally
- Proper tensor cleanup after inference
- Device-appropriate memory allocation
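
The "single cached instance" pattern usually amounts to a lazily initialized module-level global, for example (names assumed):

```python
_model = None  # single global instance, loaded lazily on first use

def get_model():
    global _model
    if _model is None:
        _model = load_model()  # download + initialize once, then reuse
    return _model
```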

## HuggingFace Spaces Configuration (README.md)

### Spaces Metadata
- **Title**: Hybrid Transformer for Multi-Focus Image Fusion
- **SDK**: Gradio
- **App File**: app.py
- **Emoji**: 🖼️
- **Color Theme**: Blue to green gradient
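
In `README.md`, this metadata lives in the YAML front matter. A plausible block matching the settings above (exact values may differ from the repository):

```yaml
---
title: Hybrid Transformer for Multi-Focus Image Fusion
emoji: 🖼️
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
---
```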

### Demo Features
- Interactive image upload interface
- Real-time fusion processing
- Example images for testing
- Responsive web interface

## Dependencies (requirements.txt)

### Core Dependencies
- `torch`: PyTorch framework for model inference
- `torchvision`: Image transformations and utilities
- `gradio`: Web interface framework
- `numpy`: Numerical computations
- `Pillow`: Image processing library
- `huggingface_hub`: Client for downloading models from the HuggingFace Hub

### Version Management
- Minimal version specifications for maximum compatibility
- Focused on essential dependencies only
- Compatible with HuggingFace Spaces environment

## Usage Examples

### Basic Usage
1. Upload a near-focus image (foreground in focus)
2. Upload a far-focus image (background in focus)
3. Click "Fuse Images" to generate the all-in-focus result

### Expected Input
- Image pairs with complementary focus regions
- RGB color images (any resolution; resized to 224×224 internally)
- Similar scene content with different focal points

### Output Quality
- Fused images combining the in-focus detail from both inputs (at the model's 224×224 working resolution)
- Focus drawn from the sharper regions of each source image
- Smooth blending with minimal visible artifacts

## Development Tips

### Model Modifications
- Model architecture is defined directly in `app.py`
- Changes require updating the model class definitions
- Ensure compatibility with existing checkpoint format

### Interface Updates
- Gradio interface is highly customizable
- Can add new input/output components easily
- Supports additional preprocessing or postprocessing steps

### Deployment
- Optimized for HuggingFace Spaces deployment
- Automatic dependency installation
- Zero-configuration cloud deployment

This demo provides an accessible way for users to experience the multi-focus image fusion capabilities without requiring technical setup or model training.