CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with this HuggingFace Spaces demo repository.

Repository Overview

This is the HuggingFace Spaces repository for HybridTransformer-MFIF, providing an interactive Gradio-based web demo for the multi-focus image fusion model. Users can upload near-focus and far-focus images to see the hybrid transformer model fuse them into a single all-in-focus image.

Repository Structure

Core Application Files

  • app.py: Main Gradio application with complete model definition and inference pipeline
  • README.md: HuggingFace Spaces configuration and demo documentation
  • requirements.txt: Python dependencies for the Gradio application
  • pyproject.toml: Additional project configuration
  • uv.lock: Dependency lock file

Assets

  • assets/: Directory containing sample images for the demo
    • lytro-01-A.jpg: Near-focus example image
    • lytro-01-B.jpg: Far-focus example image

Documentation

  • AGENTS.md: Agent interaction documentation
  • LICENSE: Project license

Application Architecture (app.py)

Model Components

The application includes the complete model definition:

  • FocalModulation: Adaptive spatial attention mechanism
  • CrossAttention: Cross-view attention between input images
  • CrossViTBlock: Cross-attention transformer blocks
  • FocalTransformerBlock: Focal modulation transformer blocks
  • PatchEmbed: Image patch embedding layer
  • FocalCrossViTHybrid: Main hybrid model architecture

Model Configuration

  • Image Size: 224×224 pixels
  • Patch Size: 16×16
  • Embedding Dimension: 768
  • CrossViT Depth: 4 blocks
  • Focal Transformer Depth: 6 blocks
  • Attention Heads: 12
  • Focal Window: 9×9
  • Focal Levels: 3
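
The hyperparameters above can be collected into a plain configuration dict. The key names here are illustrative; the actual keyword arguments accepted by FocalCrossViTHybrid in app.py may differ.

```python
# Illustrative key names; check app.py's FocalCrossViTHybrid signature
# for the real constructor arguments.
MODEL_CONFIG = {
    "img_size": 224,      # input resolution (224×224)
    "patch_size": 16,     # 16×16 patches
    "embed_dim": 768,     # transformer embedding dimension
    "cross_depth": 4,     # CrossViT blocks
    "focal_depth": 6,     # Focal Transformer blocks
    "num_heads": 12,      # attention heads
    "focal_window": 9,    # 9×9 focal window
    "focal_levels": 3,    # focal modulation levels
}

# Derived: number of patch tokens per input image (14×14 = 196)
num_patches = (MODEL_CONFIG["img_size"] // MODEL_CONFIG["patch_size"]) ** 2
```

With a 768-dimensional embedding and 12 heads, each head works in a 64-dimensional subspace, which is the standard ViT-Base layout.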

Key Functions

  • load_model(): Downloads the model checkpoint from the HuggingFace Hub and initializes the model, with error handling
  • get_transform(): Builds the image preprocessing pipeline
  • denormalize(): Converts model output back to a displayable format
  • fuse_images(): Main inference function for image fusion

Development Guidelines

Local Development Setup

# Clone the repository
git clone https://huggingface.co/spaces/divitmittal/hybridtransformer-mfif
cd hybridtransformer-mfif

# Install dependencies
pip install -r requirements.txt
# OR with uv
uv sync

# Run the application
python app.py
# OR with uv
uv run app.py

Model Loading Requirements

  • Downloads the model checkpoint best_model.pth from the HuggingFace Hub repository divitmittal/HybridTransformer-MFIF
  • Model weights are cached locally in the ./model_cache directory
  • Checkpoint weights must match the architecture defined in app.py
  • Supports checkpoints saved from both plain and DataParallel-wrapped models
  • Automatic device detection (CUDA/CPU)
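
A minimal sketch of this loading flow, using the repository and filename stated above. The helper name strip_dataparallel_prefix is hypothetical; checkpoints saved from nn.DataParallel prefix every key with "module.", which is what the helper normalizes away.

```python
def strip_dataparallel_prefix(state_dict):
    """Normalize a checkpoint saved from nn.DataParallel (keys prefixed
    with 'module.') into the plain single-device key format."""
    return {k.removeprefix("module."): v for k, v in state_dict.items()}

def load_model():
    # Lazy imports keep the pure helper above testable without torch.
    import torch
    from huggingface_hub import hf_hub_download

    ckpt_path = hf_hub_download(
        repo_id="divitmittal/HybridTransformer-MFIF",
        filename="best_model.pth",
        cache_dir="./model_cache",  # local cache, per the notes above
    )
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    state = strip_dataparallel_prefix(torch.load(ckpt_path, map_location=device))
    model = FocalCrossViTHybrid()   # architecture defined in app.py
    model.load_state_dict(state)
    return model.to(device).eval()
```

The real load_model() in app.py additionally wraps these steps in error handling for download and state-dict mismatches.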

Image Processing Pipeline

  1. Input: PIL images (any size)
  2. Preprocessing: Resize to 224×224, normalize with ImageNet stats
  3. Inference: Forward pass through hybrid transformer
  4. Postprocessing: Denormalize and convert to PIL image
  5. Output: Fused PIL image
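
Steps 2 and 4 can be sketched as follows. The ImageNet mean/std values are the standard ones; that app.py uses exactly these constants is an assumption, and the forward transform is shown in comments since it mirrors a conventional torchvision pipeline.

```python
import numpy as np

# Standard ImageNet normalization statistics (assumed, per step 2 above)
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

# get_transform() is, per the steps above, roughly:
#   transforms.Compose([
#       transforms.Resize((224, 224)),
#       transforms.ToTensor(),
#       transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
#   ])

def denormalize(img_chw):
    """Undo ImageNet normalization on a (3, H, W) float array and clip
    to the displayable [0, 1] range (step 4 above)."""
    img = img_chw * IMAGENET_STD[:, None, None] + IMAGENET_MEAN[:, None, None]
    return np.clip(img, 0.0, 1.0)
```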

Gradio Interface Components

Input Components

  • near_img: Image upload for near-focus input
  • far_img: Image upload for far-focus input
  • submit_btn: Button to trigger fusion process

Output Components

  • fused_img: Display for the resulting fused image

Examples

  • Predefined example pair using sample Lytro images
  • Demonstrates expected input format and quality
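
A minimal sketch of how these components might be wired together with Gradio Blocks, assuming the component names above (near_img, far_img, fused_img, submit_btn), the sample asset paths, and a fuse_images callable; the actual layout in app.py may differ.

```python
# Example pair from the assets/ directory described above
EXAMPLES = [["assets/lytro-01-A.jpg", "assets/lytro-01-B.jpg"]]

def build_demo(fuse_images):
    import gradio as gr  # lazy import keeps the module importable without gradio

    with gr.Blocks() as demo:
        with gr.Row():
            near_img = gr.Image(type="pil", label="Near-focus image")
            far_img = gr.Image(type="pil", label="Far-focus image")
        fused_img = gr.Image(type="pil", label="Fused (all-in-focus) result")
        submit_btn = gr.Button("Fuse Images")
        submit_btn.click(fuse_images, inputs=[near_img, far_img], outputs=fused_img)
        gr.Examples(examples=EXAMPLES, inputs=[near_img, far_img])
    return demo
```

Calling `build_demo(fuse_images).launch()` would serve the interface locally.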

Error Handling

Model Loading Errors

  • Graceful handling of HuggingFace Hub download failures
  • Device compatibility checking
  • State dictionary format validation
  • Network connectivity error handling

Input Validation

  • Checks for missing input images
  • Handles various image formats via PIL
  • Automatic error messages via Gradio interface
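
The missing-input check can be sketched as a small pure helper; in the Gradio app the returned message would typically surface via `raise gr.Error(...)`. The name validate_inputs is illustrative, not taken from app.py.

```python
def validate_inputs(near, far):
    """Return an error message if either input image is missing, else None."""
    if near is None and far is None:
        return "Please upload both a near-focus and a far-focus image."
    if near is None:
        return "Please upload a near-focus image."
    if far is None:
        return "Please upload a far-focus image."
    return None
```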

Runtime Errors

  • GPU memory management
  • Inference error handling
  • Graceful degradation to CPU if needed

Performance Considerations

Model Optimization

  • Model is set to evaluation mode for inference
  • No gradient computation during inference
  • Efficient tensor operations with proper device placement

Memory Management

  • Single model instance cached globally
  • Proper tensor cleanup after inference
  • Device-appropriate memory allocation
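
The eval-mode, no-gradient, and cleanup points above can be sketched as below; run_inference is a hypothetical name, and app.py's actual inference code may organize this differently.

```python
def run_inference(model, near_t, far_t, device):
    import torch  # lazy import; the function is a sketch of the flow above

    model.eval()                  # evaluation mode: no dropout/BatchNorm updates
    with torch.no_grad():         # no autograd graph is built during inference
        out = model(near_t.to(device), far_t.to(device))
    out = out.cpu()               # move result off the GPU promptly
    if device.type == "cuda":
        torch.cuda.empty_cache()  # return cached blocks to the CUDA allocator
    return out
```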

HuggingFace Spaces Configuration (README.md)

Spaces Metadata

  • Title: Hybrid Transformer for Multi-Focus Image Fusion
  • SDK: Gradio
  • App File: app.py
  • Emoji: 🖼️
  • Color Theme: Blue to green gradient

Demo Features

  • Interactive image upload interface
  • Real-time fusion processing
  • Example images for testing
  • Responsive web interface

Dependencies (requirements.txt)

Core Dependencies

  • torch: PyTorch framework for model inference
  • torchvision: Image transformations and utilities
  • gradio: Web interface framework
  • numpy: Numerical computations
  • Pillow: Image processing library
  • huggingface_hub: Download models from HuggingFace Hub

Version Management

  • Minimal version specifications for maximum compatibility
  • Focused on essential dependencies only
  • Compatible with HuggingFace Spaces environment

Usage Examples

Basic Usage

  1. Upload a near-focus image (foreground in focus)
  2. Upload a far-focus image (background in focus)
  3. Click "Fuse Images" to generate the all-in-focus result

Expected Input

  • Image pairs with complementary focus regions
  • RGB color images (any resolution, will be resized)
  • Similar scene content with different focal points

Output Quality

  • Fused images at the model's 224×224 working resolution, preserving in-focus detail from both inputs
  • Focus transferred from whichever input is sharper in each region
  • Smooth blending between focus regions with minimal visible artifacts

Development Tips

Model Modifications

  • Model architecture is defined directly in app.py
  • Changes require updating the model class definitions
  • Ensure compatibility with existing checkpoint format

Interface Updates

  • Gradio interface is highly customizable
  • Can add new input/output components easily
  • Supports additional preprocessing or postprocessing steps

Deployment

  • Optimized for HuggingFace Spaces deployment
  • Automatic dependency installation
  • Zero-configuration cloud deployment

This demo provides an accessible way for users to experience the multi-focus image fusion capabilities without requiring technical setup or model training.