# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with this HuggingFace Spaces demo repository.
## Repository Overview
This is the HuggingFace Spaces repository for HybridTransformer-MFIF, providing an interactive Gradio-based web demo for the multi-focus image fusion model. Users can upload near-focus and far-focus images to see the hybrid transformer model fuse them into a single all-in-focus image.
## Repository Structure

### Core Application Files

- `app.py`: Main Gradio application with complete model definition and inference pipeline
- `README.md`: HuggingFace Spaces configuration and demo documentation
- `requirements.txt`: Python dependencies for the Gradio application
- `pyproject.toml`: Additional project configuration
- `uv.lock`: Dependency lock file
### Assets

- `assets/`: Directory containing sample images for the demo
  - `lytro-01-A.jpg`: Near-focus example image
  - `lytro-01-B.jpg`: Far-focus example image
### Documentation

- `AGENTS.md`: Agent interaction documentation
- `LICENSE`: Project license
## Application Architecture (`app.py`)

### Model Components
The application includes the complete model definition:
- FocalModulation: Adaptive spatial attention mechanism
- CrossAttention: Cross-view attention between input images
- CrossViTBlock: Cross-attention transformer blocks
- FocalTransformerBlock: Focal modulation transformer blocks
- PatchEmbed: Image patch embedding layer
- FocalCrossViTHybrid: Main hybrid model architecture
### Model Configuration
- Image Size: 224×224 pixels
- Patch Size: 16×16
- Embedding Dimension: 768
- CrossViT Depth: 4 blocks
- Focal Transformer Depth: 6 blocks
- Attention Heads: 12
- Focal Window: 9×9
- Focal Levels: 3
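These settings imply a few derived sizes that matter when editing the architecture. A quick sanity check in plain Python (constants copied from the list above):

```python
# Configuration values as documented above.
IMG_SIZE = 224
PATCH_SIZE = 16
EMBED_DIM = 768
NUM_HEADS = 12

# 224 / 16 = 14 patches per side, so each image becomes a 14x14 token grid.
num_patches = (IMG_SIZE // PATCH_SIZE) ** 2
# Each attention head operates on EMBED_DIM / NUM_HEADS channels.
head_dim = EMBED_DIM // NUM_HEADS

print(num_patches, head_dim)  # 196 64
```

If you change the patch size or embedding dimension, these two quantities must still divide evenly, or the patch embedding and attention layers will fail to construct.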
### Key Functions

- `load_model()`: Downloads the model from the HuggingFace Hub and initializes it with error handling
- `get_transform()`: Image preprocessing pipeline
- `denormalize()`: Converts model output back to a displayable format
- `fuse_images()`: Main inference function for image fusion
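The `denormalize()` step reverses the ImageNet normalization applied during preprocessing. A minimal NumPy sketch of the idea (the in-app version presumably operates on torch tensors, but the arithmetic is the same):

```python
import numpy as np

# ImageNet normalization statistics used during preprocessing.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def denormalize(chw: np.ndarray) -> np.ndarray:
    """Undo ImageNet normalization on a (3, H, W) array and clip to [0, 1]."""
    img = chw * IMAGENET_STD[:, None, None] + IMAGENET_MEAN[:, None, None]
    return np.clip(img, 0.0, 1.0)
```

As a quick check, an all-zero normalized input maps back to the per-channel means.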
## Development Guidelines

### Local Development Setup

```bash
# Clone the repository
git clone https://huggingface.co/spaces/divitmittal/hybridtransformer-mfif
cd hybridtransformer-mfif

# Install dependencies
pip install -r requirements.txt
# OR with uv
uv sync

# Run the application
python app.py
# OR with uv
uv run app.py
```
### Model Loading Requirements

- Downloads the model checkpoint `best_model.pth` from the HuggingFace Hub repository `divitmittal/HybridTransformer-MFIF`
- Model weights are cached locally in the `./model_cache` directory
- Model weights should be compatible with the defined architecture
- Supports both regular and DataParallel model states
- Automatic device detection (CUDA/CPU)
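Checkpoints saved from an `nn.DataParallel` wrapper prefix every key with `module.`, so supporting both formats usually comes down to a key-normalization step like the following (a sketch of the common pattern, not necessarily the exact code in `app.py`):

```python
def strip_dataparallel_prefix(state_dict: dict) -> dict:
    """Return a state dict with any DataParallel 'module.' key prefix removed.

    If the keys are not prefixed, the dict is returned unchanged.
    """
    prefix = "module."
    if state_dict and all(k.startswith(prefix) for k in state_dict):
        return {k[len(prefix):]: v for k, v in state_dict.items()}
    return state_dict
```

The normalized dict can then be passed to `model.load_state_dict()` regardless of how the checkpoint was saved.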
### Image Processing Pipeline
- Input: PIL images (any size)
- Preprocessing: Resize to 224×224, normalize with ImageNet stats
- Inference: Forward pass through hybrid transformer
- Postprocessing: Denormalize and convert to PIL image
- Output: Fused PIL image
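The preprocessing steps above are presumably built with torchvision transforms in `app.py`; an equivalent PIL/NumPy sketch makes the pipeline concrete:

```python
import numpy as np
from PIL import Image

# ImageNet statistics, matching the preprocessing described above.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(img: Image.Image) -> np.ndarray:
    """Resize to 224x224, scale to [0, 1], apply ImageNet normalization, return CHW."""
    img = img.convert("RGB").resize((224, 224))
    arr = np.asarray(img, dtype=np.float32) / 255.0  # HWC in [0, 1]
    arr = (arr - MEAN) / STD                         # per-channel normalization
    return arr.transpose(2, 0, 1)                    # HWC -> CHW for the model
```

Any input size is accepted; the resize step is what fixes the model's working resolution at 224×224.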
## Gradio Interface Components

### Input Components

- `near_img`: Image upload for the near-focus input
- `far_img`: Image upload for the far-focus input
- `submit_btn`: Button to trigger the fusion process

### Output Components

- `fused_img`: Display for the resulting fused image

### Examples

- Predefined example pair using sample Lytro images
- Demonstrates expected input format and quality
## Error Handling

### Model Loading Errors
- Graceful handling of HuggingFace Hub download failures
- Device compatibility checking
- State dictionary format validation
- Network connectivity error handling
### Input Validation
- Checks for missing input images
- Handles various image formats via PIL
- Automatic error messages via Gradio interface
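The missing-input check reduces to something like the following (function name illustrative; in the app the message is surfaced through the Gradio interface):

```python
def validate_inputs(near_img, far_img):
    """Return an error message if either focus image is missing, else None."""
    if near_img is None or far_img is None:
        return "Please upload both a near-focus and a far-focus image."
    return None
```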
### Runtime Errors
- GPU memory management
- Inference error handling
- Graceful degradation to CPU if needed
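The graceful CPU degradation can be sketched as a retry wrapper (illustrative names; `app.py` may structure this differently):

```python
def fuse_with_fallback(run_inference, inputs, preferred_device="cuda"):
    """Try inference on the preferred device; fall back to CPU on a runtime
    failure such as CUDA running out of memory."""
    try:
        return run_inference(inputs, device=preferred_device)
    except RuntimeError:
        # e.g. "CUDA out of memory" -- retry once on CPU.
        return run_inference(inputs, device="cpu")
```

A deliberate design choice here is catching only `RuntimeError` (the type PyTorch raises for out-of-memory) rather than all exceptions, so genuine bugs still surface.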
## Performance Considerations

### Model Optimization
- Model is set to evaluation mode for inference
- No gradient computation during inference
- Efficient tensor operations with proper device placement
### Memory Management
- Single model instance cached globally
- Proper tensor cleanup after inference
- Device-appropriate memory allocation
## HuggingFace Spaces Configuration (`README.md`)

### Spaces Metadata
- Title: Hybrid Transformer for Multi-Focus Image Fusion
- SDK: Gradio
- App File: app.py
- Emoji: 🖼️
- Color Theme: Blue to green gradient
### Demo Features
- Interactive image upload interface
- Real-time fusion processing
- Example images for testing
- Responsive web interface
## Dependencies (`requirements.txt`)

### Core Dependencies

- `torch`: PyTorch framework for model inference
- `torchvision`: Image transformations and utilities
- `gradio`: Web interface framework
- `numpy`: Numerical computations
- `Pillow`: Image processing library
- `huggingface_hub`: Downloads models from the HuggingFace Hub
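Given the minimal-version policy described below, the corresponding `requirements.txt` might look like this unpinned list (exact pins, if any, live in the repository itself):

```text
torch
torchvision
gradio
numpy
Pillow
huggingface_hub
```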
### Version Management
- Minimal version specifications for maximum compatibility
- Focused on essential dependencies only
- Compatible with HuggingFace Spaces environment
## Usage Examples

### Basic Usage

1. Upload a near-focus image (foreground in focus)
2. Upload a far-focus image (background in focus)
3. Click "Fuse Images" to generate the all-in-focus result
### Expected Input
- Image pairs with complementary focus regions
- RGB color images (any resolution, will be resized)
- Similar scene content with different focal points
### Output Quality

- Fused images combining detail from both inputs (at the model's 224×224 working resolution)
- Focus transferred from the sharper regions of each source image
- Seamless blending without visible artifacts
## Development Tips

### Model Modifications

- The model architecture is defined directly in `app.py`
- Changes require updating the model class definitions
- Ensure compatibility with the existing checkpoint format
### Interface Updates
- Gradio interface is highly customizable
- Can add new input/output components easily
- Supports additional preprocessing or postprocessing steps
### Deployment
- Optimized for HuggingFace Spaces deployment
- Automatic dependency installation
- Zero-configuration cloud deployment
This demo provides an accessible way for users to experience the multi-focus image fusion capabilities without requiring technical setup or model training.