---
title: Hybrid Transformer for Multi-Focus Image Fusion
emoji: πŸ–ΌοΈ
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
pinned: true
---

# πŸ”¬ Hybrid Transformer (Focal & CrossViT) for Multi-Focus Image Fusion

*HybridTransformer MFIF logo*

This interactive demo showcases a novel hybrid transformer architecture that combines Focal Transformers and CrossViT for state-of-the-art multi-focus image fusion. Upload two images of the same scene with different focus areas and watch the model merge them into a single, all-in-focus result.

## πŸš€ Try the Demo

Upload your own images or use the provided examples to see the fusion in action!

## 🧠 How It Works

Our hybrid model combines two powerful transformer architectures (a minimal sketch follows this list):

- 🎯 **Focal Transformer**: Provides adaptive spatial attention with multi-scale focal windows
- πŸ”„ **CrossViT**: Enables cross-attention between the near- and far-focused images
- ⚑ **Hybrid Integration**: Sequential processing pipeline optimized for image fusion
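
Below is a minimal PyTorch sketch of this sequential pipeline. The block classes, token dimension, and the additive merge of the two streams are illustrative assumptions, not the released implementation; in particular, plain self-attention stands in for focal attention here.

```python
import torch
import torch.nn as nn

class CrossBlock(nn.Module):
    """Minimal cross-attention block: one stream's tokens query the other's."""
    def __init__(self, dim, heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, q, kv):
        out, _ = self.attn(self.norm(q), kv, kv)
        return q + out  # residual connection

class HybridSketch(nn.Module):
    """Sequential pipeline: cross-attention blocks exchange information
    between the near/far streams, then self-attention blocks (standing in
    for focal attention) refine the merged tokens."""
    def __init__(self, dim=768, heads=12, n_cross=4, n_refine=6):
        super().__init__()
        self.cross = nn.ModuleList(CrossBlock(dim, heads) for _ in range(n_cross))
        self.refine = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            for _ in range(n_refine)
        )

    def forward(self, near, far):
        for blk in self.cross:
            near, far = blk(near, far), blk(far, near)  # bidirectional exchange
        x = near + far  # simple additive merge of the two streams (assumption)
        for blk in self.refine:
            x = blk(x)
        return x

tokens = torch.randn(1, 196, 768)  # 196 = (224 / 16) ** 2 patch tokens
fused = HybridSketch()(tokens, tokens.clone())
print(fused.shape)  # torch.Size([1, 196, 768])
```

The bidirectional exchange (each stream attending to the other) is the core idea CrossViT contributes; the refinement stage is where focal attention would add multi-scale spatial windows.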

### Model Architecture

**FocalCrossViTHybrid** at a glance (mirrored in the config sketch below):

- πŸ“ **Input Size**: 224Γ—224 pixels
- 🧩 **Patch Size**: 16Γ—16
- πŸ’Ύ **Parameters**: ~73M trainable
- πŸ—οΈ **Blocks**: 4 CrossViT blocks + 6 Focal Transformer blocks
- 🎯 **Attention Heads**: 12 per attention layer
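
For quick reference, the same numbers as a plain Python config dict; `embed_dim` is an assumption, since the token dimension is not stated above.

```python
# Hypothetical configuration mirroring the list above.
config = {
    "img_size": 224,        # input resolution (224x224)
    "patch_size": 16,       # 16x16 patches -> (224 // 16) ** 2 = 196 tokens
    "embed_dim": 768,       # assumed token dimension (not stated in this README)
    "num_cross_blocks": 4,  # CrossViT blocks
    "num_focal_blocks": 6,  # Focal Transformer blocks
    "num_heads": 12,        # heads per attention layer
}
```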

## πŸ“Š Training Details

The model was trained on the Lytro Multi-Focus Dataset using:

- 🎨 **Advanced Data Augmentation**: Random flips, rotations, and color jittering
- πŸ“ˆ **Multi-Component Loss**: L1 + SSIM + perceptual + gradient + focus losses (sketched after this list)
- βš™οΈ **Optimization**: Adam optimizer with a cosine annealing scheduler
- 🎯 **Metrics**: PSNR, SSIM, VIF, QABF, and custom fusion quality measures
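
A hedged sketch of how such a multi-component loss and optimizer setup might be wired in PyTorch. The weights, learning rate, `T_max`, and the placeholder SSIM/perceptual/focus terms are illustrative, not the values used in training.

```python
import torch
import torch.nn.functional as F

def gradient_loss(pred, target):
    """L1 distance between horizontal/vertical image gradients."""
    dx = lambda t: t[..., :, 1:] - t[..., :, :-1]
    dy = lambda t: t[..., 1:, :] - t[..., :-1, :]
    return F.l1_loss(dx(pred), dx(target)) + F.l1_loss(dy(pred), dy(target))

def fusion_loss(pred, target, ssim_term, perceptual_term, focus_term,
                weights=(1.0, 1.0, 0.1, 0.5, 0.5)):
    """Weighted sum of the five components; weights are illustrative."""
    w = weights
    return (w[0] * F.l1_loss(pred, target)
            + w[1] * ssim_term        # e.g. 1 - SSIM(pred, target)
            + w[2] * perceptual_term  # e.g. VGG feature distance
            + w[3] * gradient_loss(pred, target)
            + w[4] * focus_term)      # e.g. focus-map consistency penalty

# Adam with cosine annealing, as described above (lr and T_max assumed).
model = torch.nn.Conv2d(3, 3, 3, padding=1)  # stand-in for the real model
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)
```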

## πŸ”— Project Resources

| Platform | Purpose | Link |
| --- | --- | --- |
| πŸ“ GitHub Source | Complete source code & documentation | View Repository |
| πŸ“Š Kaggle Training | Train your own model with GPU acceleration | Launch Notebook |
| πŸ“¦ Dataset | Lytro Multi-Focus training data | Download on Kaggle |

πŸ› οΈ Run Locally

1. Clone the Repository

git clone https://huggingface.co/spaces/divitmittal/hybridtransformer-mfif
cd hybridtransformer-mfif

### 2. Install Dependencies

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Or, with uv:

```bash
uv sync
```

### 3. Run the Gradio App

```bash
python app.py
```

Or, with uv:

```bash
uv run app.py
```

This will launch a local web server where you can interact with the demo.
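
For orientation, here is a minimal Gradio wiring in the spirit of `app.py`; the averaging `fuse` function is a runnable placeholder (the real app loads the trained model), and it assumes both uploads share the same dimensions.

```python
import gradio as gr
import numpy as np

def fuse(near, far):
    """Placeholder fusion: naive pixel average so the wiring runs end to end."""
    blended = (near.astype(np.float32) + far.astype(np.float32)) / 2
    return blended.astype(np.uint8)

demo = gr.Interface(
    fn=fuse,
    inputs=[gr.Image(label="Near-focused"), gr.Image(label="Far-focused")],
    outputs=gr.Image(label="Fused"),
    title="Hybrid Transformer MFIF",
)

if __name__ == "__main__":
    demo.launch()
```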

## 🎯 Use Cases

This technology is perfect for:

- πŸ“± **Mobile Photography**: Merge photos with different focus points
- πŸ”¬ **Scientific Imaging**: Combine microscopy images with varying focal depths
- 🏞️ **Landscape Photography**: Create fully focused images from multiple shots
- πŸ“š **Document Scanning**: Ensure all text areas are in perfect focus
- 🎨 **Creative Photography**: Artistic control over focus blending

## πŸ“ˆ Performance Metrics

Our model achieves state-of-the-art results on the Lytro dataset, evaluated with (PSNR/SSIM computation sketched below):

- πŸ“Š **PSNR**: Peak signal-to-noise ratio of the fused output
- πŸ–ΌοΈ **SSIM**: Structural similarity to the reference
- πŸ‘οΈ **VIF**: Visual information fidelity
- ⚑ **QABF**: Gradient-based edge-information transfer from the source images
- 🎯 **Focus Transfer**: Preservation of the in-focus regions of both source images
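
PSNR and SSIM can be reproduced with scikit-image, as in the sketch below (random arrays stand in for real images, assumed normalized to [0, 1]); VIF and QABF need dedicated implementations and are omitted here.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

fused = np.random.rand(224, 224, 3)      # stand-in for the model output
reference = np.random.rand(224, 224, 3)  # stand-in for the ground truth

psnr = peak_signal_noise_ratio(reference, fused, data_range=1.0)
ssim = structural_similarity(reference, fused, channel_axis=-1, data_range=1.0)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
```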

## πŸ”¬ Research Applications

This implementation supports:

- πŸ§ͺ **Ablation Studies**: Modular architecture for component analysis
- πŸ“‹ **Benchmarking**: Comprehensive evaluation metrics
- πŸ”„ **Reproducibility**: Deterministic training with detailed logging
- βš™οΈ **Customization**: Flexible configuration for different experiments

## πŸ“š Citation

If you use this model in your research, please cite:

```bibtex
@article{hybridtransformer2024,
  title={Hybrid Transformer Architecture for Multi-Focus Image Fusion},
  author={Your Name},
  journal={Conference/Journal Name},
  year={2024}
}
```

## 🀝 Contributing

Interested in improving the model? Check out our GitHub repository for:

- πŸ› Bug reports and feature requests
- πŸ’‘ Architecture improvements
- πŸ“Š New evaluation metrics
- πŸ”§ Performance optimizations

## πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.