title: Hybrid Transformer for Multi-Focus Image Fusion
emoji: 🖼️
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
pinned: true
Hybrid Transformer (Focal & CrossViT) for Multi-Focus Image Fusion

This interactive demo showcases a hybrid transformer architecture that combines Focal Transformer and CrossViT blocks for multi-focus image fusion. Upload two images of the same scene with different focus areas, and the model merges them into a single all-in-focus result.
Try the Demo
Upload your own images or use the provided examples to see the fusion in action!
How It Works
The hybrid model combines two transformer architectures in a sequential pipeline:
- Focal Transformer: Provides adaptive spatial attention with multi-scale focal windows
- CrossViT: Enables cross-attention between the near-focused and far-focused images
- Hybrid Integration: Runs the two stages sequentially in a pipeline optimized for image fusion
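The cross-attention idea can be illustrated with a minimal, dependency-free sketch: tokens from one image act as queries, and tokens from the other image supply keys and values. This is not the repository's implementation. It assumes a single head with no learned projections, layer normalization, or residual connections, and the names `softmax`, `cross_attention`, `near_tokens`, and `far_tokens` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention (illustrative only).

    queries: tokens from one image (list of d-dimensional vectors)
    keys, values: tokens from the other image
    Returns one output vector per query token.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        # Similarity of this query token to every token of the other image.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted average of the other image's value vectors.
        outputs.append([
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ])
    return outputs


# Toy tokens: two from the near-focused image, three from the far-focused one.
near_tokens = [[1.0, 0.0], [0.0, 1.0]]
far_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention(near_tokens, far_tokens, far_tokens)
print(len(fused), len(fused[0]))  # 2 2
```

Each near-image token ends up as a mixture of far-image tokens, weighted by similarity. A real block would typically add learned query/key/value projections and multiple heads.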
Model Architecture

- Input Size: 224×224 pixels
- Patch Size: 16×16
- Parameters: 73M+ trainable parameters
- Architecture: 4 CrossViT blocks + 6 Focal Transformer blocks
- Attention Heads: 12
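As a quick sanity check on these figures, the token count per image follows directly from the input and patch sizes. The sketch below assumes non-overlapping square patches, which the list above does not state explicitly, and the helper name `patch_grid` is hypothetical.

```python
def patch_grid(image_size: int = 224, patch_size: int = 16) -> tuple[int, int]:
    """Return (patches per side, total patches) for non-overlapping square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


per_side, total = patch_grid()
print(per_side, total)  # 14 196
```

Under that assumption, each 224×224 input becomes a 14×14 grid, i.e. 196 patch tokens.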
Training Details
The model was trained on the Lytro Multi-Focus Dataset using:
- Data Augmentation: Random flips, rotations, and color jittering
- Multi-Component Loss: L1 + SSIM + Perceptual + Gradient + Focus losses
- Optimization: Adam optimizer with a cosine annealing learning-rate scheduler
- Metrics: PSNR, SSIM, VIF, QABF, and custom fusion quality measures
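For readers unfamiliar with cosine annealing, the schedule can be written in closed form. The sketch below is a generic version without warm restarts. The function name and the default `base_lr` and `min_lr` values are assumptions for illustration, not the settings used to train this model.

```python
import math


def cosine_annealing_lr(step: int, total_steps: int,
                        base_lr: float = 1e-4, min_lr: float = 0.0) -> float:
    """Learning rate at `step` for a single cosine decay from base_lr to min_lr."""
    # Clamp so steps outside [0, total_steps] stay at the endpoints.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical 100-step run: start, midpoint, and end of the schedule.
for step in (0, 50, 100):
    print(step, cosine_annealing_lr(step, total_steps=100))
```

The rate starts at `base_lr`, passes through the midpoint of the range halfway through training, and reaches `min_lr` at the final step. Frameworks such as PyTorch provide the same schedule as `torch.optim.lr_scheduler.CosineAnnealingLR`.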
Project Resources
| Platform | Purpose | Link |
|---|---|---|
| GitHub Source | Complete source code & documentation | View Repository |
| Kaggle Training | Train your own model with GPU acceleration | Launch Notebook |
| Dataset | Lytro Multi-Focus training data | Download on Kaggle |
Run Locally
1. Clone the Repository
git clone https://huggingface.co/spaces/divitmittal/hybridtransformer-mfif
cd hybridtransformer-mfif
2. Install Dependencies
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
## With uv
uv sync
3. Run the Gradio App
python app.py
## With uv
uv run app.py
This will launch a local web server where you can interact with the demo.
Use Cases
Multi-focus fusion is useful for:
- Mobile Photography: Merge photos with different focus points
- Scientific Imaging: Combine microscopy images with varying focal depths
- Landscape Photography: Create fully focused images from multiple shots
- Document Scanning: Keep all text areas in focus
- Creative Photography: Artistic control over focus blending
Performance Metrics
Our model achieves state-of-the-art results on the Lytro dataset, measured by:
- PSNR: Peak signal-to-noise ratio (higher is better)
- SSIM: Structural similarity preservation
- VIF: Visual information fidelity
- QABF: Quality of edge information transferred to the fused image
- Focus Transfer: Preservation of in-focus regions from the source images
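To make the first metric concrete, PSNR is defined as 10·log10(MAX² / MSE). The sketch below computes it for small grayscale images stored as nested lists. It assumes a reference image is available for comparison and 8-bit pixel values (`max_value=255`). The function name and the toy data are illustrative and are not taken from the project's evaluation code.

```python
import math


def psnr(reference, test, max_value: float = 255.0) -> float:
    """PSNR in dB between two equally sized grayscale images (nested lists)."""
    ref_flat = [p for row in reference for p in row]
    test_flat = [p for row in test for p in row]
    if len(ref_flat) != len(test_flat):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref_flat, test_flat)) / len(ref_flat)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 2x2 images: two pixels differ by 5 intensity levels.
reference = [[0, 255], [255, 0]]
fused = [[0, 250], [255, 5]]
print(round(psnr(reference, fused), 2))
```

Higher values mean a smaller pixel-wise error, and identical images give infinite PSNR.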
Research Applications
This implementation supports:
- Ablation Studies: Modular architecture for component analysis
- Benchmarking: Comprehensive evaluation metrics
- Reproducibility: Deterministic training with detailed logging
- Customization: Flexible configuration for different experiments
Citation
If you use this model in your research, please cite:
@article{hybridtransformer2024,
title={Hybrid Transformer Architecture for Multi-Focus Image Fusion},
author={Your Name},
journal={Conference/Journal Name},
year={2024}
}
Contributing
Interested in improving the model? Check out our GitHub repository for:
- Bug reports and feature requests
- Architecture improvements
- New evaluation metrics
- Performance optimizations
License
This project is licensed under the MIT License - see the LICENSE file for details.