---
title: Hybrid Transformer for Multi-Focus Image Fusion
emoji: πŸ–ΌοΈ
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
pinned: true
---

# πŸ”¬ Hybrid Transformer (Focal & CrossViT) for Multi-Focus Image Fusion

*HybridTransformer MFIF logo*

This interactive demo showcases a novel hybrid transformer architecture that combines Focal Transformers and CrossViT for state-of-the-art multi-focus image fusion. Upload two images of the same scene with different focus areas and watch the model merge them into a single, all-in-focus result.

## πŸš€ Try the Demo

Upload your own images or use the provided examples to see the fusion in action!

## 🧠 How It Works

Our hybrid model combines two powerful transformer architectures (a minimal sketch follows this list):

- 🎯 **Focal Transformer**: Provides adaptive spatial attention with multi-scale focal windows
- πŸ”„ **CrossViT**: Enables cross-attention between the near- and far-focused images
- ⚑ **Hybrid Integration**: Sequential processing pipeline optimized for image fusion
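
Below is a minimal PyTorch sketch of this sequential pipeline. The block classes, token dimension, and the additive merge of the two streams are illustrative assumptions, not the released implementation; in particular, plain self-attention stands in for focal attention here.

```python
import torch
import torch.nn as nn

class CrossBlock(nn.Module):
    """Minimal cross-attention block: one stream's tokens query the other's."""
    def __init__(self, dim, heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, q, kv):
        out, _ = self.attn(self.norm(q), kv, kv)
        return q + out  # residual connection

class HybridSketch(nn.Module):
    """Sequential pipeline: cross-attention blocks exchange information
    between the near/far streams, then self-attention blocks (standing in
    for focal attention) refine the merged tokens."""
    def __init__(self, dim=768, heads=12, n_cross=4, n_refine=6):
        super().__init__()
        self.cross = nn.ModuleList(CrossBlock(dim, heads) for _ in range(n_cross))
        self.refine = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            for _ in range(n_refine)
        )

    def forward(self, near, far):
        for blk in self.cross:
            near, far = blk(near, far), blk(far, near)  # bidirectional exchange
        x = near + far  # simple additive merge of the two streams (assumption)
        for blk in self.refine:
            x = blk(x)
        return x

tokens = torch.randn(1, 196, 768)  # 196 = (224 / 16) ** 2 patch tokens
fused = HybridSketch()(tokens, tokens.clone())
print(fused.shape)  # torch.Size([1, 196, 768])
```

The bidirectional exchange (each stream attending to the other) is the core idea CrossViT contributes; the refinement stage is where focal attention would add multi-scale spatial windows.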

### Model Architecture

**FocalCrossViTHybrid** at a glance (mirrored in the config sketch below):

- πŸ“ **Input Size**: 224Γ—224 pixels
- 🧩 **Patch Size**: 16Γ—16
- πŸ’Ύ **Parameters**: ~73M trainable
- πŸ—οΈ **Blocks**: 4 CrossViT blocks + 6 Focal Transformer blocks
- 🎯 **Attention Heads**: 12 per attention layer
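
For quick reference, the same numbers as a plain Python config dict; `embed_dim` is an assumption, since the token dimension is not stated above.

```python
# Hypothetical configuration mirroring the list above.
config = {
    "img_size": 224,        # input resolution (224x224)
    "patch_size": 16,       # 16x16 patches -> (224 // 16) ** 2 = 196 tokens
    "embed_dim": 768,       # assumed token dimension (not stated in this README)
    "num_cross_blocks": 4,  # CrossViT blocks
    "num_focal_blocks": 6,  # Focal Transformer blocks
    "num_heads": 12,        # heads per attention layer
}
```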

## πŸ“Š Training Details

The model was trained on the Lytro Multi-Focus Dataset using:

- 🎨 **Advanced Data Augmentation**: Random flips, rotations, and color jittering
- πŸ“ˆ **Multi-Component Loss**: L1 + SSIM + perceptual + gradient + focus losses (sketched after this list)
- βš™οΈ **Optimization**: Adam optimizer with a cosine annealing scheduler
- 🎯 **Metrics**: PSNR, SSIM, VIF, QABF, and custom fusion quality measures
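
A hedged sketch of how such a multi-component loss and optimizer setup might be wired in PyTorch. The weights, learning rate, `T_max`, and the placeholder SSIM/perceptual/focus terms are illustrative, not the values used in training.

```python
import torch
import torch.nn.functional as F

def gradient_loss(pred, target):
    """L1 distance between horizontal/vertical image gradients."""
    dx = lambda t: t[..., :, 1:] - t[..., :, :-1]
    dy = lambda t: t[..., 1:, :] - t[..., :-1, :]
    return F.l1_loss(dx(pred), dx(target)) + F.l1_loss(dy(pred), dy(target))

def fusion_loss(pred, target, ssim_term, perceptual_term, focus_term,
                weights=(1.0, 1.0, 0.1, 0.5, 0.5)):
    """Weighted sum of the five components; weights are illustrative."""
    w = weights
    return (w[0] * F.l1_loss(pred, target)
            + w[1] * ssim_term        # e.g. 1 - SSIM(pred, target)
            + w[2] * perceptual_term  # e.g. VGG feature distance
            + w[3] * gradient_loss(pred, target)
            + w[4] * focus_term)      # e.g. focus-map consistency penalty

# Adam with cosine annealing, as described above (lr and T_max assumed).
model = torch.nn.Conv2d(3, 3, 3, padding=1)  # stand-in for the real model
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)
```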

## πŸ”— Project Resources

| Platform | Purpose | Link |
| --- | --- | --- |
| πŸ“ GitHub Source | Complete source code & documentation | View Repository |
| πŸ“Š Kaggle Training | Train your own model with GPU acceleration | Launch Notebook |
| πŸ“¦ Dataset | Lytro Multi-Focus training data | Download on Kaggle |

πŸ› οΈ Run Locally

1. Clone the Repository

git clone https://huggingface.co/spaces/divitmittal/hybridtransformer-mfif
cd hybridtransformer-mfif

### 2. Install Dependencies

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Or, with uv:

```bash
uv sync
```

### 3. Run the Gradio App

```bash
python app.py
```

Or, with uv:

```bash
uv run app.py
```

This will launch a local web server where you can interact with the demo.
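
For orientation, here is a minimal Gradio wiring in the spirit of `app.py`; the averaging `fuse` function is a runnable placeholder (the real app loads the trained model), and it assumes both uploads share the same dimensions.

```python
import gradio as gr
import numpy as np

def fuse(near, far):
    """Placeholder fusion: naive pixel average so the wiring runs end to end."""
    blended = (near.astype(np.float32) + far.astype(np.float32)) / 2
    return blended.astype(np.uint8)

demo = gr.Interface(
    fn=fuse,
    inputs=[gr.Image(label="Near-focused"), gr.Image(label="Far-focused")],
    outputs=gr.Image(label="Fused"),
    title="Hybrid Transformer MFIF",
)

if __name__ == "__main__":
    demo.launch()
```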

## 🎯 Use Cases

This technology is perfect for:

- πŸ“± **Mobile Photography**: Merge photos with different focus points
- πŸ”¬ **Scientific Imaging**: Combine microscopy images with varying focal depths
- 🏞️ **Landscape Photography**: Create fully focused images from multiple shots
- πŸ“š **Document Scanning**: Ensure all text areas are in perfect focus
- 🎨 **Creative Photography**: Artistic control over focus blending

## πŸ“ˆ Performance Metrics

Our model achieves state-of-the-art results on the Lytro dataset, evaluated with (PSNR/SSIM computation sketched below):

- πŸ“Š **PSNR**: Peak signal-to-noise ratio of the fused output
- πŸ–ΌοΈ **SSIM**: Structural similarity to the reference
- πŸ‘οΈ **VIF**: Visual information fidelity
- ⚑ **QABF**: Gradient-based edge-information transfer from the source images
- 🎯 **Focus Transfer**: Preservation of the in-focus regions of both source images
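
PSNR and SSIM can be reproduced with scikit-image, as in the sketch below (random arrays stand in for real images, assumed normalized to [0, 1]); VIF and QABF need dedicated implementations and are omitted here.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

fused = np.random.rand(224, 224, 3)      # stand-in for the model output
reference = np.random.rand(224, 224, 3)  # stand-in for the ground truth

psnr = peak_signal_noise_ratio(reference, fused, data_range=1.0)
ssim = structural_similarity(reference, fused, channel_axis=-1, data_range=1.0)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
```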

## πŸ”¬ Research Applications

This implementation supports:

- πŸ§ͺ **Ablation Studies**: Modular architecture for component analysis
- πŸ“‹ **Benchmarking**: Comprehensive evaluation metrics
- πŸ”„ **Reproducibility**: Deterministic training with detailed logging
- βš™οΈ **Customization**: Flexible configuration for different experiments

## πŸ“š Citation

If you use this model in your research, please cite:

```bibtex
@article{hybridtransformer2024,
  title={Hybrid Transformer Architecture for Multi-Focus Image Fusion},
  author={Your Name},
  journal={Conference/Journal Name},
  year={2024}
}
```

## 🀝 Contributing

Interested in improving the model? Check out our GitHub repository for:

- πŸ› Bug reports and feature requests
- πŸ’‘ Architecture improvements
- πŸ“Š New evaluation metrics
- πŸ”§ Performance optimizations

## πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.