|
--- |
|
title: Hybrid Transformer for Multi-Focus Image Fusion |
|
emoji: 🖼️
|
colorFrom: blue |
|
colorTo: green |
|
sdk: gradio |
|
app_file: app.py |
|
pinned: true |
|
suggested_hardware: t4-small |
|
suggested_storage: small |
|
models: |
|
- divitmittal/HybridTransformer-MFIF |
|
datasets: |
|
- divitmittal/lytro-multi-focal-images |
|
tags: |
|
- computer-vision |
|
- image-fusion |
|
- multi-focus |
|
- transformer |
|
- focal-transformer |
|
- crossvit |
|
- demo |
|
hf_oauth: false |
|
disable_embedding: false |
|
fullWidth: false |
|
--- |
|
|
|
# Interactive Demo: Hybrid Transformer for Multi-Focus Image Fusion
|
|
|
<div align="center"> |
|
<img src="https://github.com/DivitMittal/HybridTransformer-MFIF/raw/main/assets/logo.png" alt="HybridTransformer MFIF Logo" width="400"/> |
|
|
|
[Model on Hugging Face](https://huggingface.co/divitmittal/HybridTransformer-MFIF)

[GitHub Repository](https://github.com/DivitMittal/HybridTransformer-MFIF)

[Kaggle Notebook](https://www.kaggle.com/code/divitmittal/hybrid-transformer-mfif)

[Kaggle Dataset](https://www.kaggle.com/datasets/divitmittal/lytro-multi-focal-images)

[MIT License](https://github.com/DivitMittal/HybridTransformer-MFIF/blob/main/LICENSE)
|
</div> |
|
|
|
**Welcome to the interactive demonstration** of our novel hybrid transformer architecture that combines **Focal Transformers** and **CrossViT** for state-of-the-art multi-focus image fusion! |
|
|
|
**What this demo does:** Upload two images with different focus areas and watch the model merge them into a single, fully focused result in real time.
|
|
|
> **New to multi-focus fusion?** It's like having a camera that can focus on everything at once! Perfect for photography, microscopy, and document scanning.
|
|
|
## How to Use This Demo
|
|
|
### Quick Start (30 seconds) |
|
1. **Upload Images**: Choose two images of the same scene with different focus areas

2. **Auto-Process**: Our AI automatically detects and fuses the best-focused regions

3. **Download Result**: Get your fused, fully focused image instantly
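
Prefer to drive the demo from code? The sketch below uses the `gradio_client` package to call this Space's API. The endpoint name, argument order, and return format are assumptions; run `client.view_api()` to see the actual signature exposed by `app.py`.

```python
# Minimal sketch: calling the hosted Space programmatically with gradio_client.
# The default endpoint and its output format depend on how app.py builds the
# Gradio interface -- inspect client.view_api() before relying on this call.
from gradio_client import Client, handle_file

client = Client("divitmittal/HybridTransformer-MFIF")
client.view_api()  # prints the real endpoint names and parameters

# Hypothetical call: two focus-bracketed shots in, path to the fused image out.
result = client.predict(
    handle_file("near_focus.jpg"),
    handle_file("far_focus.jpg"),
)
print(result)
```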
|
|
|
### Demo Features

- **Real-time Processing**: See results in seconds

- **Mobile Friendly**: Works on phones, tablets, and desktops

- **Batch Processing**: Try multiple image pairs

- **Download Results**: Save your fused images

- **Quality Metrics**: View fusion quality scores

- **Example Gallery**: Pre-loaded sample images to try
|
|
|
### Pro Tips for Best Results
|
- Use images of the same scene with complementary focus areas |
|
- Ensure good lighting and minimal motion blur |
|
- Try landscape photos, macro shots, or document scans |
|
- Images are automatically resized to 224×224 for processing
|
|
|
## The Science Behind the Magic
|
|
|
Our **FocalCrossViTHybrid** model represents a breakthrough in AI-powered image fusion, combining two cutting-edge transformer architectures: |
|
|
|
### Technical Innovation

- **Focal Transformer**: Adaptive spatial attention with multi-scale focal windows that identifies the best-focused regions

- **CrossViT**: Cross-attention mechanism that exchanges information between the two focus planes

- **Hybrid Integration**: Sequential processing pipeline designed specifically for image fusion

- **73M Parameters**: Roughly 73 million trainable parameters (see the sanity-check sketch below)
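
The snippet below is a quick, hedged way to sanity-check a parameter count in PyTorch. The stand-in module is for illustration only; substitute the actual `FocalCrossViTHybrid` model from the GitHub repository (its class name and constructor are assumptions about that codebase).

```python
# Sketch: counting trainable parameters to sanity-check the ~73M figure.
# Replace the stand-in encoder with the real fusion network before trusting the number.
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=768, nhead=12)
model = nn.TransformerEncoder(encoder_layer, num_layers=6)  # stand-in only

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable / 1e6:.1f}M")
```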
|
|
|
### What Makes It Special
|
- **Smart Focus Detection**: Automatically identifies which parts of each image are in best focus |
|
- **Seamless Blending**: Creates natural transitions without visible fusion artifacts |
|
- **Edge Preservation**: Maintains sharp edges and fine details throughout the fusion process |
|
- **Content Awareness**: Adapts fusion strategy based on image content and scene complexity |
|
|
|
### Architecture Deep Dive
|
|
|
<div align="center"> |
|
<img src="https://github.com/DivitMittal/HybridTransformer-MFIF/raw/main/assets/model_architecture.png" alt="FocalCrossViTHybrid Architecture" width="700"/> |
|
<p><em>Complete architecture diagram showing the hybrid transformer pipeline</em></p> |
|
</div> |
|
|
|
| Component | Specification | Purpose |
|-----------|---------------|---------|
| **Input Resolution** | 224×224 pixels | Optimized for transformer processing |
| **Patch Tokenization** | 16×16 patches | Converts images to sequence tokens |
| **Model Parameters** | 73M+ trainable | Ensures rich feature representation |
| **Transformer Blocks** | 4 CrossViT + 6 Focal | Sequential hybrid processing |
| **Attention Heads** | 12 multi-head | Parallel attention mechanisms |
| **Processing Time** | ~150ms per pair | Real-time performance on GPU |
| **Fusion Strategy** | Adaptive blending | Content-aware region selection |
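
To make the tokenization row concrete, the sketch below shows how a 224×224 input becomes 196 patch tokens with a standard ViT-style 16×16 convolutional patch embedding. The embedding dimension is an assumed value and the layer is illustrative, not necessarily the exact one in the released checkpoint.

```python
# Sketch of ViT-style patch tokenization matching the table above:
# a 224x224 image split into 16x16 patches yields (224/16)^2 = 196 tokens.
import torch
import torch.nn as nn

img_size, patch_size, embed_dim = 224, 16, 768  # embed_dim is an assumption
patch_embed = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)

pair = torch.randn(2, 3, img_size, img_size)    # the two focus-bracketed inputs
tokens = patch_embed(pair).flatten(2).transpose(1, 2)
print(tokens.shape)                             # torch.Size([2, 196, 768])
```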
|
|
|
## Training & Performance
|
|
|
### Training Foundation
|
Our model was trained on the **Lytro Multi-Focus Dataset** with the following setup:
|
|
|
| Training Component | Details | Impact |
|--------------------|---------|--------|
| **Data Augmentation** | Random flips, rotations, color jittering | Improved generalization |
| **Advanced Loss Function** | L1 + SSIM + Perceptual + Gradient + Focus | Multi-objective optimization |
| **Smart Optimization** | AdamW + cosine annealing scheduler | Stable convergence |
| **Rigorous Validation** | Hold-out test set with 6 metrics | Reliable performance assessment |
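
A hedged sketch of how such a recipe is typically wired up in PyTorch is shown below. The loss weights, learning rate, and the `pytorch_msssim` dependency are illustrative assumptions, and the perceptual, gradient, and focus terms are omitted for brevity; see the Kaggle notebook for the exact configuration.

```python
# Illustrative training setup: weighted L1 + SSIM loss, AdamW, cosine annealing.
# All hyperparameters below are placeholders, not the project's exact values.
import torch
from pytorch_msssim import ssim  # pip install pytorch-msssim

def fusion_loss(fused, reference, alpha=0.8):
    l1 = torch.nn.functional.l1_loss(fused, reference)
    ssim_term = 1.0 - ssim(fused, reference, data_range=1.0)
    return alpha * l1 + (1.0 - alpha) * ssim_term

model = torch.nn.Conv2d(3, 3, 3, padding=1)  # stand-in; use the real fusion network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
```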
|
|
|
### Benchmark Results
|
|
|
| Metric | Score | Interpretation | Benchmark |
|--------|-------|----------------|-----------|
| **PSNR** | 28.5 dB | Excellent signal quality | State-of-the-art |
| **SSIM** | 0.92 | Outstanding structure preservation | Top 5% |
| **VIF** | 0.78 | Superior visual fidelity | Excellent |
| **QABF** | 0.85 | High edge information quality | Very good |
| **Focus Transfer** | 96% | Near-perfect focus preservation | Leading |
|
|
|
> **Performance Summary**: Our model consistently outperforms traditional CNN-based methods and competing transformer architectures across all fusion quality metrics.
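
If you want to reproduce the PSNR/SSIM side of this evaluation on your own outputs, a minimal scikit-image sketch is shown below. The file names are placeholders, and the VIF, QABF, and focus-transfer metrics require separate implementations not shown here.

```python
# Minimal PSNR/SSIM check of a fused result against an all-in-focus reference.
from skimage import io
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

fused = io.imread("fused.png")             # placeholder paths
reference = io.imread("ground_truth.png")

psnr = peak_signal_noise_ratio(reference, fused, data_range=255)
ssim_score = structural_similarity(reference, fused, channel_axis=-1, data_range=255)
print(f"PSNR: {psnr:.2f} dB  SSIM: {ssim_score:.3f}")
```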
|
|
|
## Real-World Applications
|
|
|
### Photography & Consumer Use
|
- **Mobile Photography**: Combine focus-bracketed shots for professional results |
|
- **Portrait Mode Enhancement**: Improve depth-of-field effects in smartphone cameras |
|
- **Macro Photography**: Merge close-up shots with different focus planes |
|
- **Landscape Photography**: Create sharp foreground-to-background images |
|
|
|
### Scientific & Professional
|
- **Microscopy**: Combine images at different focal depths for extended depth-of-field |
|
- **Medical Imaging**: Enhance diagnostic image quality in pathology and research |
|
- **Industrial Inspection**: Ensure all parts of components are in focus for quality control |
|
- **Archaeological Documentation**: Capture detailed artifact images with complete focus |
|
|
|
### Document & Archival
|
- **Document Scanning**: Ensure all text areas are perfectly legible |
|
- **Art Digitization**: Capture artwork with varying surface depths |
|
- **Historical Preservation**: Create high-quality digital archives |
|
- **Technical Documentation**: Clear images of complex 3D objects |
|
|
|
## Complete Project Ecosystem
|
|
|
| Resource | Purpose | Best For | Link |
|----------|---------|----------|------|
| **This Demo** | Interactive testing | Quick experimentation | *You're here!* |
| **Model Hub** | Pre-trained weights | Integration & deployment | [Download Model](https://huggingface.co/divitmittal/HybridTransformer-MFIF) |
| **GitHub Repository** | Source code & docs | Development & research | [View Code](https://github.com/DivitMittal/HybridTransformer-MFIF) |
| **Kaggle Notebook** | Training pipeline | Learning & custom training | [Launch Notebook](https://www.kaggle.com/code/divitmittal/hybrid-transformer-mfif) |
| **Training Dataset** | Lytro Multi-Focus data | Research & benchmarking | [Download Dataset](https://www.kaggle.com/datasets/divitmittal/lytro-multi-focal-images) |
|
|
|
|
|
## Run This Demo Locally
|
|
|
### Quick Setup (2 minutes)
|
|
|
```bash |
|
# 1. Clone this Space |
|
git clone https://huggingface.co/spaces/divitmittal/HybridTransformer-MFIF |
|
cd HybridTransformer-MFIF |
|
|
|
# 2. Create virtual environment |
|
python -m venv venv |
|
source venv/bin/activate # On Windows: venv\Scripts\activate |
|
|
|
# 3. Install dependencies |
|
pip install -r requirements.txt |
|
|
|
# 4. Launch the demo |
|
python app.py |
|
``` |
|
|
|
### Advanced Setup Options
|
|
|
#### Using UV Package Manager (Recommended) |
|
```bash |
|
# Faster dependency management |
|
curl -LsSf https://astral.sh/uv/install.sh | sh |
|
uv sync |
|
uv run app.py |
|
``` |
|
|
|
#### Using Docker |
|
```bash |
|
# Build and run containerized version |
|
docker build -t hybrid-transformer-demo . |
|
docker run -p 7860:7860 hybrid-transformer-demo |
|
``` |
|
|
|
### System Requirements
|
|
|
| Component | Minimum | Recommended |
|-----------|---------|-------------|
| **Python** | 3.8+ | 3.10+ |
| **RAM** | 4GB | 8GB+ |
| **Storage** | 2GB | 5GB+ |
| **GPU** | None (CPU works) | NVIDIA GTX 1660+ |
| **Internet** | Required for model download | Stable connection |
|
|
|
> **First run**: The model (~300MB) will be automatically downloaded from the Hugging Face Hub
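
If you prefer to fetch the weights yourself (for example, to pre-populate a cache or deploy offline), the sketch below uses `huggingface_hub`. The checkpoint filename is an assumption, so list the repository files first to find the name that `app.py` actually loads.

```python
# Sketch: pre-downloading the checkpoint from the Hugging Face Hub.
from huggingface_hub import hf_hub_download, list_repo_files

print(list_repo_files("divitmittal/HybridTransformer-MFIF"))  # inspect available files

ckpt_path = hf_hub_download(
    repo_id="divitmittal/HybridTransformer-MFIF",
    filename="pytorch_model.bin",  # hypothetical -- use a name from the listing above
)
print(f"Checkpoint cached at: {ckpt_path}")
```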
|
|
|
## Demo Usage Tips & Tricks
|
|
|
### Getting the Best Results
|
|
|
#### Perfect Input Conditions
|
- **Same Scene**: Both images should show the exact same scene/subject |
|
- **Different Focus**: One image focused on foreground, other on background |
|
- **Minimal Movement**: Avoid camera shake between shots |
|
- **Good Lighting**: Well-lit images produce better fusion results |
|
- **Sharp Focus**: Each image should have clearly focused regions |
|
|
|
#### What to Avoid
|
- **Completely Different Scenes**: Won't work with unrelated images |
|
- **Motion Blur**: Blurry images reduce fusion quality |
|
- **Extreme Lighting Differences**: Avoid drastically different exposures |
|
- **Heavy Compression**: Use high-quality images when possible |
|
|
|
### Creative Applications
|
|
|
#### Smartphone Photography
|
1. **Portrait Mode**: Take one shot focused on subject, another on background |
|
2. **Macro Magic**: Combine close-up shots with different focus depths |
|
3. **Street Photography**: Merge foreground and background focus for storytelling |
|
|
|
#### Landscape & Nature
|
1. **Hyperfocal Fusion**: Combine near and far focus for infinite depth-of-field |
|
2. **Flower Photography**: Focus on petals in one shot, leaves in another |
|
3. **Architecture**: Sharp foreground details with crisp background buildings |
|
|
|
#### Technical & Scientific
|
1. **Document Scanning**: Focus on different text sections for complete clarity |
|
2. **Product Photography**: Ensure all product features are in sharp focus |
|
3. **Art Documentation**: Capture textured surfaces with varying depths |
|
|
|
## Live Demo Performance
|
|
|
### Speed & Efficiency
|
- **Processing Time**: ~2-3 seconds per image pair (with GPU) |
|
- **CPU Fallback**: ~8-12 seconds (when GPU unavailable) |
|
- **Memory Usage**: <2GB RAM for standard operation |
|
- **Concurrent Users**: Supports multiple simultaneous users |
|
- **Auto-scaling**: Handles traffic spikes gracefully |
|
|
|
### Quality Assurance
|
- **Consistent Results**: Same inputs always produce identical outputs |
|
- **Error Handling**: Graceful handling of invalid inputs |
|
- **Format Support**: JPEG, PNG, WebP, and most common formats |
|
- **Size Limits**: Automatic resizing for optimal processing |
|
- **Quality Preservation**: Maintains maximum possible image quality |
|
|
|
### Real-time Metrics (Displayed in Demo)
|
- **Fusion Quality Score**: Overall fusion effectiveness (0-100) |
|
- **Focus Transfer Rate**: How well focus regions are preserved (%) |
|
- **Edge Preservation**: Sharpness retention metric |
|
- **Processing Time**: Actual computation time for your images |
|
|
|
## Research & Development
|
|
|
### Academic Value
|
- **Novel Architecture**: First implementation combining Focal Transformer + CrossViT for MFIF |
|
- **Reproducible Research**: Complete codebase with deterministic training |
|
- **Benchmark Dataset**: Standard evaluation on Lytro Multi-Focus Dataset |
|
- **Comprehensive Metrics**: 6+ evaluation metrics for thorough assessment |
|
|
|
### Experimental Framework
|
- **Modular Design**: Easy to modify components for ablation studies |
|
- **Hyperparameter Tuning**: Configurable architecture and training parameters |
|
- **Extension Support**: Framework for adding new transformer components |
|
- **Comparative Analysis**: Built-in tools for method comparison |
|
|
|
### Educational Resource
|
- **Step-by-step Tutorials**: From basic concepts to advanced implementation |
|
- **Interactive Learning**: Hands-on experience with transformer architectures |
|
- **Code Documentation**: Extensively commented for educational use |
|
- **Research Integration**: Easy to incorporate into academic projects |
|
|
|
## Community & Support
|
|
|
### Get Help
|
- **GitHub Issues**: Report bugs or request features |
|
- **HuggingFace Discussions**: Community Q&A and tips |
|
- **Kaggle Comments**: Dataset and training discussions |
|
- **Email Support**: Direct contact for collaboration inquiries |
|
|
|
### Contributing
|
- **Code Contributions**: Submit PRs for improvements |
|
- **Dataset Expansion**: Help grow the training data |
|
- **Documentation**: Improve guides and tutorials |
|
- **Testing**: Report issues and edge cases |
|
|
|
### Citation
|
If you use this work in your research: |
|
```bibtex |
|
@software{mittal2024hybridtransformer, |
|
title={HybridTransformer-MFIF: Focal Transformer and CrossViT Hybrid for Multi-Focus Image Fusion}, |
|
author={Mittal, Divit}, |
|
year={2024}, |
|
url={https://github.com/DivitMittal/HybridTransformer-MFIF}, |
|
note={Interactive demo available at HuggingFace Spaces} |
|
} |
|
``` |
|
|
|
## License & Terms
|
|
|
### Open Source License
|
**MIT License** - Free for commercial and non-commercial use |
|
- **Commercial Use**: Integrate into products and services

- **Modification**: Adapt and customize for your needs

- **Distribution**: Share with proper attribution

- **Private Use**: Use in proprietary projects
|
|
|
### Usage Terms
|
- **Attribution Required**: Credit the original work when using |
|
- **No Warranty**: Provided "as-is" without guarantees |
|
- **Ethical Use**: Please use responsibly and ethically |
|
- **Research Friendly**: Encouraged for academic and research purposes |
|
|
|
--- |
|
|
|
<div align="center"> |
|
<h3>Ready to Try Multi-Focus Image Fusion?</h3>
|
<p><strong>Upload your images above and experience the magic of AI-powered focus fusion!</strong></p> |
|
<p>Built with ❤️ for the computer vision community | ⭐ Star us on <a href="https://github.com/DivitMittal/HybridTransformer-MFIF">GitHub</a></p>
|
</div> |
|
|