---
title: Hybrid Transformer for Multi-Focus Image Fusion
emoji: 🖼️
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
pinned: true
suggested_hardware: t4-small
suggested_storage: small
models:
- divitmittal/HybridTransformer-MFIF
datasets:
- divitmittal/lytro-multi-focal-images
tags:
- computer-vision
- image-fusion
- multi-focus
- transformer
- focal-transformer
- crossvit
- demo
hf_oauth: false
disable_embedding: false
fullWidth: false
sdk_version: 5.44.1
---

# 🔬 Interactive Demo: Hybrid Transformer for Multi-Focus Image Fusion

<div align="center">
  <img src="./assets/logo.png" alt="HybridTransformer MFIF Logo" width="400"/>

  [![Model](https://img.shields.io/badge/🤗%20Model-HybridTransformer--MFIF-yellow)](https://huggingface.co/divitmittal/HybridTransformer-MFIF)
  [![GitHub](https://img.shields.io/badge/GitHub-Repository-blue)](https://github.com/DivitMittal/HybridTransformer-MFIF)
  [![Kaggle](https://img.shields.io/badge/Kaggle-Notebook-teal)](https://www.kaggle.com/code/divitmittal/hybrid-transformer-mfif)
  [![Dataset](https://img.shields.io/badge/Dataset-Lytro-orange)](https://www.kaggle.com/datasets/divitmittal/lytro-multi-focal-images)
  [![License](https://img.shields.io/badge/License-MIT-green)](https://github.com/DivitMittal/HybridTransformer-MFIF/blob/main/LICENSE)
</div>

**Welcome to the interactive demonstration** of our novel hybrid transformer architecture that combines **Focal Transformers** and **CrossViT** for state-of-the-art multi-focus image fusion!

🎯 **What this demo does:** Upload two images with different focus areas and watch our AI intelligently merge them into a single, perfectly focused result in real-time.

> 💡 **New to multi-focus fusion?** It's like having a camera that can focus on everything at once! Perfect for photography, microscopy, and document scanning.

## 🔗 Project Resources

| Resource | Purpose | Best For | Link |
|----------|---------|----------|------|
| 🚀 **This Demo** | Interactive testing | Quick experimentation | *You're here!* |
| 🤗 **Model Hub** | Pre-trained weights | Integration & deployment | [Download Model](https://huggingface.co/divitmittal/HybridTransformer-MFIF) |
| 📁 **GitHub Repository** | Source code & docs | Development & research | [View Code](https://github.com/DivitMittal/HybridTransformer-MFIF) |
| 📊 **Kaggle Notebook** | Training pipeline | Learning & custom training | [Launch Notebook](https://www.kaggle.com/code/divitmittal/hybrid-transformer-mfif) |
| 📦 **Training Dataset** | Lytro Multi-Focus data | Research & benchmarking | [Download Dataset](https://www.kaggle.com/datasets/divitmittal/lytro-multi-focal-images) |

## 🚀 How to Use This Demo

### Quick Start (30 seconds)
1. **📤 Upload Images**: Choose two images of the same scene with different focus areas
2. **⚡ Auto-Process**: Our AI automatically detects and fuses the best-focused regions
3. **📥 Download Result**: Get your perfectly focused image instantly

### 📋 Demo Features
- **🖼️ Real-time Processing**: See results in seconds
- **📱 Mobile Friendly**: Works on phones, tablets, and desktops
- **🔄 Batch Processing**: Try multiple image pairs
- **💾 Download Results**: Save your fused images
- **📊 Quality Metrics**: View fusion quality scores
- **🎨 Example Gallery**: Pre-loaded sample images to try

### 💡 Pro Tips for Best Results
- Use images of the same scene with complementary focus areas
- Ensure good lighting and minimal motion blur
- Try landscape photos, macro shots, or document scans
- Images are automatically resized to 224×224 for processing

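To make the last tip concrete, here is a minimal sketch of what a fixed-size resize does to an input. It assumes a plain nearest-neighbour resize with no cropping or padding; the README does not state which interpolation the demo uses, and `resize_nearest` is a hypothetical helper, not a function from this project.

```python
def resize_nearest(image, out_height=224, out_width=224):
    """Nearest-neighbour resize of a 2-D image given as a list of rows."""
    in_height, in_width = len(image), len(image[0])
    return [
        [image[y * in_height // out_height][x * in_width // out_width]
         for x in range(out_width)]
        for y in range(out_height)
    ]


# A 448x672 dummy image is reduced to the fixed 224x224 network input.
image = [[(x + y) % 256 for x in range(672)] for y in range(448)]
small = resize_nearest(image)
print(len(small), len(small[0]))  # 224 224
```

Under this assumption a non-square input is squashed to a square, so panoramic or heavily cropped shots may look distorted after processing.
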
## 🧠 The Science Behind the Magic

Our **FocalCrossViTHybrid** model represents a breakthrough in AI-powered image fusion, combining two cutting-edge transformer architectures:

### 🔬 Technical Innovation
- **🎯 Focal Transformer**: Revolutionary adaptive spatial attention with multi-scale focal windows that intelligently identifies the best-focused regions
- **🔄 CrossViT**: Advanced cross-attention mechanism that enables seamless information exchange between different focus planes
- **⚡ Hybrid Integration**: Optimized sequential processing pipeline specifically designed for image fusion tasks
- **🧮 73M Parameters**: Carefully tuned neural network with 73+ million parameters for optimal performance

### 🎭 What Makes It Special
- **Smart Focus Detection**: Automatically identifies which parts of each image are in best focus
- **Seamless Blending**: Creates natural transitions without visible fusion artifacts
- **Edge Preservation**: Maintains sharp edges and fine details throughout the fusion process
- **Content Awareness**: Adapts fusion strategy based on image content and scene complexity

### 🏗️ Architecture Deep Dive

<div align="center">
  <img src="./assets/model_architecture.png" alt="FocalCrossViTHybrid Architecture" width="700"/>
  <p><em>Complete architecture diagram showing the hybrid transformer pipeline</em></p>
</div>

| Component | Specification | Purpose |
|-----------|---------------|----------|
| **📐 Input Resolution** | 224×224 pixels | Optimized for transformer processing |
| **🧩 Patch Tokenization** | 16×16 patches | Converts images to sequence tokens |
| **💾 Model Parameters** | 73M+ trainable | Ensures rich feature representation |
| **🏗️ Transformer Blocks** | 4 CrossViT + 6 Focal | Sequential hybrid processing |
| **🎯 Attention Heads** | 12 multi-head | Parallel attention mechanisms |
| **⚡ Processing Time** | ~150ms per pair | Real-time performance on GPU |
| **🔄 Fusion Strategy** | Adaptive blending | Content-aware region selection |

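The input resolution and patch size in the table determine the sequence length: a 224×224 image cut into 16×16 patches gives a 14×14 grid, i.e. 196 tokens per image. The sketch below illustrates that arithmetic, assuming non-overlapping patches; `patchify` is a hypothetical helper written for this example, not part of the model code.

```python
def patchify(image, patch=16):
    """Split a 2-D image (list of rows) into non-overlapping patch x patch tiles.

    Returns a flat list of tiles in row-major order; each tile is a list of rows.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tiles = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tiles.append([row[left:left + patch] for row in image[top:top + patch]])
    return tiles


# A dummy 224x224 grayscale image whose pixel value encodes its position.
image = [[y * 224 + x for x in range(224)] for y in range(224)]
tokens = patchify(image)
print(len(tokens))  # 196
```

In a real vision transformer each tile would then be flattened and linearly projected to an embedding vector; that step is omitted here.
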
## 📊 Training & Performance

### 🎓 Training Foundation
Our model was meticulously trained on the **Lytro Multi-Focus Dataset** using state-of-the-art techniques:

| Training Component | Details | Impact |
|--------------------|---------|--------|
| **🎨 Data Augmentation** | Random flips, rotations, color jittering | Improved generalization |
| **📈 Advanced Loss Function** | L1 + SSIM + Perceptual + Gradient + Focus | Multi-objective optimization |
| **⚙️ Smart Optimization** | AdamW + cosine annealing scheduler | Stable convergence |
| **🔬 Rigorous Validation** | Hold-out test set with 6 metrics | Reliable performance assessment |

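A multi-term loss like the one listed above is typically a weighted sum of individual terms. The following sketch shows the idea with only two of the five terms (L1 and a simple gradient term); the SSIM, perceptual, and focus terms are omitted, and the weights and function names are illustrative assumptions rather than the values or code used to train this model.

```python
def l1_loss(pred, target):
    """Mean absolute error over all pixels."""
    total = sum(abs(p - t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    return total / (len(pred) * len(pred[0]))


def horizontal_gradient(image):
    """Forward difference along each row (one column narrower than the input)."""
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in image]


def gradient_loss(pred, target):
    """L1 distance between horizontal gradients; penalises blurred edges."""
    return l1_loss(horizontal_gradient(pred), horizontal_gradient(target))


def combined_loss(pred, target, weights):
    """Weighted sum of the individual loss terms (weights are hypothetical)."""
    terms = {"l1": l1_loss(pred, target), "gradient": gradient_loss(pred, target)}
    return sum(weights[name] * value for name, value in terms.items())


target = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]   # sharp vertical line
blurred = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]  # flat image, no edge at all
print(combined_loss(blurred, target, {"l1": 1.0, "gradient": 0.5}))  # 1.0
```

The gradient term contributes half of the total here, which shows why such a term is useful for fusion: it penalises the loss of edges even when per-pixel intensities are moderately close.
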
### 🏆 Benchmark Results

| Metric | Score | Interpretation | Benchmark |
|---------|-------|----------------|-----------|
| **📊 PSNR** | 28.5 dB | Excellent signal quality | State-of-the-art |
| **🖼️ SSIM** | 0.92 | Outstanding structure preservation | Top 5% |
| **👁️ VIF** | 0.78 | Superior visual fidelity | Excellent |
| **⚡ QABF** | 0.85 | High edge information quality | Very good |
| **🎯 Focus Transfer** | 96% | Near-perfect focus preservation | Leading |

> 🏅 **Performance Summary**: Our model consistently outperforms traditional CNN-based methods and competing transformer architectures across all fusion quality metrics.

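For readers unfamiliar with the first metric, PSNR is defined as 10·log10(MAX² / MSE), where MSE is the mean squared error against a reference image and MAX is the largest possible pixel value. The sketch below computes it for 8-bit images; which reference image the benchmark used is not stated in this README, so the tiny arrays here are purely illustrative.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    count = 0
    squared_error = 0.0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [[100, 110], [120, 130]]
fused = [[101, 109], [121, 129]]  # every pixel is off by exactly 1
print(round(psnr(reference, fused), 2))  # 48.13
```

Higher is better, and the scale is logarithmic: each additional 10 dB corresponds to a tenfold reduction in mean squared error.
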
## 🌟 Real-World Applications

### 📱 Photography & Consumer Use
- **Mobile Photography**: Combine focus-bracketed shots for professional results
- **Portrait Mode Enhancement**: Improve depth-of-field effects in smartphone cameras
- **Macro Photography**: Merge close-up shots with different focus planes
- **Landscape Photography**: Create sharp foreground-to-background images

### 🔬 Scientific & Professional
- **Microscopy**: Combine images at different focal depths for extended depth-of-field
- **Medical Imaging**: Enhance diagnostic image quality in pathology and research
- **Industrial Inspection**: Ensure all parts of components are in focus for quality control
- **Archaeological Documentation**: Capture detailed artifact images with complete focus

### 📚 Document & Archival
- **Document Scanning**: Ensure all text areas are perfectly legible
- **Art Digitization**: Capture artwork with varying surface depths
- **Historical Preservation**: Create high-quality digital archives
- **Technical Documentation**: Clear images of complex 3D objects


## 🛠️ Run This Demo Locally

### 🚀 Quick Setup (2 minutes)

```bash
# 1. Clone this Space
git clone https://huggingface.co/spaces/divitmittal/HybridTransformer-MFIF
cd HybridTransformer-MFIF

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Launch the demo
python app.py
```

### 🔧 Advanced Setup Options

#### Using UV Package Manager (Recommended)
```bash
# Faster dependency management
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
uv run app.py
```

#### Using Docker
```bash
# Build and run containerized version
docker build -t hybrid-transformer-demo .
docker run -p 7860:7860 hybrid-transformer-demo
```

### 📋 System Requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| **Python** | 3.8+ | 3.10+ |
| **RAM** | 4GB | 8GB+ |
| **Storage** | 2GB | 5GB+ |
| **GPU** | None (CPU works) | NVIDIA GTX 1660+ |
| **Internet** | Required for model download | Stable connection |

> 💡 **First run**: The model (~300MB) will be automatically downloaded from HuggingFace Hub

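The download size is consistent with the 73M+ parameter count quoted earlier if the weights are stored as 32-bit floats. That storage format is an assumption, since the README does not state the checkpoint dtype; the helper below is a back-of-the-envelope check, not a measurement.

```python
def checkpoint_size_mb(num_parameters, bytes_per_parameter=4):
    """Approximate checkpoint size in megabytes (1 MB = 10**6 bytes)."""
    return num_parameters * bytes_per_parameter / 1e6


print(checkpoint_size_mb(73_000_000))     # 292.0
print(checkpoint_size_mb(73_000_000, 2))  # 146.0 (if weights were stored as float16)
```
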
## 🎯 Demo Usage Tips & Tricks

### 📸 Getting the Best Results

#### ✅ Perfect Input Conditions
- **Same Scene**: Both images should show the exact same scene/subject
- **Different Focus**: One image focused on foreground, other on background
- **Minimal Movement**: Avoid camera shake between shots
- **Good Lighting**: Well-lit images produce better fusion results
- **Sharp Focus**: Each image should have clearly focused regions

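One quick way to check that each input really has a sharp region is a classical focus measure such as the energy of the Laplacian. The sketch below is a pre-upload sanity check under that assumption; it is a textbook heuristic, not the mechanism the transformer model uses, and `laplacian_energy` is a name chosen for this example.

```python
def laplacian_energy(image):
    """Mean squared response of a 4-neighbour Laplacian over interior pixels.

    Higher values indicate more high-frequency detail, i.e. sharper focus.
    """
    height, width = len(image), len(image[0])
    total = 0.0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (image[y - 1][x] + image[y + 1][x]
                   + image[y][x - 1] + image[y][x + 1]
                   - 4 * image[y][x])
            total += lap * lap
    return total / ((height - 2) * (width - 2))


# Checkerboard (lots of fine detail) versus a flat gray patch (none).
sharp = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
flat = [[128 for _ in range(8)] for _ in range(8)]
print(laplacian_energy(sharp) > laplacian_energy(flat))  # True
```

Applied to matching crops of the two inputs, a good pair shows the first image scoring higher in some crops and the second image scoring higher in others, which is what complementary focus means in practice.
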
#### ⚠️ What to Avoid
- **Completely Different Scenes**: Won't work with unrelated images
- **Motion Blur**: Blurry images reduce fusion quality
- **Extreme Lighting Differences**: Avoid drastically different exposures
- **Heavy Compression**: Use high-quality images when possible

### 🎨 Creative Applications

#### 📱 Smartphone Photography
1. **Portrait Mode**: Take one shot focused on subject, another on background
2. **Macro Magic**: Combine close-up shots with different focus depths
3. **Street Photography**: Merge foreground and background focus for storytelling

#### 🏞️ Landscape & Nature
1. **Hyperfocal Fusion**: Combine near and far focus for an extended depth of field
2. **Flower Photography**: Focus on petals in one shot, leaves in another
3. **Architecture**: Sharp foreground details with crisp background buildings

#### 🔬 Technical & Scientific
1. **Document Scanning**: Focus on different text sections for complete clarity
2. **Product Photography**: Ensure all product features are in sharp focus
3. **Art Documentation**: Capture textured surfaces with varying depths

## 📄 License

**MIT License** - Free for commercial and non-commercial use.