---
title: Hybrid Transformer for Multi-Focus Image Fusion
emoji: 🖼️
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
pinned: true
---

# 🔬 Hybrid Transformer (Focal & CrossViT) for Multi-Focus Image Fusion

<div align="center">
<img src="https://github.com/DivitMittal/HybridTransformer-MFIF/raw/main/assets/logo.png" alt="HybridTransformer MFIF Logo" width="400"/>
</div>

This interactive demo showcases a hybrid transformer architecture that combines **Focal Transformers** and **CrossViT** for multi-focus image fusion. Upload two images of the same scene with different focus areas and watch the model merge them into a single, fully focused result.

## 🚀 Try the Demo

Upload your own images or use the provided examples to see the fusion in action!

## 🧠 How It Works

Our hybrid model combines two transformer architectures:

- **🎯 Focal Transformer**: Provides adaptive spatial attention with multi-scale focal windows
- **🔄 CrossViT**: Enables cross-attention between the near- and far-focused input images
- **⚡ Hybrid Integration**: A sequential processing pipeline optimized for image fusion (a conceptual sketch follows below)
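
The actual model lives in the [GitHub repository](https://github.com/DivitMittal/HybridTransformer-MFIF); the minimal PyTorch sketch below only illustrates the sequential idea, in which each image's patch tokens first attend to the other image, CrossViT-style, before a joint self-attention stage refines the fused tokens. Class names and the 768-dim embedding are illustrative assumptions, not the project's real code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossBlock(nn.Module):
    """Toy CrossViT-style block: query tokens attend to the other image's tokens."""

    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_tokens, context_tokens):
        attended, _ = self.attn(query_tokens, context_tokens, context_tokens)
        return self.norm(query_tokens + attended)


class ToyHybridFusion(nn.Module):
    """Cross-attention exchange between the two inputs, then joint self-attention."""

    def __init__(self, dim: int = 768, heads: int = 12, patch: int = 16):
        super().__init__()
        self.patch = patch
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # 16x16 patch tokens
        self.cross_near = CrossBlock(dim, heads)
        self.cross_far = CrossBlock(dim, heads)
        self.refine = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.to_pixels = nn.Linear(dim, 3 * patch * patch)  # tokens back to RGB patches

    def forward(self, near, far):
        tok_n = self.embed(near).flatten(2).transpose(1, 2)     # (B, 196, dim) for 224x224 input
        tok_f = self.embed(far).flatten(2).transpose(1, 2)
        tok_n = self.cross_near(tok_n, tok_f)                    # near tokens look at the far image
        tok_f = self.cross_far(tok_f, tok_n)                     # and vice versa
        fused = self.refine(torch.cat([tok_n, tok_f], dim=1))    # joint self-attention refinement
        n = tok_n.shape[1]
        patches = self.to_pixels(fused[:, :n] + fused[:, n:])    # merge the two token streams
        return F.fold(patches.transpose(1, 2), output_size=near.shape[-2:],
                      kernel_size=self.patch, stride=self.patch)


near, far = torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224)
print(ToyHybridFusion()(near, far).shape)  # torch.Size([1, 3, 224, 224])
```

In the real architecture, the Focal Transformer stage takes the place of the plain `TransformerEncoderLayer` used here, applying multi-scale focal windows rather than global self-attention.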

### Model Architecture

<div align="center">
<img src="https://github.com/DivitMittal/HybridTransformer-MFIF/raw/main/assets/model_architecture.png" alt="FocalCrossViTHybrid Architecture" width="700"/>
</div>

- **📐 Input Size**: 224×224 pixels
- **🧩 Patch Size**: 16×16
- **💾 Parameters**: 73M+ trainable
- **🏗️ Architecture**: 4 CrossViT blocks + 6 Focal Transformer blocks
- **🎯 Attention Heads**: 12 per multi-head attention layer
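
For intuition, the stated input and patch sizes imply the token geometry below (plain arithmetic, not code from the repository):

```python
image_size, patch_size = 224, 16

patches_per_side = image_size // patch_size      # 224 / 16 = 14
tokens_per_image = patches_per_side ** 2         # 14 * 14 = 196 patch tokens per input image
values_per_patch = 3 * patch_size * patch_size   # 768 RGB values inside each 16x16 patch

print(patches_per_side, tokens_per_image, values_per_patch)  # 14 196 768
```

Each source image therefore contributes 196 patch tokens to the transformer stages.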

## 📊 Training Details

The model was trained on the **Lytro Multi-Focus Dataset** using:
- **🎨 Data Augmentation**: Random flips, rotations, and color jittering
- **📉 Multi-Component Loss**: L1 + SSIM + perceptual + gradient + focus losses (a simplified sketch follows below)
- **⚙️ Optimization**: Adam optimizer with a cosine-annealing learning-rate schedule
- **🎯 Metrics**: PSNR, SSIM, VIF, QABF, and custom fusion-quality measures
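
The multi-component loss is a weighted sum of individual terms. The sketch below is a simplified stand-in, with assumed weights, a one-layer placeholder network, and only the L1 and gradient terms (the SSIM, perceptual, and focus terms plug in the same way); it shows how such a loss pairs with Adam and cosine annealing:

```python
import torch
import torch.nn.functional as F

def gradient_loss(pred, target):
    # Compare horizontal and vertical image gradients to encourage sharp edges.
    dx = lambda t: t[..., :, 1:] - t[..., :, :-1]
    dy = lambda t: t[..., 1:, :] - t[..., :-1, :]
    return F.l1_loss(dx(pred), dx(target)) + F.l1_loss(dy(pred), dy(target))

def fusion_loss(pred, target, w_l1=1.0, w_grad=0.5):
    # Weighted combination; additional SSIM / perceptual / focus terms would be added here.
    return w_l1 * F.l1_loss(pred, target) + w_grad * gradient_loss(pred, target)

model = torch.nn.Conv2d(6, 3, kernel_size=3, padding=1)          # placeholder for the fusion network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

near, far, reference = (torch.rand(2, 3, 224, 224) for _ in range(3))
for _ in range(2):                                                # tiny demo loop
    pred = model(torch.cat([near, far], dim=1))                   # both inputs stacked on channels
    loss = fusion_loss(pred, reference)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                                              # cosine-annealed learning rate
```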

## 🔗 Project Resources

| Platform | Purpose | Link |
|----------|---------|------|
| 🐙 **GitHub Source** | Complete source code & documentation | [View Repository](https://github.com/DivitMittal/HybridTransformer-MFIF) |
| 📓 **Kaggle Training** | Train your own model with GPU acceleration | [Launch Notebook](https://www.kaggle.com/code/divitmittal/hybrid-transformer-mfif) |
| 📦 **Dataset** | Lytro Multi-Focus training data | [Download on Kaggle](https://www.kaggle.com/datasets/divitmittal/lytro-multi-focal-images) |

## 🛠️ Run Locally

### 1. Clone the Repository
```bash
git clone https://huggingface.co/spaces/divitmittal/hybridtransformer-mfif
cd hybridtransformer-mfif
```

### 2. Install Dependencies
```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Or, with uv:
uv sync
```

### 3. Run the Gradio App
```bash
python app.py

# Or, with uv:
uv run app.py
```

This will launch a local web server where you can interact with the demo.
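
If you want to see how the demo is wired up, a stripped-down Gradio app for this task looks roughly like the sketch below (a naive pixel-wise average stands in for the trained model, and both uploads are assumed to have the same size; see `app.py` for the real implementation):

```python
import gradio as gr
import numpy as np

def fuse(near_img: np.ndarray, far_img: np.ndarray) -> np.ndarray:
    # Placeholder fusion: average the two inputs pixel-wise.
    # The real app runs the trained hybrid transformer here instead.
    blended = (near_img.astype(np.float32) + far_img.astype(np.float32)) / 2
    return blended.astype(np.uint8)

demo = gr.Interface(
    fn=fuse,
    inputs=[gr.Image(label="Near-focused image"), gr.Image(label="Far-focused image")],
    outputs=gr.Image(label="Fused result"),
    title="Hybrid Transformer for Multi-Focus Image Fusion",
)

if __name__ == "__main__":
    demo.launch()
```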

## 🎯 Use Cases

This technique is well suited to:
- **📱 Mobile Photography**: Merge photos taken with different focus points
- **🔬 Scientific Imaging**: Combine microscopy images with varying focal depths
- **🏞️ Landscape Photography**: Create fully focused images from multiple shots
- **📄 Document Scanning**: Ensure all text areas are in focus
- **🎨 Creative Photography**: Artistic control over focus blending

## 📊 Performance Metrics

The model achieves strong results on the Lytro dataset across standard fusion metrics:
- **📈 PSNR**: High peak signal-to-noise ratio
- **🖼️ SSIM**: Strong structural-similarity preservation
- **👁️ VIF**: High visual information fidelity
- **⚡ QABF**: Good edge-information transfer
- **🎯 Focus Transfer**: Focus from each source image is preserved in the fused result
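
VIF, QABF, and the focus-transfer measures come from the image-fusion literature and need dedicated implementations, but PSNR and SSIM can be reproduced with scikit-image. The snippet below is an illustrative evaluation helper, not the project's evaluation script:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_fusion(fused: np.ndarray, reference: np.ndarray) -> dict:
    """Score a fused image against a reference; both are float arrays in [0, 1] of shape (H, W, 3)."""
    return {
        "psnr": peak_signal_noise_ratio(reference, fused, data_range=1.0),
        "ssim": structural_similarity(reference, fused, channel_axis=-1, data_range=1.0),
    }

reference = np.random.rand(224, 224, 3)
fused = np.clip(reference + 0.01 * np.random.randn(224, 224, 3), 0.0, 1.0)
print(evaluate_fusion(fused, reference))  # dict with 'psnr' and 'ssim' scores
```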

## 🔬 Research Applications

This implementation supports:
- **🧪 Ablation Studies**: Modular architecture for component analysis
- **📊 Benchmarking**: Comprehensive evaluation metrics
- **🔁 Reproducibility**: Deterministic training with detailed logging
- **⚙️ Customization**: Flexible configuration for different experiments

## 📝 Citation

If you use this model in your research, please cite:

```bibtex
@article{hybridtransformer2024,
  title={Hybrid Transformer Architecture for Multi-Focus Image Fusion},
  author={Your Name},
  journal={Conference/Journal Name},
  year={2024}
}
```

## 🤝 Contributing

Interested in improving the model? Check out our [GitHub repository](https://github.com/DivitMittal/HybridTransformer-MFIF) for:
- 🐛 Bug reports and feature requests
- 💡 Architecture improvements
- 📏 New evaluation metrics
- 🔧 Performance optimizations

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](https://github.com/DivitMittal/HybridTransformer-MFIF/blob/main/LICENSE) file for details.