divitmittal committed
Commit 5f49440 · 1 Parent(s): 429742d

docs: add README for Hybrid Transformer MFIF

Files changed (1): README.md added (+134, -0)
---
title: Hybrid Transformer for Multi-Focus Image Fusion
emoji: 🖼️
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
pinned: true
---

# 🔬 Hybrid Transformer (Focal & CrossViT) for Multi-Focus Image Fusion

<div align="center">
<img src="https://github.com/DivitMittal/HybridTransformer-MFIF/raw/main/assets/logo.png" alt="HybridTransformer MFIF Logo" width="400"/>
</div>

This interactive demo showcases a novel hybrid transformer architecture that combines **Focal Transformers** and **CrossViT** for state-of-the-art multi-focus image fusion. Upload two images with different focus areas and watch the AI intelligently merge them into a single, perfectly focused result.

## 🚀 Try the Demo

Upload your own images or use the provided examples to see the fusion in action!

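If you prefer to drive the Space from code instead of the browser, the `gradio_client` package can connect to any public Gradio Space. The sketch below is an assumption about generic Space access, not a documented API of this project, so list the endpoints before calling anything:

```python
# Hypothetical programmatic access via gradio_client; the Space's
# endpoint names and parameters are not documented here.
from gradio_client import Client

client = Client("divitmittal/hybridtransformer-mfif")
client.view_api()  # prints the available endpoints and their parameters
```
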
## 🧠 How It Works

Our hybrid model combines two powerful transformer architectures:

- **🎯 Focal Transformer**: Provides adaptive spatial attention with multi-scale focal windows
- **🔄 CrossViT**: Enables cross-attention between the near-focused and far-focused images
- **⚡ Hybrid Integration**: A sequential processing pipeline optimized for image fusion (sketched below)

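To make the pipeline concrete, here is a minimal, hypothetical PyTorch sketch of the three stages above. It is **not** the repository's model: the class names are invented, and a vanilla transformer encoder stands in for the focal stage, which in the real architecture uses multi-scale focal windows.

```python
# Illustrative sketch only -- names and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttentionBlock(nn.Module):
    """CrossViT-style exchange: each branch queries the other's tokens."""
    def __init__(self, dim: int = 768, heads: int = 12):
        super().__init__()
        self.norm_a = nn.LayerNorm(dim)
        self.norm_b = nn.LayerNorm(dim)
        self.attn_ab = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, a, b):
        na, nb = self.norm_a(a), self.norm_b(b)
        a = a + self.attn_ab(na, nb, nb)[0]  # near branch attends to far
        b = b + self.attn_ba(nb, na, na)[0]  # far branch attends to near
        return a, b

class FusionSketch(nn.Module):
    def __init__(self, dim=768, heads=12, cross_depth=4, focal_depth=6):
        super().__init__()
        # 16x16 patches of a 224x224 image -> 14*14 = 196 tokens per branch.
        self.patch = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        self.cross = nn.ModuleList(
            CrossAttentionBlock(dim, heads) for _ in range(cross_depth)
        )
        # Stand-in for the focal stage: plain encoder blocks.
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.focal = nn.TransformerEncoder(layer, num_layers=focal_depth)
        self.to_pixels = nn.Linear(dim, 16 * 16 * 3)

    def forward(self, near, far):
        a = self.patch(near).flatten(2).transpose(1, 2)  # (B, 196, dim)
        b = self.patch(far).flatten(2).transpose(1, 2)
        for blk in self.cross:
            a, b = blk(a, b)
        tokens = self.focal(a + b)            # merge the two token streams
        out = self.to_pixels(tokens)          # (B, 196, 768)
        out = out.transpose(1, 2).reshape(-1, 3 * 16 * 16, 14, 14)
        return F.pixel_shuffle(out, 16)       # (B, 3, 224, 224)

fused = FusionSketch()(torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224))
```
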
### Model Architecture

<div align="center">
<img src="https://github.com/DivitMittal/HybridTransformer-MFIF/raw/main/assets/model_architecture.png" alt="FocalCrossViTHybrid Architecture" width="700"/>
</div>

- **📐 Input Size**: 224×224 pixels
- **🧩 Patch Size**: 16×16
- **💾 Parameters**: 73M+ trainable parameters
- **🏗️ Architecture**: 4 CrossViT blocks + 6 Focal Transformer blocks
- **🎯 Attention Heads**: 12 attention heads per multi-head attention layer

These hyperparameters are collected in the illustrative configuration below.

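For reference, the specs above can be captured in a small configuration object. The field names here are illustrative, not the repository's actual config schema:

```python
# Hypothetical config mirroring the published specs above.
from dataclasses import dataclass

@dataclass
class HybridConfigSketch:
    img_size: int = 224        # 224x224 input
    patch_size: int = 16       # 16x16 patches
    embed_dim: int = 768
    num_heads: int = 12        # attention heads per block
    cross_vit_depth: int = 4   # CrossViT blocks
    focal_depth: int = 6       # Focal Transformer blocks

cfg = HybridConfigSketch()
num_tokens = (cfg.img_size // cfg.patch_size) ** 2  # 196 tokens per image
```
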
## 📊 Training Details

The model was trained on the **Lytro Multi-Focus Dataset** using:
- **🎨 Advanced Data Augmentation**: Random flips, rotations, color jittering
- **📈 Multi-Component Loss**: L1 + SSIM + Perceptual + Gradient + Focus losses (see the sketch after this list)
- **⚙️ Optimization**: Adam optimizer with a cosine annealing scheduler
- **🎯 Metrics**: PSNR, SSIM, VIF, QABF, and custom fusion quality measures

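As a rough illustration of the multi-component loss, the hedged sketch below combines L1, SSIM, and gradient terms with assumed weights. The perceptual and focus terms are omitted for brevity, and `ssim_fn` is an assumed helper (e.g. `ssim` from the `pytorch-msssim` package):

```python
# Illustrative weighted multi-term fusion loss; weights are guesses,
# not values from the paper or repository.
import torch.nn.functional as F

def gradient_loss(pred, target):
    # L1 distance between horizontal and vertical image gradients,
    # encouraging the fused image to keep sharp edges.
    gx = F.l1_loss(pred[..., :, 1:] - pred[..., :, :-1],
                   target[..., :, 1:] - target[..., :, :-1])
    gy = F.l1_loss(pred[..., 1:, :] - pred[..., :-1, :],
                   target[..., 1:, :] - target[..., :-1, :])
    return gx + gy

def fusion_loss(pred, target, ssim_fn, weights=(1.0, 0.5, 0.1)):
    terms = (
        F.l1_loss(pred, target),
        1.0 - ssim_fn(pred, target),   # SSIM is a similarity, so invert it
        gradient_loss(pred, target),
    )
    return sum(w * t for w, t in zip(weights, terms))

# Optimizer setup matching the bullet above (learning rate is a guess):
# opt = torch.optim.Adam(model.parameters(), lr=1e-4)
# sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)
```
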
## 🔗 Project Resources

| Platform | Purpose | Link |
|----------|---------|------|
| 📝 **GitHub Source** | Complete source code & documentation | [View Repository](https://github.com/DivitMittal/HybridTransformer-MFIF) |
| 📊 **Kaggle Training** | Train your own model with GPU acceleration | [Launch Notebook](https://www.kaggle.com/code/divitmittal/hybrid-transformer-mfif) |
| 📦 **Dataset** | Lytro Multi-Focus training data | [Download on Kaggle](https://www.kaggle.com/datasets/divitmittal/lytro-multi-focal-images) |

## 🛠️ Run Locally

### 1. Clone the Repository
```bash
git clone https://huggingface.co/spaces/divitmittal/hybridtransformer-mfif
cd hybridtransformer-mfif
```

### 2. Install Dependencies
```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Or, with uv:
uv sync
```

### 3. Run the Gradio App
```bash
python app.py

# Or, with uv:
uv run app.py
```

This will launch a local web server where you can interact with the demo.

## 🎯 Use Cases

This technology is well suited to:
- **📱 Mobile Photography**: Merge photos with different focus points
- **🔬 Scientific Imaging**: Combine microscopy images with varying focal depths
- **🏞️ Landscape Photography**: Create fully focused images from multiple shots
- **📚 Document Scanning**: Ensure all text areas are in sharp focus
- **🎨 Creative Photography**: Artistic control over focus blending

## 📈 Performance Metrics

Our model achieves state-of-the-art results on the Lytro dataset, evaluated with:
- **📊 PSNR**: High peak signal-to-noise ratio
- **🖼️ SSIM**: Excellent structural similarity preservation
- **👁️ VIF**: Superior visual information fidelity
- **⚡ QABF**: Outstanding edge-information transfer quality
- **🎯 Focus Transfer**: Optimal focus preservation from the source images

A sketch for reproducing the first two metrics follows this list.

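PSNR and SSIM can be recomputed with scikit-image as sketched below; VIF and QABF require dedicated fusion-metric implementations (the `sewar` package, for example, provides a pixel-domain VIF). The helper name is illustrative:

```python
# Reference-based evaluation of a fused image with scikit-image.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def reference_metrics(fused: np.ndarray, reference: np.ndarray) -> dict:
    # Both images as uint8 RGB arrays of the same shape.
    return {
        "psnr": peak_signal_noise_ratio(reference, fused, data_range=255),
        "ssim": structural_similarity(reference, fused,
                                      channel_axis=-1, data_range=255),
    }
```
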
## 🔬 Research Applications

This implementation supports:
- **🧪 Ablation Studies**: Modular architecture for component analysis
- **📋 Benchmarking**: Comprehensive evaluation metrics
- **🔄 Reproducibility**: Deterministic training with detailed logging
- **⚙️ Customization**: Flexible configuration for different experiments

## 📚 Citation

If you use this model in your research, please cite:

```bibtex
@article{hybridtransformer2024,
  title={Hybrid Transformer Architecture for Multi-Focus Image Fusion},
  author={Your Name},
  journal={Conference/Journal Name},
  year={2024}
}
```

## 🤝 Contributing

Interested in improving the model? Check out our [GitHub repository](https://github.com/DivitMittal/HybridTransformer-MFIF) for:
- 🐛 Bug reports and feature requests
- 💡 Architecture improvements
- 📊 New evaluation metrics
- 🔧 Performance optimizations

## 📄 License

This project is licensed under the MIT License; see the [LICENSE](https://github.com/DivitMittal/HybridTransformer-MFIF/blob/main/LICENSE) file for details.