divitmittal committed
Commit 5f49440 · 1 Parent(s): 429742d

docs: add README for Hybrid Transformer MFIF

Files changed (1): README.md added (+134, -0)
---
title: Hybrid Transformer for Multi-Focus Image Fusion
emoji: 🖼️
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
pinned: true
---

# 🔬 Hybrid Transformer (Focal & CrossViT) for Multi-Focus Image Fusion

<div align="center">
<img src="https://github.com/DivitMittal/HybridTransformer-MFIF/raw/main/assets/logo.png" alt="HybridTransformer MFIF Logo" width="400"/>
</div>

This interactive demo showcases a novel hybrid transformer architecture that combines **Focal Transformers** and **CrossViT** for state-of-the-art multi-focus image fusion. Upload two images with different focus areas and watch the AI intelligently merge them into a single, perfectly focused result.

## 🚀 Try the Demo

Upload your own images or use the provided examples to see the fusion in action!

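If you prefer to drive the Space from code instead of the browser, the `gradio_client` package can connect to any public Gradio Space. The sketch below is an assumption about generic Space access, not a documented API of this project, so list the endpoints before calling anything:

```python
# Hypothetical programmatic access via gradio_client; the Space's
# endpoint names and parameters are not documented here.
from gradio_client import Client

client = Client("divitmittal/hybridtransformer-mfif")
client.view_api()  # prints the available endpoints and their parameters
```
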
## 🧠 How It Works

Our hybrid model combines two powerful transformer architectures:

- **🎯 Focal Transformer**: Provides adaptive spatial attention with multi-scale focal windows
- **🔄 CrossViT**: Enables cross-attention between the near-focused and far-focused images
- **⚡ Hybrid Integration**: A sequential processing pipeline optimized for image fusion (sketched below)

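To make the pipeline concrete, here is a minimal, hypothetical PyTorch sketch of the three stages above. It is **not** the repository's model: the class names are invented, and a vanilla transformer encoder stands in for the focal stage, which in the real architecture uses multi-scale focal windows.

```python
# Illustrative sketch only -- names and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttentionBlock(nn.Module):
    """CrossViT-style exchange: each branch queries the other's tokens."""
    def __init__(self, dim: int = 768, heads: int = 12):
        super().__init__()
        self.norm_a = nn.LayerNorm(dim)
        self.norm_b = nn.LayerNorm(dim)
        self.attn_ab = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, a, b):
        na, nb = self.norm_a(a), self.norm_b(b)
        a = a + self.attn_ab(na, nb, nb)[0]  # near branch attends to far
        b = b + self.attn_ba(nb, na, na)[0]  # far branch attends to near
        return a, b

class FusionSketch(nn.Module):
    def __init__(self, dim=768, heads=12, cross_depth=4, focal_depth=6):
        super().__init__()
        # 16x16 patches of a 224x224 image -> 14*14 = 196 tokens per branch.
        self.patch = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        self.cross = nn.ModuleList(
            CrossAttentionBlock(dim, heads) for _ in range(cross_depth)
        )
        # Stand-in for the focal stage: plain encoder blocks.
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.focal = nn.TransformerEncoder(layer, num_layers=focal_depth)
        self.to_pixels = nn.Linear(dim, 16 * 16 * 3)

    def forward(self, near, far):
        a = self.patch(near).flatten(2).transpose(1, 2)  # (B, 196, dim)
        b = self.patch(far).flatten(2).transpose(1, 2)
        for blk in self.cross:
            a, b = blk(a, b)
        tokens = self.focal(a + b)            # merge the two token streams
        out = self.to_pixels(tokens)          # (B, 196, 768)
        out = out.transpose(1, 2).reshape(-1, 3 * 16 * 16, 14, 14)
        return F.pixel_shuffle(out, 16)       # (B, 3, 224, 224)

fused = FusionSketch()(torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224))
```
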
### Model Architecture

<div align="center">
<img src="https://github.com/DivitMittal/HybridTransformer-MFIF/raw/main/assets/model_architecture.png" alt="FocalCrossViTHybrid Architecture" width="700"/>
</div>

- **📐 Input Size**: 224×224 pixels
- **🧩 Patch Size**: 16×16
- **💾 Parameters**: 73M+ trainable parameters
- **🏗️ Architecture**: 4 CrossViT blocks + 6 Focal Transformer blocks
- **🎯 Attention Heads**: 12 attention heads per multi-head attention layer

These hyperparameters are collected in the illustrative configuration below.

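For reference, the specs above can be captured in a small configuration object. The field names here are illustrative, not the repository's actual config schema:

```python
# Hypothetical config mirroring the published specs above.
from dataclasses import dataclass

@dataclass
class HybridConfigSketch:
    img_size: int = 224        # 224x224 input
    patch_size: int = 16       # 16x16 patches
    embed_dim: int = 768
    num_heads: int = 12        # attention heads per block
    cross_vit_depth: int = 4   # CrossViT blocks
    focal_depth: int = 6       # Focal Transformer blocks

cfg = HybridConfigSketch()
num_tokens = (cfg.img_size // cfg.patch_size) ** 2  # 196 tokens per image
```
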
## 📊 Training Details

The model was trained on the **Lytro Multi-Focus Dataset** using:
- **🎨 Advanced Data Augmentation**: Random flips, rotations, color jittering
- **📈 Multi-Component Loss**: L1 + SSIM + Perceptual + Gradient + Focus losses (see the sketch after this list)
- **⚙️ Optimization**: Adam optimizer with a cosine annealing scheduler
- **🎯 Metrics**: PSNR, SSIM, VIF, QABF, and custom fusion quality measures

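As a rough illustration of the multi-component loss, the hedged sketch below combines L1, SSIM, and gradient terms with assumed weights. The perceptual and focus terms are omitted for brevity, and `ssim_fn` is an assumed helper (e.g. `ssim` from the `pytorch-msssim` package):

```python
# Illustrative weighted multi-term fusion loss; weights are guesses,
# not values from the paper or repository.
import torch.nn.functional as F

def gradient_loss(pred, target):
    # L1 distance between horizontal and vertical image gradients,
    # encouraging the fused image to keep sharp edges.
    gx = F.l1_loss(pred[..., :, 1:] - pred[..., :, :-1],
                   target[..., :, 1:] - target[..., :, :-1])
    gy = F.l1_loss(pred[..., 1:, :] - pred[..., :-1, :],
                   target[..., 1:, :] - target[..., :-1, :])
    return gx + gy

def fusion_loss(pred, target, ssim_fn, weights=(1.0, 0.5, 0.1)):
    terms = (
        F.l1_loss(pred, target),
        1.0 - ssim_fn(pred, target),   # SSIM is a similarity, so invert it
        gradient_loss(pred, target),
    )
    return sum(w * t for w, t in zip(weights, terms))

# Optimizer setup matching the bullet above (learning rate is a guess):
# opt = torch.optim.Adam(model.parameters(), lr=1e-4)
# sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)
```
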
## 🔗 Project Resources

| Platform | Purpose | Link |
|----------|---------|------|
| 📝 **GitHub Source** | Complete source code & documentation | [View Repository](https://github.com/DivitMittal/HybridTransformer-MFIF) |
| 📊 **Kaggle Training** | Train your own model with GPU acceleration | [Launch Notebook](https://www.kaggle.com/code/divitmittal/hybrid-transformer-mfif) |
| 📦 **Dataset** | Lytro Multi-Focus training data | [Download on Kaggle](https://www.kaggle.com/datasets/divitmittal/lytro-multi-focal-images) |

## 🛠️ Run Locally

### 1. Clone the Repository
```bash
git clone https://huggingface.co/spaces/divitmittal/hybridtransformer-mfif
cd hybridtransformer-mfif
```

### 2. Install Dependencies
```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Or, with uv:
uv sync
```

### 3. Run the Gradio App
```bash
python app.py

# Or, with uv:
uv run app.py
```

This will launch a local web server where you can interact with the demo.

## 🎯 Use Cases

This technology is well suited to:
- **📱 Mobile Photography**: Merge photos with different focus points
- **🔬 Scientific Imaging**: Combine microscopy images with varying focal depths
- **🏞️ Landscape Photography**: Create fully focused images from multiple shots
- **📚 Document Scanning**: Ensure all text areas are in sharp focus
- **🎨 Creative Photography**: Artistic control over focus blending

## 📈 Performance Metrics

Our model achieves state-of-the-art results on the Lytro dataset, evaluated with:
- **📊 PSNR**: High peak signal-to-noise ratio
- **🖼️ SSIM**: Excellent structural similarity preservation
- **👁️ VIF**: Superior visual information fidelity
- **⚡ QABF**: Outstanding edge-information transfer quality
- **🎯 Focus Transfer**: Optimal focus preservation from the source images

A sketch for reproducing the first two metrics follows this list.

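PSNR and SSIM can be recomputed with scikit-image as sketched below; VIF and QABF require dedicated fusion-metric implementations (the `sewar` package, for example, provides a pixel-domain VIF). The helper name is illustrative:

```python
# Reference-based evaluation of a fused image with scikit-image.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def reference_metrics(fused: np.ndarray, reference: np.ndarray) -> dict:
    # Both images as uint8 RGB arrays of the same shape.
    return {
        "psnr": peak_signal_noise_ratio(reference, fused, data_range=255),
        "ssim": structural_similarity(reference, fused,
                                      channel_axis=-1, data_range=255),
    }
```
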
## 🔬 Research Applications

This implementation supports:
- **🧪 Ablation Studies**: Modular architecture for component analysis
- **📋 Benchmarking**: Comprehensive evaluation metrics
- **🔄 Reproducibility**: Deterministic training with detailed logging
- **⚙️ Customization**: Flexible configuration for different experiments

## 📚 Citation

If you use this model in your research, please cite:

```bibtex
@article{hybridtransformer2024,
  title={Hybrid Transformer Architecture for Multi-Focus Image Fusion},
  author={Your Name},
  journal={Conference/Journal Name},
  year={2024}
}
```

## 🤝 Contributing

Interested in improving the model? Check out our [GitHub repository](https://github.com/DivitMittal/HybridTransformer-MFIF) for:
- 🐛 Bug reports and feature requests
- 💡 Architecture improvements
- 📊 New evaluation metrics
- 🔧 Performance optimizations

## 📄 License

This project is licensed under the MIT License; see the [LICENSE](https://github.com/DivitMittal/HybridTransformer-MFIF/blob/main/LICENSE) file for details.