title: DINOv3 Web/Sat Interactive Similarity
emoji: 🦖
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 5.43.1
app_file: app.py
pinned: false
license: mit
short_description: Visualize image patch similarity like in DINOv3 presentation
DINOv3 Patch Similarity Viewer Github Repo
Note: This README and repository are for educational purposes. The repo was inspired by the DINOv3 paper and is meant to help visualize and understand the model's output.
Purpose
This repository provides interactive tools to visualize and explore patch-wise similarity in images using the DINOv3 vision transformer model. It is designed for researchers, students, and practitioners interested in understanding how self-supervised vision transformers perceive and relate different regions of an image.
About DINOv3
- Paper: DINOv3: Self-supervised Vision Transformers with Enormous Teacher Models
- Meta Research Page: Meta DINOv3 Publication
- Official GitHub: facebookresearch/dinov3
Note:
The DINOv3 model weights require access approval.
You can request access via the Meta Research page or by selecting the desired model in the Hugging Face model collection.
Features
- Interactive Visualization: Click on image patches or use arrow keys to explore patch similarity heatmaps.
- Single or Two-Image Mode: If one image is specified, shows self-similarity. If two images are specified, shows both self-similarity and cross-image similarity overlays interactively.
- Image Preprocessing: Loads and pads images without resizing, preserving the original aspect ratio.
- Cosine Similarity Calculation: Computes and visualizes cosine similarity between image patches.
- Robust Fallback: If an image URL fails to load, a default image is used.
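The cosine similarity step listed above can be sketched in a few lines. This is a dependency-free illustration, not the repository's code: the function names `cosine_similarity` and `similarity_to_patch` and the toy embeddings are assumptions, and the actual tools presumably operate on PyTorch tensors produced by the model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)


def similarity_to_patch(patch_embeddings, query_index):
    """Similarity of every patch to the selected patch, in patch order."""
    query = patch_embeddings[query_index]
    return [cosine_similarity(query, emb) for emb in patch_embeddings]


# Toy example: three patches with made-up 2-dimensional embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
scores = similarity_to_patch(embeddings, query_index=0)
```

Reshaping `scores` to the patch grid (rows by columns) gives the heatmap for the selected patch. Because cosine similarity ignores vector length, the third patch scores the same as the first even though its embedding is twice as long.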
Installation
Install dependencies with:
pip install -r requirements.txt
Model Selection
You can choose from several DINOv3 models available on Hugging Face (click to view each model card):
LVD-1689M Dataset (Web data)
ViT
ConvNeXt
SAT-493M Dataset (Satellite data)
Usage
Gradio app
Run the Gradio app:
python app.py
After starting the app, open http://localhost:7860/ in a browser.
Then:
- Choose the dataset and model name
- For single-image similarity:
  - Choose only one file or URL
- For two-image similarity:
  - Choose images from file and/or URL
- Click the "Initialize / Update" button
- Select the desired patch from the image
- Watch the results
Note: Overlay alpha sets the opacity of the similarity heatmap drawn on top of the image.
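As a rough illustration of what an overlay alpha does, the sketch below applies standard alpha compositing to a single pixel. The function name `blend_pixel` is hypothetical, and whether the app blends in exactly this way is an assumption; the formula itself is the usual "heatmap over opaque image" blend.

```python
def blend_pixel(image_rgb, heat_rgb, alpha):
    """Blend one heatmap pixel over one image pixel (channels in 0-255).

    alpha = 0.0 leaves the image untouched; alpha = 1.0 shows only the heatmap.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return tuple(
        round((1.0 - alpha) * img + alpha * heat)
        for img, heat in zip(image_rgb, heat_rgb)
    )


# A gray image pixel under a pure red heatmap pixel at three alpha values.
gray, red = (200, 200, 200), (255, 0, 0)
results = {alpha: blend_pixel(gray, red, alpha) for alpha in (0.0, 0.55, 1.0)}
```

Higher values make the heatmap dominate, while lower values keep more of the underlying image visible.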
Python Script
Run the interactive viewer with the default COCO image:
python DINOv3CosSimilarity.py
Single Image Mode
Specify your own image (local path or URL):
python DINOv3CosSimilarity.py --image path/to/your/image.jpg
python DINOv3CosSimilarity.py --image https://yourdomain.com/image.png
Two Image Mode
Specify two images (local paths or URLs):
python DINOv3CosSimilarity.py --image1 path/to/image1.jpg --image2 path/to/image2.jpg
python DINOv3CosSimilarity.py --image1 https://yourdomain.com/image1.png --image2 https://yourdomain.com/image2.png
Model Selection
Specify the model with --model (default is vits16):
python DINOv3CosSimilarity.py --model facebook/dinov3-vitb16-pretrain-lvd1689m
Other Options
- --show_grid: Draw patch grid
- --annotate_indices: Write patch indices on cells
- --overlay_alpha <float>: Set heatmap alpha (default 0.55)
- --patch_size <int>: Override patch size (default: model's patch size)
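To see how a patch size relates to the grid that gets drawn and indexed, here is a minimal sketch. It assumes the image is padded up to the next multiple of the patch size in each dimension (the README only says images are padded without resizing), and the helper names `padded_size` and `patch_grid` are hypothetical.

```python
def padded_size(width, height, patch_size):
    """Smallest (width, height) >= the input that is a multiple of patch_size."""
    padded_w = -(-width // patch_size) * patch_size   # integer ceiling division
    padded_h = -(-height // patch_size) * patch_size
    return padded_w, padded_h


def patch_grid(width, height, patch_size):
    """Number of patch rows and columns after padding."""
    padded_w, padded_h = padded_size(width, height, patch_size)
    return padded_h // patch_size, padded_w // patch_size


# A 500x375 image with 16-pixel patches (illustrative numbers).
size = padded_size(500, 375, 16)
rows, cols = patch_grid(500, 375, 16)
```

Under these assumptions, a smaller patch size gives a finer grid and more patches to compare, while the original pixels stay at their native scale.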
Controls
- Mouse click to select a patch
- Arrow keys to move selection
- '1', '2', or 't' to switch active image (in two-image mode)
- 'q' to quit
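The click and arrow-key controls above amount to simple index arithmetic on the patch grid. The sketch below is one possible way to do it, not the script's actual handler: the function names are hypothetical, row-major patch indexing is assumed, and the selection is assumed to stop at the grid border rather than wrap around. The key names match the strings Matplotlib reports for arrow keys.

```python
# Row/column offsets for each arrow key.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def pixel_to_patch(x, y, patch_size, cols):
    """Map a click at pixel (x, y) to a row-major patch index."""
    row, col = int(y) // patch_size, int(x) // patch_size
    return row * cols + col


def move_selection(index, key, rows, cols):
    """Move the selected patch one cell, clamped to the grid edges."""
    row, col = divmod(index, cols)
    d_row, d_col = MOVES.get(key, (0, 0))  # unknown keys leave the selection as is
    row = min(max(row + d_row, 0), rows - 1)
    col = min(max(col + d_col, 0), cols - 1)
    return row * cols + col


# A click at pixel (35, 20) on a 30x40 grid of 16-pixel patches.
selected = pixel_to_patch(35, 20, patch_size=16, cols=40)
after_down = move_selection(selected, "down", rows=30, cols=40)
```

The resulting index is what a similarity lookup would use as the query patch.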
Demo Single Image
Demo 2 Images
Jupyter Notebook
- Open PatchCosSimilarity.ipynb in Jupyter Notebook.
- Run the cells to load an image and visualize patch similarities.
- Set url1 for single-image mode, or both url1 and url2 for two-image mode.
- If an image fails to load, a default image will be used automatically.
- Set the model_id variable to any of the models listed above (see commented lines at the top of the notebook).
Notebook Controls:
- Mouse click to select a patch
- Arrow keys to move selection
- '1', '2', or 't' to switch active image (in two-image mode)
License
This project is licensed under the MIT License. See the LICENSE file for details.
Acknowledgments
This project uses the DINOv3 model from Hugging Face's Transformers library, along with PyTorch, Matplotlib, and Pillow.