---
tags:
- image-classification
- birder
- pytorch
library_name: birder
license: apache-2.0
---

# Model Card for flexivit_reg1_s16_rms_ls_dino-v2-il-all

A FlexiViT reg1 s16 classification model with RMS norm and layer scaling, pre-trained using DINOv2 on the `il-all` dataset and then fine-tuned on the same dataset.

The species list is derived from data available at <https://www.israbirding.com/checklist/>.

## Model Details

- **Model Type:** Image classification and detection backbone
- **Model Stats:**
  - Params (M): 21.9
  - Input image size: 240 x 240
- **Dataset:** il-all (550 classes)

- **Papers:**
  - FlexiViT: One Model for All Patch Sizes: <https://arxiv.org/abs/2212.08013>
  - DINOv2: Learning Robust Visual Features without Supervision: <https://arxiv.org/abs/2304.07193>

## Model Usage

### Image Classification

```python
import birder
from birder.inference.classification import infer_image

(net, model_info) = birder.load_pretrained_model("flexivit_reg1_s16_rms_ls_dino-v2-il-all", inference=True)

# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)

# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)

image = "path/to/image.jpeg"  # or a PIL image, must be loaded in RGB format
(out, _) = infer_image(net, image, transform)
# out is a NumPy array with shape (1, 550), representing class probabilities

# Use the flexible patch size of FlexiViT
(out, _) = infer_image(net, image, transform, patch_size=24)
```
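
Since `out` holds per-class probabilities, a common next step is to pick out the most likely classes. A minimal sketch using plain NumPy (not a birder API), assuming `out` comes from the snippet above; mapping indices back to species names depends on your setup and is not shown here:

```python
import numpy as np

# Indices of the 5 most probable classes, highest first
top5 = np.argsort(out[0])[::-1][:5]
for idx in top5:
    print(f"class index {idx}: probability {out[0][idx]:.4f}")
```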

### Image Embeddings

```python
import birder
from birder.inference.classification import infer_image

(net, model_info) = birder.load_pretrained_model("flexivit_reg1_s16_rms_ls_dino-v2-il-all", inference=True)

# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)

# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)

image = "path/to/image.jpeg"  # or a PIL image
(out, embedding) = infer_image(net, image, transform, return_embedding=True)
# embedding is a NumPy array with shape (1, 384)
```
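
Embeddings like this are typically compared with cosine similarity, e.g. for retrieval or near-duplicate detection. A minimal sketch, assuming `net` and `transform` from the snippet above; the two image paths are hypothetical:

```python
import numpy as np

(_, emb_a) = infer_image(net, "path/to/image_a.jpeg", transform, return_embedding=True)
(_, emb_b) = infer_image(net, "path/to/image_b.jpeg", transform, return_embedding=True)

# Cosine similarity between the two (1, 384) embeddings
(a, b) = (emb_a.ravel(), emb_b.ravel())
similarity = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {similarity:.4f}")
```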

### Detection Feature Map

```python
from PIL import Image

import birder

(net, model_info) = birder.load_pretrained_model("flexivit_reg1_s16_rms_ls_dino-v2-il-all", inference=True)

# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)

# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)

image = Image.open("path/to/image.jpeg")
features = net.detection_features(transform(image).unsqueeze(0))
# features is a dict (stage name -> torch.Tensor)
print([(k, v.size()) for k, v in features.items()])
# Output example:
# [('neck', torch.Size([1, 384, 15, 15]))]
```
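
The feature map can also serve as a dense image descriptor outside of a detection pipeline. A minimal sketch, assuming `features` from the snippet above; global average pooling over the spatial dimensions is one illustrative choice here, not a birder API:

```python
import torch

# Global-average-pool the 'neck' stage: (1, 384, 15, 15) -> (1, 384)
neck = features["neck"]
descriptor = torch.mean(neck, dim=(2, 3))
print(descriptor.size())  # torch.Size([1, 384])
```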

## Citation

```bibtex
@misc{beyer2023flexivitmodelpatchsizes,
      title={FlexiViT: One Model for All Patch Sizes},
      author={Lucas Beyer and Pavel Izmailov and Alexander Kolesnikov and Mathilde Caron and Simon Kornblith and Xiaohua Zhai and Matthias Minderer and Michael Tschannen and Ibrahim Alabdulmohsin and Filip Pavetic},
      year={2023},
      eprint={2212.08013},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2212.08013},
}

@misc{oquab2024dinov2learningrobustvisual,
      title={DINOv2: Learning Robust Visual Features without Supervision},
      author={Maxime Oquab and Timothée Darcet and Théo Moutakanni and Huy Vo and Marc Szafraniec and Vasil Khalidov and Pierre Fernandez and Daniel Haziza and Francisco Massa and Alaaeldin El-Nouby and Mahmoud Assran and Nicolas Ballas and Wojciech Galuba and Russell Howes and Po-Yao Huang and Shang-Wen Li and Ishan Misra and Michael Rabbat and Vasu Sharma and Gabriel Synnaeve and Hu Xu and Hervé Jegou and Julien Mairal and Patrick Labatut and Armand Joulin and Piotr Bojanowski},
      year={2024},
      eprint={2304.07193},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2304.07193},
}
```