---
tags:
- image-classification
- birder
- pytorch
library_name: birder
license: apache-2.0
---

# Model Card for flexivit_reg1_s16_rms_ls_dino-v2-il-all

A FlexiViT reg1 s16 image classification model with RMSNorm and layer scaling, pre-trained using DINOv2 on the `il-all` dataset and then fine-tuned on the same dataset.

The species list is derived from data available at <https://www.israbirding.com/checklist/>.

## Model Details

- **Model Type:** Image classification and detection backbone
- **Model Stats:**
  - Params (M): 21.9
  - Input image size: 240 x 240
- **Dataset:** il-all (550 classes)
- **Papers:**
  - FlexiViT: One Model for All Patch Sizes: <https://arxiv.org/abs/2212.08013>
  - DINOv2: Learning Robust Visual Features without Supervision: <https://arxiv.org/abs/2304.07193>

## Model Usage

### Image Classification

```python
import birder
from birder.inference.classification import infer_image

(net, model_info) = birder.load_pretrained_model("flexivit_reg1_s16_rms_ls_dino-v2-il-all", inference=True)

# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)

# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)

image = "path/to/image.jpeg"  # or a PIL image, must be loaded in RGB format
(out, _) = infer_image(net, image, transform)
# out is a NumPy array with shape (1, 550), representing class probabilities

# Use the flexible patch size of FlexiViT
(out, _) = infer_image(net, image, transform, patch_size=24)
```
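
The `(1, 550)` probability array can be reduced to top-k predictions with plain NumPy. The sketch below uses a random, normalized vector standing in for a real `out` array; mapping indices back to species names is omitted, since the label lookup depends on the model's metadata and is not shown above.

```python
import numpy as np

# Illustrative only: a fake (1, 550) probability vector standing in for `out`
rng = np.random.default_rng(0)
out = rng.random((1, 550))
out /= out.sum()  # normalize so it behaves like class probabilities

k = 3
# argsort is ascending, so take the last k indices and reverse for descending order
topk_idx = np.argsort(out[0])[-k:][::-1]
topk_prob = out[0][topk_idx]

for idx, prob in zip(topk_idx, topk_prob):
    print(f"class {idx}: {prob:.4f}")
```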

### Image Embeddings

```python
import birder
from birder.inference.classification import infer_image

(net, model_info) = birder.load_pretrained_model("flexivit_reg1_s16_rms_ls_dino-v2-il-all", inference=True)

# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)

# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)

image = "path/to/image.jpeg"  # or a PIL image
(out, embedding) = infer_image(net, image, transform, return_embedding=True)
# embedding is a NumPy array with shape (1, 384)
```
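
A common use for such embeddings is comparing images by cosine similarity. A minimal sketch, using random vectors in place of two real `(1, 384)` embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two (1, D) embedding arrays."""
    a = a.ravel()
    b = b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative only: random vectors standing in for real embeddings
rng = np.random.default_rng(0)
emb_a = rng.standard_normal((1, 384))
emb_b = rng.standard_normal((1, 384))

print(cosine_similarity(emb_a, emb_a))  # ~1.0 for identical embeddings
print(cosine_similarity(emb_a, emb_b))
```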

### Detection Feature Map

```python
from PIL import Image
import birder

(net, model_info) = birder.load_pretrained_model("flexivit_reg1_s16_rms_ls_dino-v2-il-all", inference=True)

# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)

# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)

image = Image.open("path/to/image.jpeg")
features = net.detection_features(transform(image).unsqueeze(0))
# features is a dict (stage name -> torch.Tensor)
print([(k, v.size()) for k, v in features.items()])
# Output example:
# [('neck', torch.Size([1, 384, 15, 15]))]
```
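
The 15 x 15 spatial size of the `neck` feature map follows directly from the input and patch sizes: 240 / 16 = 15 tokens per side. A minimal sketch of that arithmetic, assuming square inputs and non-overlapping patches:

```python
def token_grid(input_size: int, patch_size: int) -> int:
    """Tokens per side for a square input split into non-overlapping patches."""
    assert input_size % patch_size == 0, "input must be divisible by the patch size"
    return input_size // patch_size

print(token_grid(240, 16))  # -> 15, matching the 15 x 15 map above
print(token_grid(240, 24))  # -> 10, e.g. with patch_size=24 as in the classification example
```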

## Citation

```bibtex
@misc{beyer2023flexivitmodelpatchsizes,
      title={FlexiViT: One Model for All Patch Sizes},
      author={Lucas Beyer and Pavel Izmailov and Alexander Kolesnikov and Mathilde Caron and Simon Kornblith and Xiaohua Zhai and Matthias Minderer and Michael Tschannen and Ibrahim Alabdulmohsin and Filip Pavetic},
      year={2023},
      eprint={2212.08013},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2212.08013},
}

@misc{oquab2024dinov2learningrobustvisual,
      title={DINOv2: Learning Robust Visual Features without Supervision},
      author={Maxime Oquab and Timothée Darcet and Théo Moutakanni and Huy Vo and Marc Szafraniec and Vasil Khalidov and Pierre Fernandez and Daniel Haziza and Francisco Massa and Alaaeldin El-Nouby and Mahmoud Assran and Nicolas Ballas and Wojciech Galuba and Russell Howes and Po-Yao Huang and Shang-Wen Li and Ishan Misra and Michael Rabbat and Vasu Sharma and Gabriel Synnaeve and Hu Xu and Hervé Jegou and Julien Mairal and Patrick Labatut and Armand Joulin and Piotr Bojanowski},
      year={2024},
      eprint={2304.07193},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2304.07193},
}
```