---
tags:
- image-classification
- birder
- pytorch
library_name: birder
license: apache-2.0
datasets:
- bioscan-ml/BIOSCAN-5M
---

# Model Card for rdnet_t_ibot-bioscan5m

An RDNet tiny image encoder pre-trained using iBOT.

The model is primarily a feature extractor. Separately trained linear-probing classification heads for several taxonomic levels (order, family, genus, species) are available for classification tasks.

## Model Details

- **Model Type:** Image classification and detection backbone
- **Model Stats:**
  - Params (M): 22.8
  - Input image size: 224 x 224
- **Dataset:** BIOSCAN-5M (pretrain split)
- **Papers:**
  - DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs: <https://arxiv.org/abs/2403.19588>
  - iBOT: Image BERT Pre-Training with Online Tokenizer: <https://arxiv.org/abs/2111.07832>

## Linear Probing Results

The following table shows the top-1 accuracy (%) achieved by training a linear classification head on top of the frozen `rdnet_t_ibot-bioscan5m` encoder.
Linear probing used 289,203 training samples at each taxonomic level, and the resulting heads were evaluated on the validation (14,757 samples) and test (39,373 samples) splits of the BIOSCAN-5M dataset.
A rough training sketch in plain PyTorch follows the table.

| Taxonomic Level | Classes (N) | Val Top-1 Acc. (%) | Test Top-1 Acc. (%) |
|-----------------|-------------|--------------------|---------------------|
| Order           | 42          | 99.36              | 99.01               |
| Family          | 606         | 95.79              | 92.89               |
| Genus           | 4,930       | 88.09              | 78.51               |
| Species         | 11,846      | 79.74              | 65.26               |

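The released heads were produced with Birder's own tooling; as a rough illustration of the probing setup, the sketch below freezes the encoder and trains a fresh linear head. The `inference=False` flag, the dummy batch, and the optimizer settings are assumptions for illustration, not the recipe behind the published numbers.

```python
import torch
import torch.nn as nn
import birder

# Load the encoder; `inference=False` is assumed to return a trainable model
(net, _) = birder.load_pretrained_model("rdnet_t_ibot-bioscan5m", inference=False)

# Freeze the backbone so only the new linear head receives gradients
for param in net.parameters():
    param.requires_grad = False

num_classes = 606  # e.g., the family level
net.reset_classifier(num_classes)  # attach a fresh, trainable linear head

optimizer = torch.optim.AdamW(net.classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Dummy batch standing in for real BIOSCAN-5M images and family labels
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))

logits = net(images)
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```
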
## Unsupervised Evaluation (Adjusted Mutual Information)

The quality of the image embeddings was also evaluated intrinsically using Adjusted Mutual Information (AMI), following the setup of Lowe et al., 2024 ([An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Encoders](https://arxiv.org/abs/2406.02465)); a code sketch of the pipeline follows the results table:

1. Extract embeddings from the pretrained encoder.
2. Reduce dimensionality to 50 with [UMAP](https://arxiv.org/abs/1802.03426) (McInnes et al., 2018).
3. Cluster the reduced embeddings using agglomerative clustering (Ward's method).
4. Compare the clusters against ground-truth taxonomic labels using AMI (Vinh et al., 2010).

The AMI score reflects how well the learned representations align with the ground-truth taxonomy in an unsupervised setting.

| Taxonomic Level | AMI Score (%) |
|-----------------|---------------|
| Genus           | 39.14         |
| Species         | 26.91         |

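For reference, the pipeline above can be reproduced along these lines with `umap-learn` and `scikit-learn`. The random arrays are placeholders for the real embeddings and labels, and setting the number of clusters to the number of ground-truth classes is an assumption about the protocol:

```python
import numpy as np
import umap
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_mutual_info_score

# Placeholders: in practice, stack (1, 1040) embeddings from the encoder
# (see "Image Embeddings" below) and pair them with taxonomic labels
embeddings = np.random.rand(1000, 1040).astype(np.float32)
labels = np.random.randint(0, 50, size=1000)

# Step 2: reduce to 50 dimensions with UMAP
reduced = umap.UMAP(n_components=50).fit_transform(embeddings)

# Step 3: agglomerative clustering with Ward linkage
n_clusters = len(np.unique(labels))  # assumed to match the number of classes
clusters = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward").fit_predict(reduced)

# Step 4: AMI between cluster assignments and ground-truth labels
print(f"AMI: {adjusted_mutual_info_score(labels, clusters) * 100:.2f}%")
```
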
## Model Usage

### Image Classification (with Linear Probing Head)

To use the model for classification, load the encoder and then load a pre-trained classification head for the desired taxonomic level. Heads are available for `order`, `family`, `genus`, and `species`.

```python
import torch
import birder
from birder.inference.classification import infer_image

(net, model_info) = birder.load_pretrained_model("rdnet_t_ibot-bioscan5m", inference=True)

# Load a linear probing classification head (e.g., for 'family')
head_data = torch.load("models/rdnet_t_ibot-bioscan5m-family.head.pt")

# Reset the classifier layer and load the head weights
net.reset_classifier(len(head_data["class_to_idx"]))
net.classifier.load_state_dict(head_data["state"])

# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)

# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)

image = "path/to/image.jpeg"  # or a PIL image, must be loaded in RGB format
(out, _) = infer_image(net, image, transform)
# out is a NumPy array with shape (1, N_CLASSES) for the chosen level, representing class probabilities
```

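Since `out` holds per-class probabilities and `head_data["class_to_idx"]` was loaded above, the mapping can be inverted to read off the predicted label. A small follow-up sketch, assuming `class_to_idx` maps label names to column indices:

```python
# Invert the name -> index mapping (assumed convention) to decode the prediction
idx_to_class = {v: k for k, v in head_data["class_to_idx"].items()}
pred_idx = int(out.argmax(axis=1)[0])
print(f"Predicted family: {idx_to_class[pred_idx]} (p={out[0, pred_idx]:.3f})")
```
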
### Image Embeddings

```python
import birder
from birder.inference.classification import infer_image

(net, model_info) = birder.load_pretrained_model("rdnet_t_ibot-bioscan5m", inference=True)

# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)

# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)

image = "path/to/image.jpeg"  # or a PIL image
(out, embedding) = infer_image(net, image, transform, return_embedding=True)
# embedding is a NumPy array with shape (1, 1040)
```

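The raw embedding can be used directly, e.g. for nearest-neighbor retrieval. A minimal comparison sketch; `embedding_b` stands in for a second image's embedding:

```python
import numpy as np

# `embedding` comes from the block above; in practice, extract `embedding_b`
# from a second image the same way
embedding_b = embedding
a, b = embedding[0], embedding_b[0]
cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"Cosine similarity: {cosine:.3f}")
```
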
### Detection Feature Map

```python
from PIL import Image
import birder

(net, model_info) = birder.load_pretrained_model("rdnet_t_ibot-bioscan5m", inference=True)

# Get the image size the model was trained on
size = birder.get_size_from_signature(model_info.signature)

# Create an inference transform
transform = birder.classification_transform(size, model_info.rgb_stats)

image = Image.open("path/to/image.jpeg")
features = net.detection_features(transform(image).unsqueeze(0))
# features is a dict (stage name -> torch.Tensor)
print([(k, v.size()) for k, v in features.items()])
# Output example:
# [('stage1', torch.Size([1, 256, 56, 56])),
#  ('stage2', torch.Size([1, 440, 28, 28])),
#  ('stage3', torch.Size([1, 744, 14, 14])),
#  ('stage4', torch.Size([1, 1040, 7, 7]))]
```

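These multi-scale stage outputs can feed a standard detection neck. A hypothetical wiring through torchvision's `FeaturePyramidNetwork` (not Birder's own detection pipeline), using the channel counts printed above:

```python
from collections import OrderedDict

from torchvision.ops import FeaturePyramidNetwork

# Input channels match the stage outputs shown above
fpn = FeaturePyramidNetwork(in_channels_list=[256, 440, 744, 1040], out_channels=256)

# `features` is the dict returned by net.detection_features(...)
pyramid = fpn(OrderedDict(features))
print([(k, v.size()) for k, v in pyramid.items()])
# Each pyramid level now has 256 channels at its stage's spatial resolution
```
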
## Citation

```bibtex
@misc{kim2024densenetsreloadedparadigmshift,
    title={DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs},
    author={Donghyun Kim and Byeongho Heo and Dongyoon Han},
    year={2024},
    eprint={2403.19588},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2403.19588},
}

@misc{zhou2022ibotimagebertpretraining,
    title={iBOT: Image BERT Pre-Training with Online Tokenizer},
    author={Jinghao Zhou and Chen Wei and Huiyu Wang and Wei Shen and Cihang Xie and Alan Yuille and Tao Kong},
    year={2022},
    eprint={2111.07832},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2111.07832},
}

@inproceedings{gharaee2024bioscan5m,
    title={{BIOSCAN-5M}: A Multimodal Dataset for Insect Biodiversity},
    booktitle={Advances in Neural Information Processing Systems},
    author={Zahra Gharaee and Scott C. Lowe and ZeMing Gong and Pablo Millan Arias and Nicholas Pellegrino and Austin T. Wang and Joakim Bruslund Haurum and Iuliia Zarubiieva and Lila Kari and Dirk Steinke and Graham W. Taylor and Paul Fieguth and Angel X. Chang},
    editor={A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang},
    pages={36285--36313},
    publisher={Curran Associates, Inc.},
    year={2024},
    volume={37},
    url={https://proceedings.neurips.cc/paper_files/paper/2024/file/3fdbb472813041c9ecef04c20c2b1e5a-Paper-Datasets_and_Benchmarks_Track.pdf},
}
```