hassonofer committed on
Commit
a4ef7a6
·
verified ·
1 Parent(s): f5e8636

Update README.md

Files changed (1)
  1. README.md +103 -3
README.md CHANGED
@@ -1,3 +1,103 @@
- ---
- license: apache-2.0
- ---
+ ---
+ tags:
+ - image-classification
+ - birder
+ library_name: birder
+ license: apache-2.0
+ ---
+
+ # Model Card for mvit_v2_t_il-all
+
+ MViTv2 image classification model. This model was trained on the `il-all` dataset (all the relevant bird species found in Israel, including rarities).
+
+ The species list is based on data from <https://www.israbirding.com/checklist/>.
+
+ ## Model Details
+
+ - **Model Type:** Image classification / detection backbone
+ - **Model Stats:**
+     - Params (M): 23.9
+     - Image size: 384 x 384
+ - **Dataset:** il-all (550 classes)
+
+ - **Papers:**
+     - MViTv2: Improved Multiscale Vision Transformers for Classification and Detection: <https://arxiv.org/abs/2112.01526>
+
+ ## Model Usage
+
+ ### Image Classification
+
+ ```python
+ import birder
+ from birder.inference.classification import infer_image
+
+ (net, class_to_idx, signature, rgb_stats) = birder.load_pretrained_model("mvit_v2_t_il-all", inference=True)
+
+ # Get the image size the model was trained on
+ size = birder.get_size_from_signature(signature)
+
+ # Create an inference transform
+ transform = birder.classification_transform(size, rgb_stats)
+
+ image = "path/to/image.jpeg" # or a PIL image
+ (out, _) = infer_image(net, image, transform)
+ # out is a NumPy array with shape of (1, num_classes)
+ ```
+
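The classification snippet stops at the raw `out` array. As a minimal sketch of one way to turn it into readable predictions, the helper below ranks the scores and maps indices back to class names. It assumes `class_to_idx` maps class name to column index (inferred from its name, not confirmed by the README); `top_k_labels` and the toy class names and scores are hypothetical. Ranking by value works whether `out` holds logits or probabilities, which the README does not state.

```python
from typing import Sequence


def top_k_labels(
    out: Sequence[Sequence[float]], class_to_idx: dict[str, int], k: int = 5
) -> list[tuple[str, float]]:
    """Return the k highest-scoring (class name, score) pairs for a single image."""
    # Invert the assumed name -> index mapping
    idx_to_class = {idx: name for (name, idx) in class_to_idx.items()}

    # out has shape (1, num_classes); take the only row
    scores = [float(s) for s in out[0]]

    # Sort class indices by descending score and keep the first k
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [(idx_to_class[i], scores[i]) for i in ranked]


# Toy stand-ins for the real model output (hypothetical names and scores)
toy_class_to_idx = {"Common_kingfisher": 0, "Hoopoe": 1, "Palestine_sunbird": 2}
toy_out = [[0.1, 0.7, 0.2]]
print(top_k_labels(toy_out, toy_class_to_idx, k=2))
# [('Hoopoe', 0.7), ('Palestine_sunbird', 0.2)]
```

With the real model, `top_k_labels(out, class_to_idx)` would be called on the values returned by `infer_image` and `load_pretrained_model`, since indexing `out[0]` works for a NumPy array as well as a nested list.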
+ ### Image Embeddings
+
+ ```python
+ import birder
+ from birder.inference.classification import infer_image
+
+ (net, class_to_idx, signature, rgb_stats) = birder.load_pretrained_model("mvit_v2_t_il-all", inference=True)
+
+ # Get the image size the model was trained on
+ size = birder.get_size_from_signature(signature)
+
+ # Create an inference transform
+ transform = birder.classification_transform(size, rgb_stats)
+
+ image = "path/to/image.jpeg" # or a PIL image
+ (out, embedding) = infer_image(net, image, transform, return_embedding=True)
+ # embedding is a NumPy array with shape of (1, embedding_size)
+ ```
+
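A common use for such embeddings is comparing two images. The sketch below computes cosine similarity between two embedding rows using only the standard library. The `cosine_similarity` helper and the toy vectors are hypothetical and not part of birder. Whether cosine similarity is the best metric for this model's embeddings is an assumption, not something the README states.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

    dot = sum(float(x) * float(y) for (x, y) in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")

    return dot / (norm_a * norm_b)


# Toy stand-ins for embedding[0] of two different images (hypothetical values)
emb_a = [1.0, 0.0, 0.0]
emb_b = [0.0, 2.0, 0.0]
print(cosine_similarity(emb_a, emb_a))  # identical direction
print(cosine_similarity(emb_a, emb_b))  # orthogonal vectors
```

With real outputs, each argument would be the single row of an `embedding` array, for example `embedding[0]`, taken from two separate `infer_image` calls.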
+ ### Detection Feature Map
+
+ ```python
+ from PIL import Image
+ import birder
+
+ (net, class_to_idx, signature, rgb_stats) = birder.load_pretrained_model("mvit_v2_t_il-all", inference=True)
+
+ # Get the image size the model was trained on
+ size = birder.get_size_from_signature(signature)
+
+ # Create an inference transform
+ transform = birder.classification_transform(size, rgb_stats)
+
+ image = Image.open("path/to/image.jpeg")
+ features = net.detection_features(transform(image).unsqueeze(0))
+ # features is a dict (stage name -> torch.Tensor)
+ print([(k, v.size()) for k, v in features.items()])
+ # Output example:
+ # [('stage1', torch.Size([1, 96, 96, 96])),
+ #  ('stage2', torch.Size([1, 192, 48, 48])),
+ #  ('stage3', torch.Size([1, 384, 24, 24])),
+ #  ('stage4', torch.Size([1, 768, 12, 12]))]
+ ```
+
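The spatial sizes in the example output follow from the 384 x 384 input if the four stages downsample by 4, 8, 16 and 32. Those strides are inferred here from the printed shapes (384 / 96 = 4, and so on), not taken from the library, and `feature_map_sizes` is a hypothetical helper. A short sketch of the arithmetic:

```python
def feature_map_sizes(image_size: int, strides: tuple[int, ...] = (4, 8, 16, 32)) -> dict[str, int]:
    """Spatial side length of each stage's feature map for a square input."""
    # Each stage's side length is the input side divided by that stage's stride
    return {f"stage{i}": image_size // stride for (i, stride) in enumerate(strides, start=1)}


print(feature_map_sizes(384))
# {'stage1': 96, 'stage2': 48, 'stage3': 24, 'stage4': 12}
```

In the same example output the channel count doubles at each stage (96, 192, 384, 768) while the spatial side halves.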
+ ## Citation
+
+ ```bibtex
+ @misc{li2022mvitv2improvedmultiscalevision,
+     title={MViTv2: Improved Multiscale Vision Transformers for Classification and Detection},
+     author={Yanghao Li and Chao-Yuan Wu and Haoqi Fan and Karttikeya Mangalam and Bo Xiong and Jitendra Malik and Christoph Feichtenhofer},
+     year={2022},
+     eprint={2112.01526},
+     archivePrefix={arXiv},
+     primaryClass={cs.CV},
+     url={https://arxiv.org/abs/2112.01526},
+ }
+ ```