Update pipeline tag, add GitHub link and library name

#1
opened by nielsr (HF staff)

Files changed (1): README.md (+8 -6)
README.md CHANGED

@@ -1,15 +1,17 @@
 ---
+datasets:
+- ILSVRC/imagenet-1k
 license: other
 license_name: nvclv1
 license_link: LICENSE
-datasets:
-- ILSVRC/imagenet-1k
-pipeline_tag: image-feature-extraction
+pipeline_tag: image-classification
+library_name: transformers
 ---
 
-
 [**MambaVision: A Hybrid Mamba-Transformer Vision Backbone**](https://arxiv.org/abs/2407.08083).
 
+Code: https://github.com/NVlabs/MambaVision
+
 ## Model Overview
 
 We have developed the first hybrid model for computer vision which leverages the strengths of Mamba and Transformers. Specifically, our core contribution includes redesigning the Mamba formulation to enhance its capability for efficient modeling of visual features. In addition, we conducted a comprehensive ablation study on the feasibility of integrating Vision Transformers (ViT) with Mamba. Our results demonstrate that equipping the Mamba architecture with several self-attention blocks at the final layers greatly improves the modeling capacity to capture long-range spatial dependencies. Based on our findings, we introduce a family of MambaVision models with a hierarchical architecture to meet various design criteria.

@@ -71,7 +73,7 @@ transform = create_transform(input_size=input_resolution,
                              is_training=False,
                              mean=model.config.mean,
                              std=model.config.std,
-                             crop_mode=model.config.crop_pct,
+                             crop_mode=model.config.crop_mode,
                              crop_pct=model.config.crop_pct)
 
 inputs = transform(image).unsqueeze(0).cuda()

@@ -112,7 +114,7 @@ transform = create_transform(input_size=input_resolution,
                              is_training=False,
                              mean=model.config.mean,
                              std=model.config.std,
-                             crop_mode=model.config.crop_pct,
+                             crop_mode=model.config.crop_mode,
                              crop_pct=model.config.crop_pct)
 inputs = transform(image).unsqueeze(0).cuda()
 # model inference
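For reference, here is a minimal, runnable sketch of the snippet the two code hunks patch. Only the `create_transform(...)` arguments, the `inputs = ...` line, and the `# model inference` comment appear in the diff; the model load via `AutoModelForImageClassification` with `trust_remote_code=True`, the sample image URL, and the dict-style `logits` access are assumptions added here for completeness.

```python
import requests
import torch
from PIL import Image
from timm.data.transforms_factory import create_transform
from transformers import AutoModelForImageClassification

# Load the checkpoint through transformers (this PR sets library_name: transformers);
# MambaVision ships custom modeling code, hence trust_remote_code=True.
model = AutoModelForImageClassification.from_pretrained(
    "nvidia/MambaVision-T-1K", trust_remote_code=True
)
model.cuda().eval()

# Sample image -- the URL is an illustrative assumption, not part of the diff.
url = "http://images.cocodataset.org/val2017/000000020247.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Build the eval transform from the model config, as in the patched README.
# Note the fix: crop_mode reads model.config.crop_mode (a string such as
# 'center'), not model.config.crop_pct (a float).
input_resolution = (3, 224, 224)
transform = create_transform(input_size=input_resolution,
                             is_training=False,
                             mean=model.config.mean,
                             std=model.config.std,
                             crop_mode=model.config.crop_mode,
                             crop_pct=model.config.crop_pct)

inputs = transform(image).unsqueeze(0).cuda()

# model inference
with torch.no_grad():
    outputs = model(inputs)
predicted_class_idx = outputs["logits"].argmax(-1).item()
print("Predicted class index:", predicted_class_idx)
```

The `crop_mode` fix matters because timm's `create_transform` expects `crop_mode` to be one of the string modes ('center', 'squash', 'border') while `crop_pct` is a float; as far as I can tell, timm falls back to a center crop for unrecognized modes, so the typo would not crash but would silently ignore the configured crop mode.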