nielsr (HF Staff) committed
Commit 70ea72e · verified · 1 Parent(s): ed8ecf2

Update pipeline tag and add library name


This PR updates the model card metadata to reflect the model's capabilities more accurately.
The `pipeline_tag` is changed to `image-text-to-text` to better represent the model's ability to process both image and text inputs and generate text outputs. The `library_name` is added to indicate compatibility with the Transformers library.
The existing links to the paper and GitHub repository remain unchanged.
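
For context, the new metadata implies the checkpoint loads through the Transformers Auto classes. The sketch below is illustrative only and is not the model card's own snippet: it assumes the repository ships custom modeling code reachable via `trust_remote_code`, and the `AutoModelForCausalLM`/`AutoTokenizer` entry points are a guess on our part; the card's usage section remains authoritative.

```python
# Minimal sketch of what `library_name: transformers` implies; assumes the
# repo exposes custom modeling code via trust_remote_code (see the model
# card's own usage snippet for the authoritative entry points).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qihoo360/Inner-Adaptor-Architecture"  # path used in the card's CLI examples

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

# `pipeline_tag: image-text-to-text` means inference pairs an image with a
# text prompt and generates text, as in the card's `iaa.eval.infer` example.
```

Beyond discoverability, `library_name` also drives which code-snippet widget the Hub shows on the model page, so declaring `transformers` here should surface a loading snippet along these lines.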

Files changed (1): README.md +42 -12
README.md CHANGED
@@ -1,5 +1,4 @@
  ---
- license: apache-2.0
  datasets:
  - liuhaotian/LLaVA-CC3M-Pretrain-595K
  - liuhaotian/LLaVA-Instruct-150K
@@ -8,7 +7,9 @@ datasets:
  language:
  - zh
  - en
- pipeline_tag: visual-question-answering
+ license: apache-2.0
+ pipeline_tag: image-text-to-text
+ library_name: transformers
  ---

  # Model Card for IAA: Inner-Adaptor Architecture
@@ -152,6 +153,42 @@ outputs = outputs.strip()
  print(outputs)
  ```

+ ## CLI Inference
+
+ Chat about images using IAA without the need of Gradio interface.
+
+ ```Shell
+ name="qihoo360/Inner-Adaptor-Architecture"
+ python -m iaa.eval.infer \
+ --model-path $name \
+ --image-path testimg/readpanda.jpg \
+ --task_type MM \
+ ```
+ ```Shell
+ name="qihoo360/Inner-Adaptor-Architecture"
+
+ python -m iaa.eval.infer_interleave \
+ --model-path $name \
+ --image-path testimg/COCO_train2014_000000014502.jpg \
+ ```
+
+ ## Evaluation
+
+ First, download the MME image from the following link to ./MME/MME_Benchmark_release_version.
+ https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation
+
+ ```Shell
+ bash scripts/mme.sh
+ ```
+
+ For Refcoco testing, please refer to the following links for data downloads
+ https://github.com/lichengunc/refer
+
+ ```Shell
+ bash scripts/refcoco.sh
+ ```
+
+ <!-- ## Acknowledgement -->
  ## We Are Hiring
  We are seeking academic interns in the Multimodal field. If interested, please send your resume to [email protected].

@@ -168,19 +205,12 @@ If you find IAA useful for your research and applications, please cite using thi
  ```

  ## License
- This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses.
- The content of this project itself is licensed under the [Apache license 2.0]
-
- **Where to send questions or comments about the model:**
- https://github.com/360CVGroup/Inner-Adaptor-Architecture
-

+ This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses.
+ The content of this project itself is licensed under the [Apache license 2.0](./LICENSE).

  ## Related Projects
  This work wouldn't be possible without the incredible open-source code of these projects. Huge thanks!
  - [Meta Llama 3](https://github.com/meta-llama/llama3)
  - [LLaVA: Large Language and Vision Assistant](https://github.com/haotian-liu/LLaVA)
- - [360VL](https://github.com/360CVGroup/360VL)
-
-
-
+ - [360VL](https://github.com/360CVGroup/360VL)