nielsr (HF Staff) committed
Commit 70ea72e · verified · 1 Parent(s): ed8ecf2

Update pipeline tag and add library name


This PR updates the model card metadata to reflect the model's capabilities more accurately.
The `pipeline_tag` is changed to `image-text-to-text` to better represent the model's ability to process both image and text inputs and generate text outputs. The `library_name` is added to indicate compatibility with the Transformers library.
The existing links to the paper and GitHub repository remain unchanged.
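
For context, the new metadata implies the checkpoint loads through the Transformers Auto classes. The sketch below is illustrative only and is not the model card's own snippet: it assumes the repository ships custom modeling code reachable via `trust_remote_code`, and the `AutoModelForCausalLM`/`AutoTokenizer` entry points are a guess on our part; the card's usage section remains authoritative.

```python
# Minimal sketch of what `library_name: transformers` implies; assumes the
# repo exposes custom modeling code via trust_remote_code (see the model
# card's own usage snippet for the authoritative entry points).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qihoo360/Inner-Adaptor-Architecture"  # path used in the card's CLI examples

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

# `pipeline_tag: image-text-to-text` means inference pairs an image with a
# text prompt and generates text, as in the card's `iaa.eval.infer` example.
```

Beyond discoverability, `library_name` also drives which code-snippet widget the Hub shows on the model page, so declaring `transformers` here should surface a loading snippet along these lines.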

Files changed (1): README.md +42 -12
README.md CHANGED
@@ -1,5 +1,4 @@
  ---
- license: apache-2.0
  datasets:
  - liuhaotian/LLaVA-CC3M-Pretrain-595K
  - liuhaotian/LLaVA-Instruct-150K
@@ -8,7 +7,9 @@ datasets:
  language:
  - zh
  - en
- pipeline_tag: visual-question-answering
+ license: apache-2.0
+ pipeline_tag: image-text-to-text
+ library_name: transformers
  ---

  # Model Card for IAA: Inner-Adaptor Architecture
@@ -152,6 +153,42 @@ outputs = outputs.strip()
  print(outputs)
  ```

+ ## CLI Inference
+
+ Chat about images using IAA without the need of Gradio interface.
+
+ ```Shell
+ name="qihoo360/Inner-Adaptor-Architecture"
+ python -m iaa.eval.infer \
+ --model-path $name \
+ --image-path testimg/readpanda.jpg \
+ --task_type MM \
+ ```
+ ```Shell
+ name="qihoo360/Inner-Adaptor-Architecture"
+
+ python -m iaa.eval.infer_interleave \
+ --model-path $name \
+ --image-path testimg/COCO_train2014_000000014502.jpg \
+ ```
+
+ ## Evaluation
+
+ First, download the MME image from the following link to ./MME/MME_Benchmark_release_version.
+ https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation
+
+ ```Shell
+ bash scripts/mme.sh
+ ```
+
+ For Refcoco testing, please refer to the following links for data downloads
+ https://github.com/lichengunc/refer
+
+ ```Shell
+ bash scripts/refcoco.sh
+ ```
+
+ <!-- ## Acknowledgement -->
  ## We Are Hiring
  We are seeking academic interns in the Multimodal field. If interested, please send your resume to [email protected].

@@ -168,19 +205,12 @@ If you find IAA useful for your research and applications, please cite using thi
  ```

  ## License
- This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses.
- The content of this project itself is licensed under the [Apache license 2.0]
-
- **Where to send questions or comments about the model:**
- https://github.com/360CVGroup/Inner-Adaptor-Architecture
-

+ This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses.
+ The content of this project itself is licensed under the [Apache license 2.0](./LICENSE).

  ## Related Projects
  This work wouldn't be possible without the incredible open-source code of these projects. Huge thanks!
  - [Meta Llama 3](https://github.com/meta-llama/llama3)
  - [LLaVA: Large Language and Vision Assistant](https://github.com/haotian-liu/LLaVA)
- - [360VL](https://github.com/360CVGroup/360VL)
-
-
-
+ - [360VL](https://github.com/360CVGroup/360VL)