Update pipeline tag and add library name
This PR updates the model card metadata to reflect the model's capabilities more accurately.
The `pipeline_tag` is changed to `image-text-to-text` to better represent the model's ability to take both image and text inputs and generate text output. The `library_name` is added to indicate compatibility with the Transformers library.
The existing links to the paper and GitHub repository are excellent and remain unchanged.
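For reviewers' context, these two fields only affect how the Hub presents the repo: `pipeline_tag` selects the task filter and inference widget, and `library_name` controls which loading snippet the Hub suggests. Below is a minimal sketch of the kind of Transformers usage the new `library_name` implies, assuming the repo's remote code plugs into the `Auto*` classes; the repo id is taken from the CLI examples in the diff, and the exact entry points are an assumption, not something this PR verifies.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id as used in the README's CLI examples.
model_id = "qihoo360/Inner-Adaptor-Architecture"

# Hypothetical loading sketch: custom multimodal architectures on the Hub are
# typically loaded via the Auto classes with trust_remote_code=True. Whether
# IAA exposes exactly these entry points is an assumption, not confirmed here.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
```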
README.md (CHANGED)

````diff
@@ -1,5 +1,4 @@
 ---
-license: apache-2.0
 datasets:
 - liuhaotian/LLaVA-CC3M-Pretrain-595K
 - liuhaotian/LLaVA-Instruct-150K
@@ -8,7 +7,9 @@ datasets:
 language:
 - zh
 - en
-
+license: apache-2.0
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
 
 # Model Card for IAA: Inner-Adaptor Architecture
@@ -152,6 +153,42 @@ outputs = outputs.strip()
 print(outputs)
 ```
 
+## CLI Inference
+
+Chat about images using IAA without the need of Gradio interface.
+
+```Shell
+name="qihoo360/Inner-Adaptor-Architecture"
+python -m iaa.eval.infer \
+--model-path $name \
+--image-path testimg/readpanda.jpg \
+--task_type MM \
+```
+```Shell
+name="qihoo360/Inner-Adaptor-Architecture"
+
+python -m iaa.eval.infer_interleave \
+--model-path $name \
+--image-path testimg/COCO_train2014_000000014502.jpg \
+```
+
+## Evaluation
+
+First, download the MME image from the following link to ./MME/MME_Benchmark_release_version.
+https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation
+
+```Shell
+bash scripts/mme.sh
+```
+
+For Refcoco testing, please refer to the following links for data downloads
+https://github.com/lichengunc/refer
+
+```Shell
+bash scripts/refcoco.sh
+```
+
+<!-- ## Acknowledgement -->
 ## We Are Hiring
 We are seeking academic interns in the Multimodal field. If interested, please send your resume to [email protected].
 
@@ -168,19 +205,12 @@ If you find IAA useful for your research and applications, please cite using this
 ```
 
 ## License
-This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses.
-The content of this project itself is licensed under the [Apache license 2.0]
-
-**Where to send questions or comments about the model:**
-https://github.com/360CVGroup/Inner-Adaptor-Architecture
-
 
+This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses.
+The content of this project itself is licensed under the [Apache license 2.0](./LICENSE).
 
 ## Related Projects
 This work wouldn't be possible without the incredible open-source code of these projects. Huge thanks!
 - [Meta Llama 3](https://github.com/meta-llama/llama3)
 - [LLaVA: Large Language and Vision Assistant](https://github.com/haotian-liu/LLaVA)
-- [360VL](https://github.com/360CVGroup/360VL)
-
-
-
+- [360VL](https://github.com/360CVGroup/360VL)
````