This model, dubbed Spoonbill, is a multi-modal vision-language model for image and text inputs, closely related to OpenFlamingo and Otter. It uses Llama2-7B-Chat as its language model and was fine-tuned on a limited subset of Otter's MIMIC-IT data (the CGD and SD datasets). It was built out of enthusiasm for vision-language models and instruction tuning.
Please refer to the licenses of OpenFlamingo, Otter, and Llama2 (https://ai.meta.com/llama/license/).

@article{awadalla2023openflamingo,
  title={OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models},
  author={Anas Awadalla and Irena Gao and Josh Gardner and Jack Hessel and Yusuf Hanafy and Wanrong Zhu and Kalyani Marathe and Yonatan Bitton and Samir Gadre and Shiori Sagawa and Jenia Jitsev and Simon Kornblith and Pang Wei Koh and Gabriel Ilharco and Mitchell Wortsman and Ludwig Schmidt},
  journal={arXiv preprint arXiv:2308.01390},
  year={2023}
}
@article{li2023otter,
  title={Otter: A Multi-Modal Model with In-Context Instruction Tuning},
  author={Li, Bo and Zhang, Yuanhan and Chen, Liangyu and Wang, Jinghao and Yang, Jingkang and Liu, Ziwei},
  journal={arXiv preprint arXiv:2305.03726},
  year={2023}
}
@article{li2023mimicit,
  title={MIMIC-IT: Multi-Modal In-Context Instruction Tuning},
  author={Li, Bo and Zhang, Yuanhan and Chen, Liangyu and Wang, Jinghao and Pu, Fanyi and Yang, Jingkang and Li, Chunyuan and Liu, Ziwei},
  journal={arXiv preprint arXiv:2306.05425},
  year={2023}
}