adept/fuyu-8b · Support multi-image?

WaltonFuture

Oct 19, 2023

Thanks for your great work! Does fuyu-8b support multi-image as input?

ArthurZ

Oct 20, 2023

The model does, but the processor does not at the moment. A PR is on its way for this!

yingss

Oct 24, 2023

edited Oct 25, 2023

Thank you for the great work! Regarding multi-image support, does this mean that fuyu-8b can handle text interleaved with multiple images? Could you clarify the input format for the transformer decoder in that case?

From the blog, it appears that the input for a single image and text would be: [img_patch] [img_patch] \n [img_patch] [img_patch] \n [text].

How would the format change in the scenario where the sequence is "<img1> <img2> some text"? Would there be some special image token to separate the two images? Or would it be like [img1_patch] [img1_patch] \n [img1_patch] [img1_patch] \n [img2_patch] [img2_patch] \n [img2_patch] [img2_patch] \n [text]?
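To make the two layouts in question concrete, here is a minimal, purely symbolic sketch in plain Python. It does not call the Fuyu processor or model. The helper names (`image_tokens`, `build_sequence`) and the 2x2 patch grid are hypothetical, chosen only to reproduce the strings written above. The two-image output is just the plain-concatenation guess, which nobody in this thread has confirmed as the real format.

```python
from typing import List

# Stands in for the image-newline marker written as "\n" in the layouts above.
NEWLINE = "\\n"


def image_tokens(tag: str, rows: int, cols: int) -> List[str]:
    """Symbolic raster-order layout: `cols` patch tokens per row, then a newline marker."""
    tokens: List[str] = []
    for _ in range(rows):
        tokens.extend([f"[{tag}_patch]"] * cols)
        tokens.append(NEWLINE)
    return tokens


def build_sequence(image_tags: List[str], text: str, rows: int = 2, cols: int = 2) -> List[str]:
    """Concatenate each image's patch rows, then append the text placeholder.

    Assumption: images are simply placed back to back with no separator token.
    This mirrors the guess in the question, not a confirmed model format.
    """
    tokens: List[str] = []
    for tag in image_tags:
        tokens.extend(image_tokens(tag, rows, cols))
    tokens.append(f"[{text}]")
    return tokens


single = build_sequence(["img"], "text")
double = build_sequence(["img1", "img2"], "text")

print(" ".join(single))
print(" ".join(double))
```

If the real format uses a separator between images, only `build_sequence` would need to change, for example by appending a separator token after each image's rows.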

Also, does FuyuForCausalLM need any modification to accommodate text interleaved with multiple images? Thank you!

ArthurZ

Oct 25, 2023

I think it does support interleaved images, but I'm not entirely sure how two images are prompted. We'll try to ask the authors!

yingss

Nov 3, 2023

Any update on supporting interleaved images?
