How to use multiple images with single prompts
Hi,
It is mentioned that LLaVA-OneVision supports multiple images.
Could you please provide guidance or examples on how LLaVAGuard-ov can process multiple images within a single prompt?
Thank you!
Hi ctdfuji,
could you please clarify where we mention this?
Hi Felfri,
Thank for your respond.
The model card mentioned:
"We here provide the transformers converted weights for LlavaGuard v1.2 7B. It builds upon LLaVA-OneVision 7B and has achieved the best overall performance so far with improved reasoning capabilities within the rationales."
And in this link: https://github.com/LLaVA-VL/LLaVA-NeXT for LLaVA-NeXT, it is mentioned:
"The new LLaVA-OV models (0.5B/7B/72B) achieve new state-of-the-art performance across single-image, multi-image, and video benchmarks, sometimes rivaling top commercial models on 47 diverse benchmarks."
I guess the LlavaGuard-OV version here is fine-tuned from LLaVA-OV which also support multiple images and video.
We have not checked if and how multiple images work. Feel free to check it our yourself and let us and the others know if it works. Thanks!