AIML-TUDA/LlavaGuard-v1.2-7B-OV-hf · How to use multiple images with single prompts

13 days ago

•

Hi,
It is mentioned that LLaVA-OneVision supports multiple images.
Could you please provide guidance or examples on how LLaVAGuard-ov can process multiple images within a single prompt?
Thank you!

felfri

Artificial Intelligence & Machine Learning Lab at TU Darmstadt org 12 days ago

Hi ctdfuji,
could you please clarify where we mention this?

ctdfuji

11 days ago

Hi Felfri,
Thank for your respond.

The model card mentioned:
"We here provide the transformers converted weights for LlavaGuard v1.2 7B. It builds upon LLaVA-OneVision 7B and has achieved the best overall performance so far with improved reasoning capabilities within the rationales."
And in this link: https://github.com/LLaVA-VL/LLaVA-NeXT for LLaVA-NeXT, it is mentioned:
"The new LLaVA-OV models (0.5B/7B/72B) achieve new state-of-the-art performance across single-image, multi-image, and video benchmarks, sometimes rivaling top commercial models on 47 diverse benchmarks."
I guess the LlavaGuard-OV version here is fine-tuned from LLaVA-OV which also support multiple images and video.

felfri

Artificial Intelligence & Machine Learning Lab at TU Darmstadt org 10 days ago

We have not checked if and how multiple images work. Feel free to check it our yourself and let us and the others know if it works. Thanks!