# LLaVA-NeXT: Tackling Multi-image, Video, and 3D in Large Multimodal Models

## Contents

- [Demo](#demo)
- [Evaluation](#evaluation)

## Demo

Make sure you have installed the LLaVA-NeXT model files by following the instructions in the top-level README.md.

  1. Example model: `lmms-lab/llava-next-interleave-7b`
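
If you don't have the checkpoint locally yet, one way to fetch it is via the `huggingface_hub` CLI. This is a minimal sketch; the target directory `checkpoints/llava-next-interleave-7b` is just an example, and `huggingface_hub` must be installed.

```bash
# Download the example checkpoint from the Hugging Face Hub.
# The --local-dir target is illustrative; place it wherever you keep checkpoints.
pip install -U huggingface_hub
huggingface-cli download lmms-lab/llava-next-interleave-7b \
    --local-dir checkpoints/llava-next-interleave-7b
```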

To run a demo, execute:

```bash
# If the demo fails to start, make sure the checkpoint path contains 'qwen',
# e.g. rename it: mv llava-next-interleave-7b llava-next-interleave-qwen-7b
python playground/demo/interleave_demo.py --model_path path/to/ckpt
```
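
For example, assuming the checkpoint was downloaded to `checkpoints/llava-next-interleave-7b` (a hypothetical local path), the rename workaround and demo launch look like this:

```bash
# Rename so the path contains 'qwen' (see the note above), then launch the demo.
mv checkpoints/llava-next-interleave-7b checkpoints/llava-next-interleave-qwen-7b
python playground/demo/interleave_demo.py \
    --model_path checkpoints/llava-next-interleave-qwen-7b
```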

## Evaluation

### Preparation

Please download the evaluation data and its metadata from the following link:

  1. llava-interleave-bench: here.

Unzip `eval_images.zip`; it contains `Split1` and `Split2`. Organize the downloaded data into the following structure:


```
interleave_data
β”œβ”€β”€ Split1
β”‚   β”œβ”€β”€ ...
β”‚   └── ...
β”œβ”€β”€ Split2
β”‚   β”œβ”€β”€ ...
β”‚   └── ...
β”œβ”€β”€ multi_image_in_domain.json
β”œβ”€β”€ multi_image_out_domain.json
└── multi_view_in_domain.json
```
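
The shell commands below are one way to produce that layout, assuming `eval_images.zip` and the three metadata JSON files were downloaded into the current directory (adjust paths to match your download location):

```bash
mkdir -p interleave_data
unzip -q eval_images.zip -d interleave_data      # extracts Split1/ and Split2/
mv multi_image_in_domain.json multi_image_out_domain.json \
   multi_view_in_domain.json interleave_data/
ls interleave_data                               # sanity-check against the tree above
```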

### Inference and Evaluation

Example: in `scripts/interleave/eval_all.sh`, first replace `/path/to/ckpt` with the path to your checkpoint and `/path/to/images` with the path to `interleave_data`, then run:

```bash
bash scripts/interleave/eval_all.sh
```
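
If you prefer not to edit the script by hand, a `sed` one-liner can substitute the placeholders, assuming they appear verbatim as `/path/to/ckpt` and `/path/to/images` in the script (the paths below are hypothetical):

```bash
CKPT=/data/ckpts/llava-next-interleave-qwen-7b   # your checkpoint path
DATA=/data/interleave_data                       # your interleave_data path
sed -i "s#/path/to/ckpt#${CKPT}#g; s#/path/to/images#${DATA}#g" \
    scripts/interleave/eval_all.sh
bash scripts/interleave/eval_all.sh
```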