MLCD-Embodied-7B / README.md
xiangan's picture
Update README.md
4ec92e0 verified
metadata
license: apache-2.0
language:
  - zh
  - en
metrics:
  - bleu
base_model:
  - Qwen/Qwen2.5-7B-Instruct

[Paper] [GitHub]

Embodied Ability Evaluation: Performance in RoboVQA and OpenEQA

MLCD
Embodied-7B
LLaVA
OneVision-7B
GPT-4v RoboMamba
RoboVQA BLEU1 73.16 38.12 - 54.9
BLEU2 66.39 33.56 - 44.2
BLEU3 60.61 31.76 - 39.5
BLEU4 56.56 30.97 - 36.3
OpenEQA Object State Recognition 71.83 - 63.2 -
Object Recognition 49.46 - 43.4 -
Functional Reasoning 54.38 - 57.4 -
Spatial Understanding 48.64 - 33.6 -
Attribute Recognition 67.08 - 57.2 -
World Knowledge 53.87 - 50.7 -
Object Localization 43.06 - 42.0 -

General Ability Evaluation: Comparison with LLaVA OneVision-7B and GPT-4

Dataset Split MLCD
Embodied-7B
LLaVA
OneVision-7B
GPT-4v GPT-4o
A12D test 79.9 81.4 78.2 94.2
ChartQA test 83.0 80.0 78.5 85.7
DocVQA test 91.6 87.5 88.4 92.8
InfoVQA val 73.9 70.7 - -
InfoVQA test 70.0 68.8 - -
MMMU val 47.3 48.8 56.8 69.1
MMStar test 58.5 61.7 57.1 63.9
OCRBench - 749.0 697.0 656.0 805.0
RealWorldQA test 68.9 66.3 61.4 58.6
SeedBench image 74.9 75.4 49.9 76.2
MMbench en-dev 81.1 83.2 81.3 83.4
MMbench en-test 80.1 80.8 75.0 -
MME test 578/1603 418/1580 517/1409 -

Usage

A. Installation

git clone https://github.com/deepglint/unicom
cd unicom

# Upgrade pip and install necessary dependencies
pip install --upgrade pip
pip install -e ".[train]"

B. Inference

git clone https://github.com/deepglint/unicom
cd unicom
pip install --upgrade pip
pip install -e ".[train]"
pip install flash-attn --no-build-isolation

CUDA_VISIBLE_DEVICES=0 python infer.py --model_dir DeepGlint-AI/MLCD-Embodied-7B

# example:
# >> Enter 'exit' to end the conversation, 'reset' to clear the chat history.
# >> Enter image file paths (comma-separated): ./asserts/logo.png
# >> User: <image>What kind of animal is it in this picture?
# >> Assistant: The image features a stylized representation of a cat, characterized by its vibrant and abstract depiction.
# >> User: What color is this cat?
# >> Assistant: The cat in the image is primarily white with blue, orange and pink accents, creating a visually appealing and unique appearance.
# >> User: <image>请你介绍一下这个图片
# >> Assistant: 这是一幅充满创意的猫头艺术作品。它采用了多色渐变和抽象风格,将猫的头部描绘成一个充满活力和色彩的视觉冲击。猫的眼睛用金色渲染,显得非常有神采,
# 而粉色的鼻子则增添了一丝可爱感。整体设计融合了现代艺术与传统猫头图案,创造出一种既独特又引人入胜的视觉效果。。

C. Evaluation for Embodied Ability

Step 1

Download raw data following OpenEQA and RoboVQA(val part)

Step 2

Converting raw data into the format required for model evaluation.

# convert OpenEQA benchmark. Note: replace the paths with your own.
python llava/benchmark/make_openeqa_bmk.py

# convert RoboVQA benchmark. Note: replace the paths with your own.
python llava/benchmark/make_robovqa_bmk.py

Step 3

Make sure that your top-level directory structure should look like this:

|--/path/to/your/benchmarks
|  |--OpenEQA
|  |  |--openeqa_scannet.parquet
|  |  |--openeqa_hm3d.parquet
|  |--RoboVQA
|     |--robovqa.parquet
|--/path/to/your/images
   |--openeqa_val
   |  |--scannet-v0
   |  |  |--002-scannet-scene0709_00
   |  |  |--xxx-scannet-scenexxxx_xx
   |  |--hm3d-v0
   |     |--000-hm3d-BFRyYbPCCPE
   |     |--xxx-hm3d-xxxxxxxxxxx
   |--robovqa_val
      |--robovqa_221911
      |--robovqa_xxxxxx

Step 4

Run script for evaluation

# Note: replace 'YOUR_API_KEY', 'YOUR_ENDPOINT', 'bmk_root', 'image_folder' with your own.
bash scripts/eval/eval_robo.sh /path/to/your/model

D. Evaluation for General Ability

Install the evaluation tool and execute the evaluation script:

pip install lmms-eval==0.2.0
PYTHONPATH=./ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m accelerate.commands.launch \
    --main_process_port=12444 \
    --num_processes=8 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained=DeepGlint-AI/MLCD-Embodied-7B,conv_template=qwen_1_5 \
    --tasks mme \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix mlcd \
    --output_path ./eval_log/

We would like to express our gratitude to Huajie Tan, Yumeng Wang, Yin Xie for his significant contributions to the experimental validation in MLLMs.