JasperHaozhe committed on
Commit 92fbee0 · verified · 1 Parent(s): 6f16a4b

Update README.md

Files changed (1):
  1. README.md +12 -4

README.md CHANGED
@@ -13,14 +13,22 @@ pipeline_tag: visual-question-answering
  ## Model Overview
  - **INFRL-Qwen2.5-VL-72B-Preview** improves visual reasoning upon the [Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) model.

- - As of March 25th, 2025, **INFRL-Qwen2.5-VL-72B-Preview** is the best-performing open-source VL model on various visual reasoning benchmarks ([MathVision](https://mathllm.github.io/mathvision/), [MathVista](https://mathvista.github.io/), [EMMA](https://emma-benchmark.github.io/#leaderboard), [MMMUPro](https://mmmu-benchmark.github.io/), [MathVerse](https://mathverse-cuhk.github.io/)).
+ - As of March 25th, 2025, **INFRL-Qwen2.5-VL-72B-Preview** is the best-performing open-source VL model on various visual reasoning benchmarks ([MathVision](https://mathllm.github.io/mathvision/), [MathVista](https://mathvista.github.io/), [MathVerse](https://mathverse-cuhk.github.io/)).


  ## Evaluation

- We will release a code repository with vLLM support for VLM evaluation.
- - 10x faster than `hf.generate()`, with better efficiency when evaluating larger benchmarks.
- - Efficient answer extraction and matching whose scores align with Qwen2.5-VL's reported results; no need for a costly LLM-Judge.
+ | Models            | MathVision (test) | MathVista (testmini) | MathVerse (testmini) |
+ |-------------------|-------------------|----------------------|----------------------|
+ | GPT-4o (R1-1V Rep)| 30.6              | 60                   | 41.2                 |
+ | Gemini-2.0-Flash  | 41.3              | 70.1                 | 50.6                 |
+ | Claude 3.5 Sonnet | 33.5              | 67.7                 | 47.8                 |
+ | QvQ-72B           | 35.9              | 71.4                 | 48.6                 |
+ | InternVL2.5-78B   | 34.9              | 72.3                 | 51.7                 |
+ | Qwen2.5-VL-72B    | 38.1              | 74.8                 | 57.18                |
+ | INFRL-VL-Preview  | 41.9              | 77.8                 | 58.84                |
+
+ We will release a code repository for VLM evaluation. It supports RL training with simple rule-based rewards while aligning with LLM-Judge results.

  Stay tuned!
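
The evaluation bullets removed above claim roughly 10x faster inference with vLLM than with `hf.generate()`. For readers unfamiliar with that workflow, here is a minimal sketch of batched VLM inference with vLLM; it is not the announced repository, and the prompt format, image paths, and parallelism settings are assumptions based on the public Qwen2.5-VL model card.

```python
# Minimal sketch only: batch a whole benchmark through vLLM instead of
# looping over hf.generate() one sample at a time.
from vllm import LLM, SamplingParams
from PIL import Image

# Assumption: evaluating the public base checkpoint; swap in the INFRL
# weights once released. A 72B model needs tensor parallelism across GPUs.
llm = LLM(model="Qwen/Qwen2.5-VL-72B-Instruct", tensor_parallel_size=8)
params = SamplingParams(temperature=0.0, max_tokens=2048)

# Prompt follows Qwen2.5-VL's chat template; <|image_pad|> marks where the
# image tokens are inserted. Verify against the model card before use.
prompt = (
    "<|im_start|>user\n"
    "<|vision_start|><|image_pad|><|vision_end|>"
    "Solve the problem in the image. Put the final answer in \\boxed{}.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# q1.png / q2.png are placeholder paths for benchmark images. vLLM batches
# and schedules all requests in one call, which is where the throughput
# win over per-sample generation comes from.
inputs = [
    {"prompt": prompt, "multi_modal_data": {"image": Image.open(path)}}
    for path in ["q1.png", "q2.png"]
]
for out in llm.generate(inputs, params):
    print(out.outputs[0].text)
```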
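
The new closing line also mentions RL training with simple rule-based rewards that stay aligned with LLM-Judge results. Below is a minimal sketch of that kind of reward, assuming the model is instructed to put its final answer in `\boxed{}`; all function names are illustrative and not taken from the announced repository.

```python
# Minimal sketch only: a rule-based reward that extracts a \boxed{} answer
# and matches it against the ground truth, standing in for a costly
# LLM-Judge. Nested braces and symbolic equivalence are not handled.
import re

def extract_boxed(text: str) -> str | None:
    """Return the content of the last \\boxed{...} in a response, if any."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def normalize(ans: str) -> str:
    """Light normalization so '0.50', ' .5' and '0.5' all compare equal."""
    ans = ans.strip().rstrip(".").replace(" ", "").lower()
    try:
        return str(float(ans))  # canonicalize numeric answers
    except ValueError:
        return ans              # leave non-numeric answers as cleaned text

def rule_based_reward(response: str, ground_truth: str) -> float:
    """Binary RL reward: 1.0 on a normalized exact match, else 0.0."""
    pred = extract_boxed(response)
    if pred is None:
        return 0.0
    return 1.0 if normalize(pred) == normalize(ground_truth) else 0.0

# Example: reward is 1.0 because 0.50 and 0.5 normalize to the same value.
print(rule_based_reward("... so the area is \\boxed{0.50}.", "0.5"))
```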