Update README.md
README.md
## Model Overview
- **INFRL-Qwen2.5-VL-72B-Preview** improves visual reasoning upon the [Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) model; a minimal usage sketch is shown below.

- As of March 25th, 2025, **INFRL-Qwen2.5-VL-72B-Preview** is the best-performing open-source VL model on various visual reasoning benchmarks ([MathVision](https://mathllm.github.io/mathvision/), [MathVista](https://mathvista.github.io/), [MathVerse](https://mathverse-cuhk.github.io/)).
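
Since the model builds on Qwen2.5-VL-72B-Instruct, inference presumably follows the standard Qwen2.5-VL path in `transformers` with the `qwen_vl_utils` helper. The sketch below is an unofficial, minimal example under that assumption; the repo id, image path, and prompt are placeholders, not values from this card.

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

MODEL_ID = "<org>/INFRL-Qwen2.5-VL-72B-Preview"  # placeholder: replace with the actual repo id

# Load with the standard Qwen2.5-VL classes (assumed compatible with this checkpoint).
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

# A single-image visual-reasoning prompt; the image path is a placeholder.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "path/to/geometry_problem.png"},
            {"type": "text", "text": "Solve the problem in the image. Reason step by step."},
        ],
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=2048)
# Strip the prompt tokens before decoding so only the generated answer remains.
answer_ids = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(answer_ids, skip_special_tokens=True)[0])
```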
## Evaluation

| Models | MathVision (test) | MathVista (testmini) | MathVerse (testmini) |
|-------------------|-------------------|----------------------|----------------------|
| GPT4o (R1-1V Rep) | 30.6 | 60 | 41.2 |
| Gemini-2.0-Flash | 41.3 | 70.1 | 50.6 |
| Claude 3.5 Sonnet | 33.5 | 67.7 | 47.8 |
| QvQ-72B | 35.9 | 71.4 | 48.6 |
| InternVL2.5-78B | 34.9 | 72.3 | 51.7 |
| Qwen2.5-VL-72B | 38.1 | 74.8 | 57.18 |
| INFRL-VL-Preview | 41.9 | 77.8 | 58.84 |

We will release a code repository for VLM evaluation. It supports RL training with simple rule-based rewards that stay aligned with LLM-Judge results.
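
The repository is not yet released, so the following is only a rough, hypothetical sketch of what a "simple rule-based reward" for math-style visual QA commonly looks like: extract a final answer from the model's response and score it against the reference. The `\boxed{}` / `Answer:` patterns and the numeric tolerance are illustrative assumptions, not the repository's actual rules.

```python
import re

# Illustrative answer-extraction rules; the released repo may use different patterns.
ANSWER_PATTERNS = [
    re.compile(r"\\boxed\{([^{}]*)\}"),                            # LaTeX \boxed{...}
    re.compile(r"(?:final answer|answer)\s*[:=]\s*(.+)", re.IGNORECASE),
]

def extract_final_answer(response: str):
    """Return the last answer span matched by the patterns above, or None."""
    for pattern in ANSWER_PATTERNS:
        matches = pattern.findall(response)
        if matches:
            return matches[-1].strip()
    return None

def rule_based_reward(response: str, reference: str) -> float:
    """Score 1.0 for an exact or numerically equivalent match, else 0.0."""
    predicted = extract_final_answer(response)
    if predicted is None:
        return 0.0
    if predicted.lower() == reference.strip().lower():
        return 1.0
    try:
        # Tolerate formatting differences for numeric answers, e.g. "60" vs "60.0".
        return 1.0 if abs(float(predicted) - float(reference)) < 1e-6 else 0.0
    except ValueError:
        return 0.0

# Example: a response whose extracted answer matches the reference earns reward 1.0.
print(rule_based_reward("The area is 12*5/2, so the final answer: 30", "30"))  # prints 1.0
```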
Stay tuned!