---
base_model:
- Qwen/Qwen2.5-VL-72B-Instruct
language:
- en
license: apache-2.0
tags:
- transformers
- multimodal
pipeline_tag: visual-question-answering
---
# INFRL-Qwen2.5-VL-72B-Preview
## Model Overview
- **INFRL-Qwen2.5-VL-72B-Preview** builds on the [Qwen2.5-VL-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) model and improves its visual reasoning.

- As of March 25, 2025, **INFRL-Qwen2.5-VL-72B-Preview** is the best-performing open-source VL model on several visual reasoning benchmarks ([MathVision](https://mathllm.github.io/mathvision/), [MathVista](https://mathvista.github.io/), [EMMA](https://emma-benchmark.github.io/#leaderboard), [MMMU-Pro](https://mmmu-benchmark.github.io/), [MathVerse](https://mathverse-cuhk.github.io/)).

| Model                         | MathVision (test) | MathVista (testmini) | MathVerse (testmini) |
|-------------------------------|-------------------|----------------------|----------------------|
| GPT-4o (R1-1V Rep)            | 30.6              | 60.0                 | 41.2                 |
| Gemini-2.0-Flash              | 41.3              | 70.1                 | 50.6                 |
| Claude 3.5 Sonnet             | 33.5              | 67.7                 | 47.8                 |
| QvQ-72B                       | 35.9              | 71.4                 | 48.6                 |
| InternVL2.5-78B               | 34.9              | 72.3                 | 51.7                 |
| Qwen2.5-VL-72B                | 38.1              | 74.8                 | 57.18                |
| INFRL-Qwen2.5-VL-72B-Preview  | 41.9              | 77.8                 | 58.84                |
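
The gain over the base model can be read straight off the table. The short Python sketch below only subtracts the base-model row from the INFRL row, using the scores exactly as reported (nothing is re-evaluated); the dictionary keys and the helper name `gains` are illustrative, not part of any released code.

```python
# Scores copied from the table above, as reported (no re-evaluation).
# Keys are illustrative labels for the base model and the INFRL model.
SCORES = {
    "Qwen2.5-VL-72B": {"MathVision": 38.1, "MathVista": 74.8, "MathVerse": 57.18},
    "INFRL-Qwen2.5-VL-72B-Preview": {"MathVision": 41.9, "MathVista": 77.8, "MathVerse": 58.84},
}


def gains(model: str, baseline: str = "Qwen2.5-VL-72B") -> dict[str, float]:
    """Absolute improvement of `model` over `baseline`, in points, per benchmark."""
    return {
        bench: round(SCORES[model][bench] - base_score, 2)
        for bench, base_score in SCORES[baseline].items()
    }


if __name__ == "__main__":
    for bench, delta in gains("INFRL-Qwen2.5-VL-72B-Preview").items():
        print(f"{bench}: +{delta:.2f}")
```

Run as a script, this prints a gain of +3.80 on MathVision, +3.00 on MathVista, and +1.66 on MathVerse.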

## Evaluation

We will release a code repository with vLLM support for VLM evaluation. It will offer:
- Generation 10x faster than Hugging Face `generate()`, which makes evaluating larger benchmarks practical.
- Efficient answer extraction and matching, with results aligned with Qwen2.5-VL's evaluation, so no costly LLM judge is needed.

Stay tuned!
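
Until that repository is released, the following is only a hypothetical Python sketch of what LLM-judge-free answer extraction and matching can look like. It is not the INFRL implementation. The conventions it assumes (the final answer sits in the last `\boxed{...}` or after an `Answer:` line, multiple-choice options are letters A to E, numeric answers are compared with a relative tolerance) and the helper names `extract_answer` and `is_correct` are all illustrative.

```python
import math
import re
from typing import Optional

# Hypothetical convention: a line such as "Answer: (B)" carries the final answer.
_ANSWER_LINE = re.compile(r"answer\s*:\s*(.+)", re.IGNORECASE)


def _last_boxed(text: str) -> Optional[str]:
    """Content of the last \\boxed{...}, matching nested braces such as \\frac{1}{2}."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # unbalanced braces: treat as no answer


def extract_answer(response: str) -> Optional[str]:
    """Prefer the last \\boxed{...}; otherwise fall back to the last 'Answer:' line."""
    boxed = _last_boxed(response)
    if boxed is not None:
        return boxed.strip()
    lines = _ANSWER_LINE.findall(response)
    return lines[-1].strip() if lines else None


def _normalize(text: str) -> str:
    # Strip whitespace and a trailing period; map "(b)" or "B" to the letter "B".
    text = text.strip().rstrip(".").strip()
    letter = re.fullmatch(r"\(?([A-Ea-e])\)?", text)
    if letter:
        return letter.group(1).upper()
    return text.replace(" ", "").lower()


def is_correct(response: str, gold: str, rel_tol: float = 1e-4) -> bool:
    """Match the extracted answer against `gold` without calling an LLM judge."""
    pred = extract_answer(response)
    if pred is None:
        return False
    p, g = _normalize(pred), _normalize(gold)
    try:
        # Numeric answers: compare as floats, so "12.0" matches "12".
        return math.isclose(float(p), float(g), rel_tol=rel_tol)
    except ValueError:
        # Non-numeric answers (option letters, expressions): exact string match.
        return p == g
```

Rule-based matching like this is cheap enough to run over every sample, which is the saving over an LLM judge that the list above refers to. Its weak point is answers that are equivalent but written differently (for example `0.5` versus `\frac{1}{2}`), which a real evaluator would need to handle.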

## Contributors
### Supervisors
Wei Chu • Yuan Qi

### VL Team 
Haozhe Wang • Zuming Huang 

### RL Team
Haozhe Wang • Chao Qu • Long Li

## Thanks
We thank Jiaran Hao and Liuyihan Song for their support with the RL infrastructure.

## Citation
If you find our model useful, please consider citing:

```bibtex
@misc{INFRL_VL_Preview,
	author       = {Wang, Haozhe and Huang, Zuming and Qu, Chao and Chu, Wei and Qi, Yuan},
	title        = { INFRL-Qwen2.5-VL-72B-Preview },
	year         = 2025,
	url          = { https://huggingface.co/infly/INFRL-Qwen2.5-VL-72B-Preview},
	publisher    = { Hugging Face }
}
```