---
base_model:
- Qwen/Qwen2-VL-2B-Instruct
datasets:
- rp-yu/VPT_Datasets
language:
- en
library_name: transformers
license: apache-2.0
metrics:
- accuracy
pipeline_tag: image-text-to-text
---

# Introducing Visual Perception Token into Multimodal Large Language Model

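This repository contains models based on the paper [Introducing Visual Perception Token into Multimodal Large Language Model](https://arxiv.org/abs/2502.17425). These models use Visual Perception Tokens to let a multimodal large language model (MLLM) control its own visual perception process. The paper introduces two types of these tokens: the Region Selection Token and the Vision Re-Encoding Token. The MLLM generates them just as it generates text, and each one triggers an additional visual perception step.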

Code: https://github.com/yu-rp/VisualPerceptionToken