nielsr's picture
nielsr HF Staff
Add pipeline tag and Github link to model card
0bf5eda verified
|
raw
history blame
622 Bytes
metadata
base_model:
  - Qwen/Qwen2-VL-2B-Instruct
datasets:
  - rp-yu/VPT_Datasets
language:
  - en
library_name: transformers
license: apache-2.0
metrics:
  - accuracy
pipeline_tag: image-text-to-text

Introducing Visual Perception Token into Multimodal Large Language Model

This repository contains models based on the paper Introducing Visual Perception Token into Multimodal Large Language Model. These models utilize Visual Perception Tokens to enhance the visual perception capabilities of multimodal large language models (MLLMs).

Code: https://github.com/yu-rp/VisualPerceptionToken