DeepPerception / README.md
MaxyLee's picture
Add library_name to metadata (#1)
c215f2a verified
metadata
base_model:
  - Qwen/Qwen2-VL-7B-Instruct
language:
  - en
license: apache-2.0
metrics:
  - accuracy
pipeline_tag: image-text-to-text
library_name: transformers

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

Xinyu Ma, Ziyang Ding, Zhicong Luo, Chi Chen, Zonghao Guo, Derek F. Wong, Xiaoyi Feng, Maosong Sun


This is the official repository of DeepPerception, an MLLM enhanced with cognitive visual perception capabilities.