metadata
base_model:
- Qwen/Qwen2-VL-7B-Instruct
language:
- en
license: apache-2.0
metrics:
- accuracy
pipeline_tag: image-text-to-text
library_name: transformers
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
Xinyu Ma, Ziyang Ding, Zhicong Luo, Chi Chen, Zonghao Guo, Derek F. Wong, Xiaoyi Feng, Maosong Sun
This is the official repository of DeepPerception, an MLLM enhanced with cognitive visual perception capabilities.