---
license: apache-2.0
language:
- en
metrics:
- accuracy
base_model:
- Qwen/Qwen2-VL-7B-Instruct
pipeline_tag: image-text-to-text
---

# DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

Xinyu Ma, Ziyang Ding, Zhicong Luo, Chi Chen, Zonghao Guo, Derek F. Wong, Xiaoyi Feng, Maosong Sun

-----

This is the official repository of **DeepPerception**, an MLLM enhanced with cognitive visual perception capabilities.