---
base_model:
- Qwen/Qwen2-VL-7B-Instruct
language:
- en
license: apache-2.0
metrics:
- accuracy
pipeline_tag: image-text-to-text
library_name: transformers
---

# DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

Xinyu Ma, Ziyang Ding, Zhicong Luo, Chi Chen, Zonghao Guo, Derek F. Wong, Xiaoyi Feng, Maosong Sun

-----

This is the official repository of **DeepPerception**, an MLLM enhanced with cognitive visual perception capabilities.