VLM-R1-models
A collection of VLM-R1 models:
- omlab/Qwen2.5VL-3B-VLM-R1-REC-500steps • Zero-Shot Object Detection • 4B • Updated Apr 14 • 1.3k • 23
- omlab/VLM-R1-Qwen2.5VL-3B-Math-0305 • Visual Question Answering • 4B • Updated Apr 14 • 1.55k • 6
- omlab/VLM-R1-Qwen2.5VL-3B-OVD-0321 • Image-Text-to-Text • 4B • Updated 7 days ago • 1.83k • 20

Multimodal Research
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration • Paper 2411.16044 • Published Nov 25, 2024 • 2
- OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding • Paper 2407.04923 • Published Jul 6, 2024 • 2
- OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network • Paper 2209.05946 • Published Sep 10, 2022 • 2
- VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations • Paper 2207.00221 • Published Jul 1, 2022 • 2