VL-BERT: Pre-training of Generic Visual-Linguistic Representations Paper • 1908.08530 • Published Aug 22, 2019
Deformable DETR: Deformable Transformers for End-to-End Object Detection Paper • 2010.04159 • Published Oct 8, 2020 • 1
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks Paper • 2312.14238 • Published Dec 21, 2023 • 20
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information Paper • 2211.09807 • Published Nov 17, 2022
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models Paper • 2412.09613 • Published Dec 12, 2024 • 1
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning Paper • 2406.07543 • Published Jun 11, 2024
ZeroGUI: Automating Online GUI Learning at Zero Human Cost Paper • 2505.23762 • Published 24 days ago • 45
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces Paper • 2506.00123 • Published 23 days ago • 34
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 274
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding Paper • 2412.09604 • Published Dec 12, 2024 • 39
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 159
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization Paper • 2411.10442 • Published Nov 15, 2024 • 80
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory Paper • 2305.17144 • Published May 25, 2023 • 2