BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions • 2411.07461 • Published Nov 12, 2024
Maya: An Instruction Finetuned Multilingual Multimodal Model • 2412.07112 • Published Dec 10, 2024
Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers • 2404.13594 • Published Apr 21, 2024
Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling • 2409.05395 • Published Sep 9, 2024