CoMP: Continual Multimodal Pre-training for Vision Foundation Models
Paper • 2503.18931 • Published 13 days ago • 29

Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning
Paper • 2412.03565 • Published Dec 4, 2024 • 11

SliMM
Collection • A Simple LMM baseline with Dynamic Visual Resolution • 5 items • Updated Dec 15, 2024