Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities Paper • 2503.03983 • Published 4 days ago • 18
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published 4 days ago • 64
Enhancing Abnormality Grounding for Vision Language Models with Knowledge Descriptions Paper • 2503.03278 • Published 5 days ago • 12
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding Paper • 2503.02951 • Published 5 days ago • 25
ABC: Achieving Better Control of Multimodal Embeddings using VLMs Paper • 2503.00329 • Published 9 days ago • 18
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs Paper • 2503.01743 • Published 6 days ago • 65
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning Paper • 2502.19634 • Published 11 days ago • 56