OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? (arXiv:2501.05510, Jan 2025)
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models (arXiv:2502.06788, Feb 2025)
Exploring the Potential of Encoder-free Architectures in 3D LMMs (arXiv:2502.09620, Feb 2025)
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency (arXiv:2502.09621, Feb 2025)
Scaling Pre-training to One Hundred Billion Data for Vision Language Models (arXiv:2502.07617, Feb 2025)
VideoRoPE: What Makes for Good Video Rotary Position Embedding? (arXiv:2502.05173, Feb 2025)
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment (arXiv:2502.04328, Feb 2025)