Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity Paper • 2501.16295 • Published 2 days ago • 5
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 3 items • Updated 3 days ago • 273
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published 6 days ago • 21
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published 7 days ago • 74
SmolVLM 256M & 500M Collection Collection for models & demos for even smoller SmolVLM release • 12 items • Updated 6 days ago • 60
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise Paper • 2501.08331 • Published 15 days ago • 20
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation Paper • 2501.12202 • Published 8 days ago • 28
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model Paper • 2501.12368 • Published 8 days ago • 39
UI-TARS: Pioneering Automated GUI Interaction with Native Agents Paper • 2501.12326 • Published 8 days ago • 47
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published 8 days ago • 79