view article Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • May 12 • 465
view article Article nanoVLM: The simplest repository to train your VLM in pure PyTorch By ariG23498 and 6 others • May 21 • 174
Qwen2.5-Omni Collection End-to-End Omni (text, audio, image, video, and natural speech interaction) model based Qwen2.5 • 7 items • Updated May 21 • 145
Ola Collection Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment • 4 items • Updated Feb 21 • 3
Lost in the Middle: How Language Models Use Long Contexts Paper • 2307.03172 • Published Jul 6, 2023 • 40
view article Article How NuminaMath Won the 1st AIMO Progress Prize By yfleureau and 7 others • Jul 11, 2024 • 121
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published Jan 23 • 26
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models Paper • 2411.14982 • Published Nov 22, 2024 • 18