Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published 15 days ago • 238
One-Minute Video Generation with Test-Time Training Paper • 2504.05298 • Published 8 days ago • 92
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 8 days ago • 158
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts Paper • 2503.22952 • Published 17 days ago • 18
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published Mar 14 • 94
A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality Article • Mar 4 • 73
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 223
SmolVLM 256M & 500M Collection • Collection for models & demos for even smoller SmolVLM release • 12 items • Updated Feb 20 • 74
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 153
Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model Article • Aug 22, 2023 • 31