SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published 11 days ago • 74
A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality Article • 21 days ago • 69
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 212
SmolVLM 256M & 500M Collection Collection for models & demos for even smoller SmolVLM release • 12 items • Updated Feb 20 • 72
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 146
Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model Article • Aug 22, 2023 • 31
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding Paper • 2412.10302 • Published Dec 13, 2024 • 17
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 145
Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints Article • May 1, 2024 • 74