VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search Paper • 2503.10582 • Published 13 days ago • 20
OLMo 2 Collection Artifacts for the second set of OLMo models. • 27 items • Updated 6 days ago • 106
C4AI Aya Vision Collection Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 22 days ago • 68
olmOCR Collection olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 4 items • Updated 7 days ago • 101
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 137
Ovis2 Collection Our latest advancement in multi-modal large language models (MLLMs) • 15 items • Updated about 22 hours ago • 56
Hibiki fr-en Collection Hibiki is a model for streaming speech translation, which can run on device! See https://github.com/kyutai-labs/hibiki. • 5 items • Updated Feb 6 • 52
AceMath Collection We are releasing math instruction models, math reward models, general instruction models, all training datasets, and a math reward benchmark. • 11 items • Updated 8 days ago • 11
SmolVLM 256M & 500M Collection Collection for models & demos for even smoller SmolVLM release • 12 items • Updated Feb 20 • 72
ViTPose Collection Collection for ViTPose models based on transformers implementation. • 10 items • Updated Jan 12 • 13
Sa2VA Model Zoo Collection Hugging Face model zoo for Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos, by ByteDance Seed CV Research • 4 items • Updated Feb 9 • 33