LLM2CLIP Collection LLM2CLIP makes SOTA pretrained CLIP modal more SOTA ever. • 11 items • Updated 16 days ago • 57
view article Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM 16 days ago • 351
olmOCR Collection olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 4 items • Updated 8 days ago • 101
Phi-4 Collection Phi-4 family of small language and multi-modal models. • 7 items • Updated 24 days ago • 112
Ovis2 Collection Our latest advancement in multi-modal large language models (MLLMs) • 15 items • Updated 3 days ago • 56
Sa2VA Model Zoo Collection Huggingace Model Zoo For Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos By Bytedance Seed CV Research • 4 items • Updated Feb 9 • 34
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Paper • 2501.04001 • Published Jan 7 • 46
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 138
AIMv2 Collection A collection of AIMv2 vision encoders that supports a number of resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated Nov 22, 2024 • 74