Holo1.5 Collection Holo1.5 - Open Foundation Models for Computer Use Agents • 5 items • Updated 13 days ago • 31
Towards Reliable and Interpretable Document Question Answering via VLMs Paper • 2509.10129 • Published 16 days ago
Running • 1.08k • FineWeb: decanting the web for the finest text data at scale 🍷 Generate high-quality text data for LLMs using FineWeb
Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • Jul 8 • 686
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks Paper • 2507.01955 • Published Jul 2 • 35
Article SmolVLM2: Bringing Video Understanding to Every Device By orrzohar and 6 others • Feb 20 • 302
Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM By ariG23498 and 3 others • Mar 12 • 463
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated Jul 21 • 538