🛰️🌍 Geospatial Datasets Collection A curated collections of diverse geospatial and satellite imagery datasets. • 56 items • Updated Mar 11 • 23
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces Paper • 2506.00123 • Published May 30 • 34
view article Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • May 12 • 488
D-FINE Collection State-of-the-art real-time object detection model with Apache 2.0 licence • 15 items • Updated May 5 • 55
view article Article A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality By saurabhdash and 3 others • Mar 4 • 75
view article Article FastRTC: The Real-Time Communication Library for Python By freddyaboulton and 1 other • Feb 25 • 171
Visual Document Retrieval Collection A collection of models, datasets, and spaces in the VDR series • 5 items • Updated Jan 10 • 8
view article Article 🚀 Build a Qwen 2.5 VL API endpoint with Hugging Face spaces and Docker! By ariG23498 • Jan 29 • 19
AIMv2 Collection A collection of AIMv2 vision encoders that supports a number of resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated Nov 22, 2024 • 80
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 626
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 26 items • Updated May 1 • 573
Nemotron 4 340B Collection Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models. • 4 items • Updated 4 days ago • 163