Article NVIDIA's GTC 2025 Announcement for Physical AI Developers: New Open Models and Datasets 7 days ago • 29
Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM 13 days ago • 342
Shot categorizer Collection Fine-tune of Florence-2 to generate shot categories, useful for data curation. Code: https://github.com/huggingface/movie-shot-categorizer. • 3 items • Updated 19 days ago • 2
C4AI Aya Vision Collection Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 21 days ago • 68
Article A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality 21 days ago • 69
Article HuggingFace, IISc partner to supercharge model building on India's diverse languages 26 days ago • 17
Phi-4 Collection Phi-4 family of small language and multi-modal models. • 7 items • Updated 22 days ago • 111
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 137
Article PaliGemma 2 Mix - New Instruction Vision Language Models by Google Feb 19 • 65