FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation Paper • 2506.01144 • Published 4 days ago • 14
Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data By danaaubakirova and 8 others • 3 days ago • 87
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published 3 days ago • 68
Changelog Xet is now the default storage option for new users and organizations 14 days ago • 57
Article The Transformers Library: standardizing model definitions By lysandre and 3 others • 22 days ago • 110
Article nanoVLM: The simplest repository to train your VLM in pure PyTorch By ariG23498 and 6 others • 16 days ago • 136
Article Microsoft and Hugging Face expand collaboration By jeffboudier and 2 others • 18 days ago • 20
MobileCLIP Models + DataCompDR Data Collection MobileCLIP: Mobile-friendly image-text models with SOTA zero-shot capabilities. DataCompDR: Improved datasets for training image-text SOTA models. • 22 items • Updated Oct 4, 2024 • 29
Article Improving Hugging Face Model Access for Kaggle Users By roseberryv and 4 others • 23 days ago • 27
Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • 25 days ago • 414
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models Paper • 2401.15947 • Published Jan 29, 2024 • 53
Article A Dive into Pretraining Strategies for Vision-Language Models By adirik and 1 other • Feb 3, 2023 • 66
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation Paper • 2501.17162 • Published Jan 28 • 1
D-FINE Collection State-of-the-art real-time object detection model with Apache 2.0 licence • 15 items • Updated May 5 • 55