Article • SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data • By danaaubakirova and 8 others • 9 days ago
Article • Scaling robotics datasets with video encoding • By aliberts and 2 others • Aug 27, 2024
Paper • Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model • 2408.11039 • Published Aug 20, 2024