Article • SmolVLM - small yet mighty Vision Language Model • By andito and 4 others • Nov 26, 2024
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics • Paper • 2506.00070 • Published 13 days ago
Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction • Paper • 2411.14762 • Published Nov 22, 2024
Meta-Transformer: A Unified Framework for Multimodal Learning • Paper • 2307.10802 • Published Jul 20, 2023
Collaborative Score Distillation for Consistent Visual Synthesis • Paper • 2307.04787 • Published Jul 4, 2023