Article • π0 and π0-FAST: Vision-Language-Action Models for General Robot Control • Feb 4
Multimodal Autoregressive Pre-training of Large Vision Encoders • Paper • 2411.14402 • Published Nov 21, 2024
eP-ALM: Efficient Perceptual Augmentation of Language Models • Paper • 2303.11403 • Published Mar 20, 2023
Unified Model for Image, Video, Audio and Language Tasks • Paper • 2307.16184 • Published Jul 30, 2023