AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding • Paper • 2502.01341 • Published Feb 3, 2025
Multimodal foundation world models for generalist embodied agents • Paper • 2406.18043 • Published Jun 26, 2024