From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models Paper • 2506.09930 • Published Jun 11, 2025
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos Paper • 2411.17820 • Published Nov 26, 2024