InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation Paper • 2507.17520 • Published about 1 month ago • 14
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations Paper • 2406.09401 • Published Jun 13, 2024
Unified Generative and Discriminative Training for Multi-modal Large Language Models Paper • 2411.00304 • Published Nov 1, 2024