DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning Paper • 2508.05405 • Published about 1 month ago • 63
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos Paper • 2507.15597 • Published Jul 21 • 33
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning Paper • 2505.12081 • Published May 17 • 18
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining Paper • 2505.07608 • Published May 12 • 82
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills Paper • 2503.12533 • Published Mar 16 • 69
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition Paper • 2412.09501 • Published Dec 12, 2024 • 49
Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation Paper • 2306.13460 • Published Jun 23, 2023 • 1