VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning Paper • 2505.12081 • Published 25 days ago • 17
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining Paper • 2505.07608 • Published 30 days ago • 79
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills Paper • 2503.12533 • Published Mar 16 • 66
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition Paper • 2412.09501 • Published Dec 12, 2024 • 49
Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation Paper • 2306.13460 • Published Jun 23, 2023 • 1