Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation Paper • 2508.13998 • Published 3 days ago • 13
LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos Paper • 2508.14041 • Published 3 days ago • 47
MolmoAct: Action Reasoning Models that can Reason in Space Paper • 2508.07917 • Published 11 days ago • 38
DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning Paper • 2508.05405 • Published 15 days ago • 61
Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning Paper • 2507.17512 • Published 30 days ago • 36
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Paper • 2507.16815 • Published about 1 month ago • 37
PhyWorldBench: A Comprehensive Evaluation of Physical Realism in Text-to-Video Models Paper • 2507.13428 • Published Jul 17 • 15
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning Paper • 2507.12841 • Published Jul 17 • 40
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17 • 72
AnyI2V: Animating Any Conditional Image with Motion Control Paper • 2507.02857 • Published Jul 3 • 12
RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation Paper • 2506.18088 • Published Jun 22 • 17
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models Paper • 2506.15681 • Published Jun 18 • 40
CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers Paper • 2502.06527 • Published Feb 10 • 11
Agent-to-Sim: Learning Interactive Behavior Models from Casual Longitudinal Videos Paper • 2410.16259 • Published Oct 21, 2024 • 5