Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation Paper • 2508.13998 • Published 3 days ago • 13
LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos Paper • 2508.14041 • Published 3 days ago • 47
MolmoAct: Action Reasoning Models that can Reason in Space Paper • 2508.07917 • Published 11 days ago • 38
DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning Paper • 2508.05405 • Published 15 days ago • 61
Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning Paper • 2507.17512 • Published 30 days ago • 36
VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models Paper • 2503.21781 • Published Mar 27
MotionMatcher: Motion Customization of Text-to-Video Diffusion Models via Motion Feature Matching Paper • 2502.13234 • Published Feb 18
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Paper • 2507.16815 • Published about 1 month ago • 37
"PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models Paper • 2507.13428 • Published Jul 17 • 15
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning Paper • 2507.12841 • Published Jul 17 • 40
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17 • 72
AnyI2V: Animating Any Conditional Image with Motion Control Paper • 2507.02857 • Published Jul 3 • 12