Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation • 2508.05635 • Published 17 days ago
π^3: Scalable Permutation-Equivariant Visual Geometry Learning • 2507.13347 • Published Jul 17, 2025
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning • 2507.01006 • Published Jul 1, 2025
Use Property-Based Testing to Bridge LLM Code Generation and Validation • 2506.18315 • Published Jun 23, 2025
AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models • 2506.19851 • Published Jun 24, 2025
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics • 2506.04308 • Published Jun 4, 2025
UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation • 2505.24521 • Published May 30, 2025
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection • 2412.04455 • Published Dec 5, 2024
MV-Adapter: Multi-view Consistent Image Generation Made Easy • 2412.03632 • Published Dec 4, 2024
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation • 2412.03558 • Published Dec 4, 2024
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion • 2406.03184 • Published Jun 5, 2024
From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation • 2404.15267 • Published Apr 23, 2024
EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion • 2312.06725 • Published Dec 11, 2023