Splatting Physical Scenes: End-to-End Real-to-Sim from Imperfect Robot Data
Abstract
A novel real-to-sim framework merges 3D Gaussian Splatting and object meshes for accurate physics simulation, refining geometry, appearance, and robot poses from raw trajectories.
Creating accurate physics simulations directly from real-world robot motion holds great value for safe, scalable, and affordable robot learning, yet remains exceptionally challenging. Real robot data suffers from occlusions, noisy camera poses, and dynamic scene elements, all of which hinder the creation of geometrically accurate and photorealistic digital twins of unseen objects. We introduce a novel real-to-sim framework that tackles all of these challenges at once. Our key insight is a hybrid scene representation that merges the photorealistic rendering of 3D Gaussian Splatting with explicit object meshes suitable for physics simulation within a single representation. We propose an end-to-end optimization pipeline that leverages differentiable rendering and differentiable physics within MuJoCo to jointly refine all scene components, from object geometry and appearance to robot poses and physical parameters, directly from raw and imprecise robot trajectories. This unified optimization allows us to simultaneously achieve high-fidelity object mesh reconstruction, generate photorealistic novel views, and perform annotation-free robot pose calibration. We demonstrate the effectiveness of our approach both in simulation and on challenging real-world sequences using an ALOHA 2 bi-manual manipulator, enabling more practical and robust real-to-simulation pipelines.
Community
We're excited to share our work on bridging the real-to-sim gap in robotics. Creating accurate, physics-ready simulations from real robot data is very challenging, especially when using low-cost hardware. Methods like 3D Gaussian Splatting (3DGS) are fantastic for photorealism, but their representations aren't directly compatible with physics engines. In our paper we introduce SplatMesh, a hybrid scene representation that combines 3DGS with explicit, physics-ready triangle meshes. We embed this in a fully differentiable, end-to-end framework that uses both differentiable rendering and differentiable physics with MuJoCo MJX.
This allows us to use raw RGB images to simultaneously refine everything: the object's geometry and appearance, the robot's pose, and the camera parameters. We also show that our representation is comprehensive and allows us to generate new assets, both as Gaussian Splats and as meshes. We demonstrate our method on a real ALOHA 2 manipulator, successfully reconstructing high-fidelity 3D assets from its imperfect trajectory data.
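The joint-refinement idea above can be sketched in a few lines of JAX: a single scalar loss sums a differentiable rendering term and a differentiable physics term, so one gradient call updates geometry, appearance, and physical parameters together. The renderer and dynamics below are toy stand-ins for 3D Gaussian Splatting and MuJoCo MJX, and all function and parameter names are illustrative, not the paper's actual API.

```python
# Minimal sketch of end-to-end optimization with a combined
# rendering + physics loss. Toy stand-ins only; the actual paper
# uses 3DGS rendering and MuJoCo MJX dynamics.
import jax
import jax.numpy as jnp


def render(params, cam_pose):
    # Stand-in for differentiable splat/mesh rendering: project the
    # object "geometry" through the camera and add "appearance".
    return jnp.tanh(params["appearance"] + cam_pose @ params["geometry"])


def step_dynamics(params, state):
    # Stand-in for one differentiable physics step (MJX in the paper):
    # a damped update driven by the physical parameters.
    return state + 0.01 * (params["physics"] - state)


def loss(params, cam_pose, observed_img, observed_state):
    # Photometric term: rendered image vs. captured RGB.
    img = render(params, cam_pose)
    l_photo = jnp.mean((img - observed_img) ** 2)
    # Physics term: simulated rollout vs. observed robot state.
    state = jnp.zeros_like(observed_state)
    for _ in range(10):
        state = step_dynamics(params, state)
    l_phys = jnp.mean((state - observed_state) ** 2)
    return l_photo + l_phys


# Toy scene parameters and "observations".
key = jax.random.PRNGKey(0)
params = {
    "appearance": jax.random.normal(key, (4,)),
    "geometry": jax.random.normal(key, (3, 4)),
    "physics": jnp.ones(3),
}
cam_pose = jnp.eye(3)
observed_img = jnp.zeros((3, 4))
observed_state = jnp.full((3,), 0.5)

# One jax.grad call yields gradients for ALL scene components at once;
# plain gradient descent then refines them jointly.
grad_fn = jax.jit(jax.grad(loss))
loss_before = loss(params, cam_pose, observed_img, observed_state)
for _ in range(50):
    grads = grad_fn(params, cam_pose, observed_img, observed_state)
    params = jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, params, grads)
loss_after = loss(params, cam_pose, observed_img, observed_state)
```

The point of the sketch is the structure, not the physics: because both terms are differentiable, a single optimizer can trade off photometric accuracy against physical consistency instead of fitting them in separate stages.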
We hope this work makes creating high-fidelity digital twins more practical and robust!