Reconstructing 4D Spatial Intelligence: A Survey
Abstract
A survey organizes methods for reconstructing 4D spatial intelligence from visual observations into five progressive levels, offering analysis and identifying future research directions.
Reconstructing 4D spatial intelligence from visual observations has long been a central yet challenging task in computer vision, with broad real-world applications. These range from entertainment domains like movies, where the focus is often on reconstructing fundamental visual elements, to embodied AI, which emphasizes interaction modeling and physical realism. Fueled by rapid advances in 3D representations and deep learning architectures, the field has evolved quickly, outpacing the scope of previous surveys. Additionally, existing surveys rarely offer a comprehensive analysis of the hierarchical structure of 4D scene reconstruction. To address this gap, we present a new perspective that organizes existing methods into five progressive levels of 4D spatial intelligence: (1) Level 1 -- reconstruction of low-level 3D attributes (e.g., depth, pose, and point maps); (2) Level 2 -- reconstruction of 3D scene components (e.g., objects, humans, structures); (3) Level 3 -- reconstruction of 4D dynamic scenes; (4) Level 4 -- modeling of interactions among scene components; and (5) Level 5 -- incorporation of physical laws and constraints. We conclude the survey by discussing the key challenges at each level and highlighting promising directions for advancing toward even richer levels of 4D spatial intelligence. To track ongoing developments, we maintain an up-to-date project page: https://github.com/yukangcao/Awesome-4D-Spatial-Intelligence.
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Advances in Feed-Forward 3D Reconstruction and View Synthesis: A Survey (2025)
- Sparse-View 3D Reconstruction: Recent Advances and Open Challenges (2025)
- Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation (2025)
- E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models (2025)
- From 2D to 3D Cognition: A Brief Survey of General World Models (2025)
- Review of Feed-forward 3D Reconstruction: From DUSt3R to VGGT (2025)
- DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper