GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors
Abstract
Despite remarkable advancements in video depth estimation, existing methods exhibit inherent limitations in achieving geometric fidelity with affine-invariant predictions, limiting their applicability in reconstruction and other metrically grounded downstream tasks. We propose GeometryCrafter, a novel framework that recovers high-fidelity, temporally coherent point map sequences from open-world videos, enabling accurate 3D/4D reconstruction, camera parameter estimation, and other depth-based applications. At the core of our approach lies a point map Variational Autoencoder (VAE) that learns a latent space agnostic to video latent distributions for effective point map encoding and decoding. Leveraging this VAE, we train a video diffusion model to capture the distribution of point map sequences conditioned on the input videos. Extensive evaluations on diverse datasets demonstrate that GeometryCrafter achieves state-of-the-art 3D accuracy, temporal consistency, and generalization capability.
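The abstract describes a two-stage pipeline: a point map VAE that compresses per-frame point maps into a latent space, and a video diffusion model that denoises point-map latents conditioned on the input video. The following toy sketch illustrates that control flow only; the linear encoder/decoder, the blending denoiser, and all dimensions are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions, not the paper's): T frames, H x W point maps
# with 3 channels (x, y, z), compressed to a LATENT-dim code per frame.
T, H, W, C = 4, 8, 8, 3
LATENT = 16

# Hypothetical stand-in for the point map VAE: a fixed linear encoder and
# its pseudo-inverse as decoder. The real VAE is a learned network.
W_enc = rng.normal(size=(H * W * C, LATENT)) / np.sqrt(H * W * C)
W_dec = np.linalg.pinv(W_enc)  # decoder approximately inverts the encoder

def encode_pointmaps(pointmaps):
    """Map per-frame point maps (T, H, W, 3) into compact latents (T, LATENT)."""
    return pointmaps.reshape(T, -1) @ W_enc

def decode_pointmaps(latents):
    """Recover per-frame point maps from latents."""
    return (latents @ W_dec).reshape(T, H, W, C)

def denoise_step(z_t, video_cond, t, total_steps):
    """One toy denoising step: blend the noisy latent toward a
    video-conditioned estimate. A real video diffusion model would apply
    a learned denoising network here instead of a linear blend."""
    alpha = (total_steps - t) / total_steps
    return alpha * video_cond + (1.0 - alpha) * z_t

# Inference: start from Gaussian noise and iteratively denoise the point-map
# latents, conditioned on (a stand-in for) the encoded input video.
video_latents = rng.normal(size=(T, LATENT))  # placeholder video conditioning
z = rng.normal(size=(T, LATENT))              # initial noise
steps = 10
for t in range(steps, 0, -1):
    z = denoise_step(z, video_latents, t, steps)

pointmaps = decode_pointmaps(z)
print(pointmaps.shape)  # (4, 8, 8, 3): one (x, y, z) point map per frame
```

The key design point the sketch mirrors is the separation of concerns: the VAE owns the mapping between point maps and latents, so the diffusion model only has to learn a distribution over latents conditioned on the video.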
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Can Video Diffusion Model Reconstruct 4D Geometry? (2025)
- TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models (2025)
- DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation (2025)
- Aether: Geometric-Aware Unified World Modeling (2025)
- Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception (2025)
- MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction (2025)
- Jasmine: Harnessing Diffusion Prior for Self-supervised Depth Estimation (2025)