Navigation World Models, CVPR 2025 (Oral)
Paper
This repo contains pretrained Navigation World Models and the training code for the Conditional Diffusion Transformer (CDiT) model. See the project page for additional results.
Navigation World Models
Amir Bar, Gaoyue "Kathy" Zhou, Danny Tran, Trevor Darrell, Yann LeCun
AI at Meta, UC Berkeley, New York University
Pretrained Models
Model type | # Parameters | Training Steps | Datasets | Link |
---|---|---|---|---|
CDiT/XL | 1B | 100k | RECON, SCAND, TartanDrive, HuRoN | Link |
CDiT/XL | 1B | 200k | RECON, SCAND, TartanDrive, HuRoN, +Ego4D | Link |
Note: All models were retrained after face blurring was applied to the training data, so results may differ from those reported in the main paper.
BibTeX
@article{bar2024navigation,
title={Navigation world models},
author={Bar, Amir and Zhou, Gaoyue and Tran, Danny and Darrell, Trevor and LeCun, Yann},
journal={arXiv preprint arXiv:2412.03572},
year={2024}
}
Acknowledgments
We thank Noriaki Hirose for his help with the HuRoN dataset and for sharing his insights, and Manan Tomar, David Fan, Sonia Joseph, Angjoo Kanazawa, Ethan Weber, Nicolas Ballas, and the anonymous reviewers for their helpful discussions and feedback.
License
The code and model weights are licensed under Creative Commons Attribution-NonCommercial 4.0 International. See LICENSE.txt for details.