facebook/nwm · Hugging Face

Navigation World Models, CVPR 2025 (Oral)

Paper

This repo contains pretrained Navigation World Models and the Conditional Diffusion Transformer (CDiT) model training code. See the project page for additional results.

Navigation World Models
Amir Bar, Gaoyue "Kathy" Zhou, Danny Tran, Trevor Darrell, Yann LeCun
AI at Meta, UC Berkeley, New York University

Pretrained Models

Model type	# Parameters	Training Steps	Datasets	Link
CDiT/XL	1B	100k	RECON, SCAND, TartanDrive, HuRoN	Link
CDiT/XL	1B	200k	RECON, SCAND, TartanDrive, HuRoN, +Ego4D	Link

Note: All models were retrained on face-blurred training data, so results might differ from those reported in the main paper.
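
As a rough illustration of how the table above might be used, the following Python sketch records the two listed checkpoints and picks one depending on whether Ego4D was part of the training data. The names `Checkpoint`, `CHECKPOINTS`, `select_checkpoint`, and `download_repo` are hypothetical helpers introduced only for this example and are not part of the released code. The download step assumes the `huggingface_hub` package is installed and that the files are fetched from the `facebook/nwm` repo with `snapshot_download`; file names inside the repo are not listed on this page, so the sketch downloads the whole repo rather than guessing them.

```python
from dataclasses import dataclass

# Repo id as shown for this model page.
REPO_ID = "facebook/nwm"


@dataclass(frozen=True)
class Checkpoint:
    """One row of the Pretrained Models table (hypothetical helper type)."""
    model_type: str
    parameters: str
    training_steps: int
    datasets: tuple


# Values copied from the table above.
CHECKPOINTS = (
    Checkpoint("CDiT/XL", "1B", 100_000,
               ("RECON", "SCAND", "TartanDrive", "HuRoN")),
    Checkpoint("CDiT/XL", "1B", 200_000,
               ("RECON", "SCAND", "TartanDrive", "HuRoN", "Ego4D")),
)


def select_checkpoint(with_ego4d: bool) -> Checkpoint:
    """Return the table row trained with (or without) Ego4D."""
    for ckpt in CHECKPOINTS:
        if ("Ego4D" in ckpt.datasets) == with_ego4d:
            return ckpt
    raise LookupError("no matching checkpoint in the table")


def download_repo(local_dir=None) -> str:
    """Download all files in the repo and return the local path.

    Assumption: requires `pip install huggingface_hub` and network access;
    authentication may be needed if access to the repo is restricted.
    """
    from huggingface_hub import snapshot_download  # imported lazily
    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `select_checkpoint(with_ego4d=True)` returns the 200k-step row, and `download_repo()` would then fetch the repo contents into the local Hugging Face cache. Use of the weights remains subject to the license stated below.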

BibTeX

@article{bar2024navigation,
  title={Navigation world models},
  author={Bar, Amir and Zhou, Gaoyue and Tran, Danny and Darrell, Trevor and LeCun, Yann},
  journal={arXiv preprint arXiv:2412.03572},
  year={2024}
}

Acknowledgments

We thank Noriaki Hirose for his help with the HuRoN dataset and for sharing his insights, and Manan Tomar, David Fan, Sonia Joseph, Angjoo Kanazawa, Ethan Weber, Nicolas Ballas, and the anonymous reviewers for their helpful discussions and feedback.

License

The code and model weights are licensed under Creative Commons Attribution-NonCommercial 4.0 International. See LICENSE.txt for details.