arxiv:2509.26645

TTT3R: 3D Reconstruction as Test-Time Training

Published on Sep 30

· Submitted by

Xingyu Chen on Oct 1

Upvote

Authors:

Xingyu Chen ,

Yue Chen ,

Yuliang Xiu ,

Abstract

TTT3R, a test-time training intervention, enhances length generalization in 3D reconstruction by dynamically adjusting memory updates based on alignment confidence, improving global pose estimation and processing efficiency.

AI-generated summary

Modern Recurrent Neural Networks have become a competitive architecture for 3D reconstruction due to their linear-time complexity. However, their performance degrades significantly when applied beyond the training context length, revealing limited length generalization. In this work, we revisit the 3D reconstruction foundation models from a Test-Time Training perspective, framing their designs as an online learning problem. Building on this perspective, we leverage the alignment confidence between the memory state and incoming observations to derive a closed-form learning rate for memory updates, to balance between retaining historical information and adapting to new observations. This training-free intervention, termed TTT3R, substantially improves length generalization, achieving a 2times improvement in global pose estimation over baselines, while operating at 20 FPS with just 6 GB of GPU memory to process thousands of images. Code available in https://rover-xingyu.github.io/TTT3R

View arXiv page View PDF Project page GitHub 402 Add to collection

Community

rover-xingyu

Paper author Paper submitter 12 days ago

Modern Recurrent Neural Networks have become a competitive architecture for 3D reconstruction due to their linear-time complexity. However, their performance degrades significantly when applied beyond the training context length, revealing limited length generalization. In this work, we revisit the 3D reconstruction foundation models from a Test-Time Training perspective, framing their designs as an online learning problem. Building on this perspective, we leverage the alignment confidence between the memory state and incoming observations to derive a closed-form learning rate for memory updates, to balance between retaining historical information and adapting to new observations. This training-free intervention, termed TTT3R, substantially improves length generalization, achieving a improvement in global pose estimation over baselines, while operating at 20 FPS with just 6 GB of GPU memory to process thousands of images.