---
license: mit
pipeline_tag: keypoint-detection
---
# Track-On2: Enhancing Online Point Tracking with Memory
📚 Paper - 🌐 Project Page - 💻 Code
## Overview
Track-On2 is an efficient online point tracking model that processes videos frame by frame with a compact transformer memory: no future frames, no sliding windows. It builds on the original Track-On with improved accuracy and efficiency.
## Pretrained models
We provide two pretrained Track-On2 checkpoints, each using a different backbone:
**Track-On2 with DINOv3** (Download here). This checkpoint uses the DINOv3 visual backbone.
- To use it, you must separately obtain the official pretrained DINOv3 weights for `dinov3-vits16plus` by requesting access through Hugging Face.
- Our released checkpoints do not include backbone weights, in order to comply with DINOv3's licensing and distribution policy.
**Track-On2 with DINOv2** (Download here). No additional permissions or downloads are needed.
- It offers performance comparable to, and sometimes stronger than, the DINOv3 variant.
- Recommended if you want a quick setup without external dependencies.
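Either checkpoint can also be fetched programmatically. A minimal sketch using `huggingface_hub`; note that the `repo_id` and `filename` below are hypothetical placeholders, so substitute the actual repository and checkpoint name from the download links above:

```python
from huggingface_hub import hf_hub_download

# NOTE: repo_id and filename are illustrative placeholders, not the real
# repository coordinates; take the actual values from the links above.
ckpt_path = hf_hub_download(repo_id="your-org/track-on2",
                            filename="trackon2_dinov2.pth")
print(ckpt_path)  # local path to the downloaded checkpoint
```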
## Usage
You can track points on a video using the `Predictor` class.
### Minimal example
```python
import torch

from model.trackon_predictor import Predictor

device = "cuda" if torch.cuda.is_available() else "cpu"

# Initialize (args holds the model configuration, e.g., parsed from ./config/test.yaml)
model = Predictor(args, checkpoint_path="path/to/checkpoint.pth").to(device).eval()

# Inputs
#   video:   (1, T, 3, H, W), values in the range 0-255
#   queries: (1, N, 3) with rows = (t, x, y) in pixel coordinates,
#            or None to enable the model's uniform grid querying
video = ...    # e.g., torchvision.io.read_video -> (T, H, W, 3) -> (T, 3, H, W) -> add batch dim
queries = ...  # e.g., torch.tensor([[0, 190, 190], [0, 200, 190], ...]).unsqueeze(0).to(device)

# Inference
traj, vis = model(video, queries)

# Outputs
#   traj: (1, T, N, 2) -> per-point (x, y) in pixels
#   vis:  (1, T, N)    -> per-point visibility in {0, 1}
```
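To make the placeholders concrete, here is one way to prepare the inputs and consume the outputs. This is a sketch, not part of the released API: the video path and query points are illustrative, and it assumes `torchvision` is installed.

```python
import torch
import torchvision

# Read a video and reshape to (1, T, 3, H, W), values in 0-255.
frames, _, _ = torchvision.io.read_video("media/sample.mp4", output_format="TCHW")
video = frames.float().unsqueeze(0).to(device)

# Two queries on frame 0, given as (t, x, y) in pixel coordinates.
queries = torch.tensor([[0.0, 190.0, 190.0],
                        [0.0, 200.0, 190.0]]).unsqueeze(0).to(device)

traj, vis = model(video, queries)

# Example: coordinates of the points predicted visible in the last frame.
visible = vis[0, -1].bool()   # (N,)
print(traj[0, -1][visible])   # (num_visible, 2)
```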
### Using demo.py

A ready-to-run script (`demo.py`) handles loading, preprocessing, inference, and visualization.
Given:
- `$video_path`: Path to the input video file (e.g., `.mp4`)
- `$config_path`: Model config file with a `.yaml` extension (default: `./config/test.yaml`)
- `$ckpt_path`: Path to the Track-On2 checkpoint (`.pth`)
- `$output_path`: Path to save the rendered tracking video (e.g., `demo_output.mp4`)
- `$use_grid`: Whether to use a uniform grid of queries (`true` or `false`)
you can run the demo with:

```bash
python demo.py \
    --video $video_path \
    --config $config_path \
    --ckpt $ckpt_path \
    --output $output_path \
    --use-grid $use_grid
```
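For grid queries in your own code, you can pass `queries=None` to the `Predictor` as noted above, or build an explicit grid yourself. A minimal sketch of the latter, where the 10×10 density and the frame size are illustrative assumptions rather than the model's built-in defaults (`model`, `video`, and `device` are as in the minimal example):

```python
import torch

def make_grid_queries(height, width, n=10, t=0):
    """Build an n x n grid of (t, x, y) queries on frame t, in pixel coordinates."""
    ys = torch.linspace(0, height - 1, n)
    xs = torch.linspace(0, width - 1, n)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    xy = torch.stack([grid_x.flatten(), grid_y.flatten()], dim=-1)  # (n*n, 2)
    t_col = torch.full((xy.shape[0], 1), float(t))
    return torch.cat([t_col, xy], dim=-1).unsqueeze(0)              # (1, n*n, 3)

queries = make_grid_queries(480, 854).to(device)  # example H, W; match your video
traj, vis = model(video, queries)
```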
Running the model with uniform grid queries on the video at `media/sample.mp4` produces the visualization shown below.
## Citation
If you find this work useful, please cite:
```bibtex
@article{Aydemir2025TrackOn2,
  title   = {{Track-On2}: Enhancing Online Point Tracking with Memory},
  author  = {Aydemir, G\"orkay and Xie, Weidi and G\"uney, Fatma},
  journal = {arXiv preprint arXiv:2509.19115},
  year    = {2025}
}

@InProceedings{Aydemir2025TrackOn,
  title     = {{Track-On}: Transformer-based Online Point Tracking with Memory},
  author    = {Aydemir, G\"orkay and Cai, Xiongyi and Xie, Weidi and G\"uney, Fatma},
  booktitle = {The Thirteenth International Conference on Learning Representations},
  year      = {2025}
}
```
## Acknowledgments
This repository incorporates code from public works including CoTracker, TAPNet, DINOv2, ViT-Adapter, and SPINO. We thank the authors for making their code available.