Improve model card: Add pipeline tag, paper, project page, code, usage, and more

This PR significantly improves the model card for Track-On2 by adding:

* **Metadata**:
  * `pipeline_tag: keypoint-detection` to ensure the model is discoverable under relevant tasks at https://huggingface.co/models?pipeline_tag=keypoint-detection.
* **Content**:
  * A descriptive overview of the model.
  * Links to the paper (https://huggingface.co/papers/2509.19115), the project page (https://kuis-ai.github.io/track_on2), and the official GitHub repository (https://github.com/gorkaydemir/track_on).
  * Details on the available pretrained models.
  * A "Usage" section with a Python code snippet and instructions from the original GitHub README to demonstrate how to run the model.
  * Citation information and acknowledgments.

These additions will make the model much more accessible and informative for the Hugging Face community.

Files changed (1)

README.md +108 -3

README.md CHANGED

@@ -1,3 +1,108 @@
----
-license: mit
----

+---
+license: mit
+pipeline_tag: keypoint-detection
+---
+# Track-On2: Enhancing Online Point Tracking with Memory
+[📚 Paper](https://huggingface.co/papers/2509.19115) - [🌐 Project Page](https://kuis-ai.github.io/track_on2) - [💻 Code](https://github.com/gorkaydemir/track_on)
+## Overview
+**Track-On2** is an efficient **online point tracking** model that processes videos **frame-by-frame** with a compact transformer memory: no future frames, no windows. It builds on the original Track-On with improved accuracy and efficiency.
+<p align="center">
+  <img src="https://github.com/gorkaydemir/track_on/raw/main/media/teaser.png" alt="Track-On Overview" width="800" />
+</p>
+## Pretrained models
+We provide two pretrained **Track-On2** checkpoints, each using a different backbone:
+- **Track-On2 with DINOv3**
+  [Download here](https://huggingface.co/gorkaydemir/track_on2/resolve/main/trackon2_dinov3_checkpoint.pt?download=true)
+  This checkpoint uses the **DINOv3** visual backbone.
+  - To use it, you must separately obtain the official pretrained DINOv3 weights of [dinov3-vits16plus](https://huggingface.co/facebook/dinov3-vits16plus-pretrain-lvd1689m) by requesting access through Hugging Face.
+  - Our released checkpoints **do not include** backbone weights in order to comply with DINOv3’s licensing and distribution policy.
+- **Track-On2 with DINOv2**
+  [Download here](https://huggingface.co/gorkaydemir/track_on2/resolve/main/trackon2_dinov2_checkpoint.pt?download=true)
+  No additional permissions or downloads are needed.
+  - Its performance is competitive with the DINOv3 variant and often comparable or even stronger.
+  - Recommended if you want a quick setup without external dependencies.
+## Usage
+You can track points on a video using the **`Predictor`** class.
+### Minimal example
+```python
+import torch
+from model.trackon_predictor import Predictor
+device = "cuda" if torch.cuda.is_available() else "cpu"
+# Initialize
+# `args`: model configuration built from the YAML config file (see how demo.py handles --config)
+model = Predictor(args, checkpoint_path="path/to/checkpoint.pt").to(device).eval()
+# Inputs
+# video:   (1, T, 3, H, W) in range 0-255, on the same device as the model
+# queries: (1, N, 3) with rows = (t, x, y) in pixel coordinates
+#          or use None to enable the model's uniform grid querying
+video = ...          # e.g., torchvision.io.read_video -> (T, H, W, 3) -> (T, 3, H, W) -> add batch dim
+queries = ...        # e.g., torch.tensor([[0, 190, 190], [0, 200, 190], ...]).unsqueeze(0).to(device)
+# Inference (no gradients are needed)
+with torch.no_grad():
+    traj, vis = model(video, queries)
+# Outputs
+# traj: (1, T, N, 2)  -> per-point (x, y) in pixels
+# vis:  (1, T, N)     -> per-point visibility in {0, 1}
+```
+### Using `demo.py`
+A ready-to-run script ([`demo.py`](https://github.com/gorkaydemir/track_on/blob/main/demo.py)) handles loading, preprocessing, inference, and visualization.
+Given:
+- `$video_path`: Path to the input video file (e.g., `.mp4`)
+- `$config_path`: Path to the model's YAML config file (default: `./config/test.yaml`)
+- `$ckpt_path`: Path to the Track-On2 checkpoint (e.g., `trackon2_dinov2_checkpoint.pt`)
+- `$output_path`: Path to save the rendered tracking video (e.g., `demo_output.mp4`)
+- `$use_grid`: Whether to use a uniform grid of queries (`true` or `false`)
+you can run the demo with:
+```bash
+python demo.py \
+--video $video_path \
+--config $config_path \
+--ckpt $ckpt_path \
+--output $output_path \
+--use-grid $use_grid
+```
+Running the model with uniform grid queries on the video at `media/sample.mp4` produces the visualization shown below.
+<p align="center">
+  <img src="https://github.com/gorkaydemir/track_on/raw/main/media/demo_output.gif" alt="Sample Tracking" width="300" />
+</p>
+## Citation
+If you find this work useful, please cite:
+```bibtex
+@article{Aydemir2025TrackOn2,
+  title={{Track-On2}: Enhancing Online Point Tracking with Memory},
+  author={Aydemir, G\"orkay and Xie, Weidi and G\"uney, Fatma},
+  journal={arXiv preprint arXiv:2509.19115},
+  year={2025}
+}
+```
+```bibtex
+@InProceedings{Aydemir2025TrackOn,
+  title     = {{Track-On}: Transformer-based Online Point Tracking with Memory},
+  author    = {Aydemir, G\"orkay and Cai, Xiongyi and Xie, Weidi and G\"uney, Fatma},
+  booktitle = {The Thirteenth International Conference on Learning Representations},
+  year      = {2025}
+}
+```
+## Acknowledgments
+This repository incorporates code from public works including [CoTracker](https://github.com/facebookresearch/co-tracker), [TAPNet](https://github.com/google-deepmind/tapnet), [DINOv2](https://github.com/facebookresearch/dinov2), [ViT-Adapter](https://github.com/czczup/ViT-Adapter), and [SPINO](https://github.com/robot-learning-freiburg/SPINO). We thank the authors for making their code available.
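The pretrained-models section of the proposed README links each checkpoint through its `resolve/main` URL. As a minimal sketch, assuming the `huggingface_hub` package is installed, the same files could be fetched (and cached) programmatically with `hf_hub_download`. The helper names `checkpoint_filename` and `download_checkpoint` are hypothetical and not part of the Track-On repository; the DINOv3 variant still needs the separately obtained backbone weights.

```python
# Sketch: fetch a Track-On2 checkpoint from the Hugging Face Hub.
# Assumes `huggingface_hub` is installed; helper names here are hypothetical.
REPO_ID = "gorkaydemir/track_on2"

# Checkpoint file names, taken from the download links in the model card.
CHECKPOINTS = {
    "dinov2": "trackon2_dinov2_checkpoint.pt",
    "dinov3": "trackon2_dinov3_checkpoint.pt",
}


def checkpoint_filename(backbone: str) -> str:
    """Map a backbone name ("dinov2" or "dinov3", case-insensitive) to its file."""
    key = backbone.lower()
    if key not in CHECKPOINTS:
        raise ValueError(
            f"unknown backbone {backbone!r}; expected one of {sorted(CHECKPOINTS)}"
        )
    return CHECKPOINTS[key]


def download_checkpoint(backbone: str = "dinov2") -> str:
    """Download a checkpoint (or reuse the local cache) and return its local path.

    The DINOv3 checkpoint excludes backbone weights; those must still be
    obtained separately, as the model card explains.
    """
    # Imported lazily so checkpoint_filename() works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=checkpoint_filename(backbone))
```

For example, `ckpt_path = download_checkpoint("dinov2")` returns a local path that can be passed as `--ckpt` to `demo.py` or as `checkpoint_path` to `Predictor`.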
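The minimal example in the Usage section leaves `video` and `queries` elided. The sketch below, assuming `torch` is installed and that frames arrive as a `(T, H, W, 3)` uint8 tensor (the layout `torchvision.io.read_video` returns by default), shows one way to produce the `(1, T, 3, H, W)` video and `(1, N, 3)` queries with rows `(t, x, y)`, and to read one point's visible positions from `traj` and `vis`. All helper names are hypothetical; converting to float is an assumption, and `make_grid_queries` only illustrates a uniform grid. It is not necessarily the grid the model builds when `queries` is `None`.

```python
import torch


def video_to_model_input(frames: torch.Tensor) -> torch.Tensor:
    """(T, H, W, 3) frames -> (1, T, 3, H, W) float tensor, values kept in 0-255.

    Assumption: float input in the 0-255 range is acceptable to the model.
    """
    if frames.ndim != 4 or frames.shape[-1] != 3:
        raise ValueError(f"expected (T, H, W, 3) frames, got {tuple(frames.shape)}")
    # Move channels before height/width, then add the batch dimension.
    return frames.permute(0, 3, 1, 2).float().unsqueeze(0)


def make_queries(points, t: int = 0) -> torch.Tensor:
    """Pixel (x, y) points queried at frame t -> (1, N, 3) tensor of rows (t, x, y)."""
    rows = [[float(t), float(x), float(y)] for x, y in points]
    # reshape keeps the (N, 3) layout even when no points are given.
    return torch.tensor(rows, dtype=torch.float32).reshape(-1, 3).unsqueeze(0)


def make_grid_queries(height: int, width: int, step: int, t: int = 0) -> torch.Tensor:
    """Hypothetical uniform grid of queries at frame t, spaced `step` pixels apart."""
    ys = torch.arange(step // 2, height, step, dtype=torch.float32)
    xs = torch.arange(step // 2, width, step, dtype=torch.float32)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    tt = torch.full_like(grid_x, float(t))
    # Row-major order: all x positions of the first grid row come first.
    return torch.stack([tt, grid_x, grid_y], dim=-1).reshape(1, -1, 3)


def visible_positions(traj: torch.Tensor, vis: torch.Tensor, point: int):
    """Return (frame indices, (K, 2) x/y positions) where `point` is visible.

    traj: (1, T, N, 2) and vis: (1, T, N), as described in the minimal example.
    """
    mask = vis[0, :, point].bool()
    frames = torch.nonzero(mask).flatten()
    return frames, traj[0, :, point][mask]
```

With these helpers, `video = video_to_model_input(frames).to(device)` and `queries = make_queries([(190, 190), (200, 190)]).to(device)` would slot into the minimal example in place of the `...` placeholders.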
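The `demo.py` instructions take one video per invocation. As a sketch for processing several videos, assuming it is run against a checkout of the track_on repository (so `demo.py` and the default `./config/test.yaml` resolve from the repository root), a small wrapper could assemble the documented flags and call the script once per file. `build_demo_command`, `run_demo_on_folder`, and the `_tracked.mp4` output naming are hypothetical, not part of the repository.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def build_demo_command(video, ckpt, output, config="./config/test.yaml",
                       use_grid=False, python=sys.executable):
    """Assemble the demo.py argument list using the flags listed in the model card."""
    return [
        python, "demo.py",
        "--video", str(video),
        "--config", str(config),
        "--ckpt", str(ckpt),
        "--output", str(output),
        # The model card documents `true` / `false` values for this flag.
        "--use-grid", "true" if use_grid else "false",
    ]


def run_demo_on_folder(video_dir, ckpt, out_dir, repo_dir=".", use_grid=True):
    """Run demo.py once per .mp4 file in video_dir (hypothetical batch helper).

    Assumes repo_dir is the root of the track_on repository, so that demo.py
    and the default ./config/test.yaml resolve relative to it.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for video in sorted(Path(video_dir).glob("*.mp4")):
        output = out_dir / f"{video.stem}_tracked.mp4"
        # Absolute paths, because the subprocess runs with cwd=repo_dir.
        cmd = build_demo_command(
            video.resolve(), Path(ckpt).resolve(), output.resolve(), use_grid=use_grid
        )
        print("running:", shlex.join(cmd))
        # check=True raises CalledProcessError if demo.py exits with an error.
        subprocess.run(cmd, check=True, cwd=repo_dir)
```

Passing a list to `subprocess.run` (rather than a shell string) avoids quoting problems with paths that contain spaces; `shlex.join` is used only to print a copy-pasteable command.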