StreamFormer/streamformer-timesformer

Video Classification


StreamFormer committed on Aug 10 · Commit 6972901 (verified) · Parent: 2188a2d

Update README.md

Files changed (1): README.md +49 -3

@@ -1,3 +1,49 @@
----
-license: cc-by-4.0
----

+---
+license: "cc-by-nc-4.0"
+tags:
+- vision
+- video-classification
+---
+# StreamFormer (base-sized model)
+StreamFormer backbone model pre-trained at three granularities: *Global*, *Temporal* and *Spatial*. It was introduced in the paper [Learning Streaming Video Representation via Multitask Training](https://arxiv.org/abs/2504.20041) and first released in [this repository](https://github.com/Go2Heart/StreamFormer).
+## Intended uses & limitations
+StreamFormer is a streaming video representation backbone that encodes streaming video input. It is designed for downstream applications such as Online Action Detection, Online Video Instance Segmentation and Video Question Answering.
+### How to use
+How to extract the multi-granularity features:
+```python
+import torch
+# `models` is not a PyPI package; this assumes the StreamFormer GitHub
+# repository is on the Python path (e.g. run from a checkout of it).
+from models import TimesformerMultiTaskingModelSigLIP
+
+model = TimesformerMultiTaskingModelSigLIP.from_pretrained("StreamFormer/streamformer-timesformer").eval()
+with torch.no_grad():
+    # dummy clip of shape [B, T, C, H, W]: 1 clip, 16 RGB frames, 224x224
+    fake_frames = torch.randn(1, 16, 3, 224, 224)
+    fake_frames = fake_frames.to(model.device)
+    output = model(fake_frames)
+    # global representation [B, D]: the temporal feature at the last frame
+    print(output.pooler_output[:, -1].shape, output.pooler_output[:, -1])
+    # temporal representation [B, T, D]
+    print(output.pooler_output.shape, output.pooler_output)
+    # spatial representation [B, T, HxW, D]
+    print(output.last_hidden_state.shape, output.last_hidden_state)
+```
+### BibTeX entry and citation info
+```bibtex
+@misc{yan2025learning,
+        title={Learning Streaming Video Representation via Multitask Training},
+        author={Yibin Yan and Jilan Xu and Shangzhe Di and Yikun Liu and Yudi Shi and Qirui Chen and Zeqian Li and Yifei Huang and Weidi Xie},
+        year={2025},
+        eprint={2504.20041},
+        archivePrefix={arXiv},
+        primaryClass={cs.CV}
+}
+```
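The README above describes the three granularities only through tensor shapes. As an illustration, here is a minimal sketch that splits the two outputs into global, temporal and spatial features and folds the spatial tokens back into a 2D grid for dense heads. It assumes the shapes stated in the README (`pooler_output` is [B, T, D], `last_hidden_state` is [B, T, HxW, D]), a 16x16 patch size (so HxW = 14x14 for 224x224 frames) and row-major patch order; none of these details beyond the shapes are confirmed by the README. `split_granularities` is a hypothetical helper, D=768 is a placeholder width, and random tensors stand in for real model outputs.

```python
import torch


def split_granularities(
    pooler_output: torch.Tensor,
    last_hidden_state: torch.Tensor,
    grid_size: int = 14,
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    """Hypothetical helper: split StreamFormer outputs by granularity.

    Assumes pooler_output is [B, T, D] and last_hidden_state is
    [B, T, H*W, D], as stated in the README. grid_size=14 assumes
    16x16 patches on 224x224 frames (an assumption, not documented).
    """
    # Global: the temporal feature at the last frame, [B, D]
    global_feat = pooler_output[:, -1]
    # Temporal: one feature per frame, [B, T, D]
    temporal_feat = pooler_output
    b, t, hw, d = last_hidden_state.shape
    if hw != grid_size * grid_size:
        raise ValueError(f"expected {grid_size * grid_size} spatial tokens, got {hw}")
    # Spatial: [B, T, H*W, D] -> [B, T, H, W, D], assuming row-major patch order
    spatial_map = last_hidden_state.reshape(b, t, grid_size, grid_size, d)
    # -> [B, T, D, H, W], i.e. channels-first per frame
    spatial_map = spatial_map.permute(0, 1, 4, 2, 3)
    return global_feat, temporal_feat, spatial_map


# Random stand-ins for model outputs; D=768 is a placeholder width
B, T, D = 1, 16, 768
pooler_output = torch.randn(B, T, D)
last_hidden_state = torch.randn(B, T, 14 * 14, D)

global_feat, temporal_feat, spatial_map = split_granularities(pooler_output, last_hidden_state)
print(global_feat.shape)    # torch.Size([1, 768])
print(temporal_feat.shape)  # torch.Size([1, 16, 768])
print(spatial_map.shape)    # torch.Size([1, 16, 768, 14, 14])
```

The helper returns views rather than copies, so the split costs no extra memory; call `.contiguous()` on `spatial_map` if a downstream operation requires contiguous storage.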