# VIDEO-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning
This is the official implementation of Video-RTS.

[Project Website](https://sites.google.com/cs.unc.edu/videorts2025/) [arXiv Paper](https://arxiv.org/abs/2507.06485) [Model Checkpoint](https://huggingface.co/Ted412/Video-RTS)

### Authors: [Ziyang Wang*](https://ziyangw2000.github.io/), [Jaehong Yoon*](https://jaehong31.github.io/), [Shoubin Yu](https://yui010206.github.io/), [Md Mohaiminul Islam](https://md-mohaiminul.github.io/), [Gedas Bertasius](https://www.gedasbertasius.com/), [Mohit Bansal](https://www.cs.unc.edu/~mbansal/)

### University of North Carolina at Chapel Hill

We introduce Video-RTS, a new approach that improves video reasoning capability with drastically better data efficiency by combining data-efficient reinforcement learning (RL) with a video-adaptive test-time scaling (TTS) strategy.
## **Installation**

```bash
git clone https://github.com/Ziyang412/Video-RTS.git
cd Video-RTS

# Build the environment
conda create -n video-rts python=3.11
conda activate video-rts
bash setup.sh

# Configure Qwen video extraction (e.g., max frames, resolution).
# Install with the [decord] extra for faster video decoding.
cd src/qwen-vl-utils
pip install -e .[decord]
cd ../..
```
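For reference, the max-frame and resolution settings are passed per video message and consumed by `qwen_vl_utils`. Below is a minimal sketch following the public Qwen2.5-VL usage examples; the video path, pixel budget, and fps value are placeholders, not settings prescribed by Video-RTS.

```python
# Minimal sketch of per-video extraction settings, following the public
# Qwen2.5-VL examples; the path and limits below are placeholders.
from qwen_vl_utils import process_vision_info

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video",
                "video": "file:///path/to/video.mp4",  # placeholder path
                "max_pixels": 360 * 420,  # cap on per-frame resolution
                "fps": 1.0,  # frame sampling rate ("nframes" caps total frames)
            },
            {"type": "text", "text": "Describe this video."},
        ],
    }
]

# With the [decord] extra installed above, decoding uses decord for speed.
image_inputs, video_inputs = process_vision_info(messages)
```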
Following Video-R1, please install the provided version of `transformers`:

```bash
unzip transformers-main.zip
cd ./transformers-main
pip install .
cd ..
```
## **Download Dataset**

Please refer to the official GitHub repository of each dataset to download the videos.

For evaluation, we provide the annotation files in `./src/r1-v/Evaluation`; please use `./src/r1-v/Evaluation/path_coversion.py` to update the video paths.
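For illustration, updating the video paths boils down to rewriting a path prefix in each annotation entry. The sketch below is hypothetical: the annotation file name, the `video` field, and both prefixes are assumptions, so treat the provided `path_coversion.py` as the source of truth.

```python
# Hypothetical sketch of the video-path update; the file name, the "video"
# field, and both prefixes are assumptions -- path_coversion.py is the
# source of truth.
import json

OLD_PREFIX = "/data/videos"      # assumed prefix used in the annotations
NEW_PREFIX = "/your/video/root"  # your local video directory

with open("annotations.json") as f:
    items = json.load(f)

for item in items:
    item["video"] = item["video"].replace(OLD_PREFIX, NEW_PREFIX, 1)

with open("annotations_local.json", "w") as f:
    json.dump(items, f, indent=2)
```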
For training, we provide the training data annotations in `./src/training_data`; please refer to the [CG-Bench](https://huggingface.co/datasets/CG-Bench/CG-Bench) repo for the video data.
## **Download Video-RTS model checkpoint**

We provide the model checkpoint on [Hugging Face](https://huggingface.co/Ted412/Video-RTS). Note that this model is trained on only about 2k samples, yet it yields performance similar to training on 6k samples.
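As a quick sanity check, the checkpoint should load with the standard Qwen2.5-VL classes from the pinned `transformers` install; this is a minimal sketch assuming the checkpoint keeps the Qwen2.5-VL format.

```python
# Minimal loading sketch, assuming the checkpoint keeps the standard
# Qwen2.5-VL format supported by the pinned transformers version.
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Ted412/Video-RTS",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Ted412/Video-RTS")
```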
## **Video-RTS Training**

We use [Open-R1-Video](https://github.com/Wang-Xiaodong1899/Open-R1-Video) as the training codebase. We provide our modified files in `./src/training_files`; please replace the corresponding files in the original repo with them. You can also use [Video-R1](https://github.com/tulerfeng/Video-R1/tree/main) as the training codebase; we find the results are similar.
## **Inference with S2D Video TTS**

Please update the input model, annotation file name, and output file in the provided bash script. After running the inference code, please update `json_path` in `cal_results_acc.py` to compute the final video reasoning accuracy.

```bash
bash src/video_rts_eval.sh
python src/cal_results_acc.py
```
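For intuition, sparse-to-dense (S2D) video TTS starts from sparsely sampled frames and only re-runs with denser sampling when the sampled reasoning traces fail to agree. The sketch below is a conceptual reading of that loop, not the logic of `video_rts_eval.sh`; `generate_answers` and the frame schedule are hypothetical placeholders.

```python
# Conceptual sketch of sparse-to-dense (S2D) test-time scaling; the
# generate_answers callable and the frame schedule are hypothetical,
# not the implementation behind video_rts_eval.sh.
from collections import Counter

def s2d_infer(video, question, generate_answers, frame_schedule=(16, 32, 64)):
    best = None
    for num_frames in frame_schedule:
        # Sample several reasoning traces at the current frame budget.
        answers = generate_answers(video, question, num_frames, num_samples=5)
        best, votes = Counter(answers).most_common(1)[0]
        # Stop once the traces reach a consensus; otherwise densify frames.
        if votes == len(answers):
            return best
    return best  # fall back to the majority vote at the densest budget
```

The appeal of this design is that easy questions terminate with few frames, so extra computation is spent only on videos where sparse sampling is insufficient.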
## Acknowledgments

We thank the developers of [Open-R1-Video](https://github.com/Wang-Xiaodong1899/Open-R1-Video), [Video-R1](https://github.com/tulerfeng/Video-R1/tree/main), [Qwen-2.5-VL](https://github.com/QwenLM/Qwen2.5-VL/tree/main), and [TRL](https://github.com/huggingface/trl) for their public code release.

# Reference

Please cite our paper if you use our models in your work:
```bibtex
@misc{wang2025videortsrethinkingreinforcementlearning,
      title={Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning},
      author={Ziyang Wang and Jaehong Yoon and Shoubin Yu and Md Mohaiminul Islam and Gedas Bertasius and Mohit Bansal},
      year={2025},
      eprint={2507.06485},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.06485},
}
```