VideoSSR: Video Self-Supervised Reinforcement Learning
Paper: arXiv:2511.06281
VideoSSR-8B is a multimodal large language model (MLLM) fine-tuned from Qwen-VL-8B-Instruct for enhanced video understanding. It is trained using a novel Video Self-Supervised Reinforcement Learning (VideoSSR) framework, which generates its own high-quality training data directly from videos, eliminating the need for manual annotation.
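The description above does not spell out how training data is derived from raw video. As a rough illustration only of how a self-supervised task can produce verifiable labels without manual annotation, the sketch below builds a temporal-ordering puzzle: frame indices are split into segments, the segments are shuffled, and the shuffle order becomes the ground-truth answer that a rule-based reward can check. The task choice and the names `make_jigsaw_sample` and `jigsaw_reward` are hypothetical assumptions made for this example, not the VideoSSR implementation.

```python
import random
from typing import List, Tuple


def make_jigsaw_sample(
    num_frames: int, num_segments: int, rng: random.Random
) -> Tuple[List[List[int]], List[int]]:
    """Build one hypothetical temporal-ordering sample from frame indices.

    Returns the shuffled segments (what the model would see) and the answer:
    for each shown position, the original index of the segment placed there.
    """
    if not 0 < num_segments <= num_frames:
        raise ValueError("need 0 < num_segments <= num_frames")

    # Split frame indices into contiguous, near-equal segments.
    base, extra = divmod(num_frames, num_segments)
    segments: List[List[int]] = []
    start = 0
    for i in range(num_segments):
        length = base + (1 if i < extra else 0)
        segments.append(list(range(start, start + length)))
        start += length

    # The label comes from the shuffle itself, so no human annotation is needed.
    order = list(range(num_segments))
    rng.shuffle(order)
    shuffled = [segments[i] for i in order]
    return shuffled, order


def jigsaw_reward(predicted: List[int], answer: List[int]) -> float:
    """Verifiable reward (assumed form): 1.0 for an exact match, else 0.0."""
    return 1.0 if predicted == answer else 0.0


if __name__ == "__main__":
    shuffled, answer = make_jigsaw_sample(16, 4, random.Random(0))
    print("shown segments:", shuffled)
    print("ground-truth order:", answer)
    print("reward for a correct prediction:", jigsaw_reward(answer, answer))
```

Because the answer is known by construction, a reward of this kind can be computed automatically during reinforcement learning; a real pipeline would operate on decoded video frames rather than bare indices.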