---
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
license: mit
pipeline_tag: video-text-to-text
library_name: transformers
---

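This repository contains the Spatial-MLLM model, built on [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) and described in [Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence](https://huggingface.co/papers/2505.23747).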

Project page: https://diankun-wu.github.io/Spatial-MLLM/

Code: https://github.com/diankun-wu/Spatial-MLLM