---
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
license: mit
pipeline_tag: video-text-to-text
library_name: transformers
---

This repository contains the model described in [Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence](https://huggingface.co/papers/2505.23747).

Project page: https://diankun-wu.github.io/Spatial-MLLM/

Code: https://github.com/diankun-wu/Spatial-MLLM