license: mit
tags:
- Multimodal Large Language Model (MLLM)
- Visual Grounding
- Reinforcement Fine-tuning
UniVG-R1 Model Card
Model details
We propose UniVG-R1, a reasoning-guided multimodal large language model (MLLM) for universal visual grounding. It uses reinforcement learning to improve reasoning in complex multi-image and multimodal scenarios.