arxiv:2504.18458

Fast-Slow Thinking for Large Vision-Language Model Reasoning

Published on Apr 25

Authors:

Abstract

Recent advances in large vision-language models (LVLMs) have revealed an overthinking phenomenon, where models generate verbose reasoning across all tasks regardless of the question. To address this issue, we present FAST, a novel Fast-Slow Thinking framework that dynamically adapts reasoning depth based on question characteristics. Through empirical analysis, we establish the feasibility of fast-slow thinking in LVLMs by investigating how response length and data distribution affect performance. We develop FAST-GRPO with three components: model-based metrics for question characterization, an adaptive thinking reward mechanism, and difficulty-aware KL regularization. Experiments across seven reasoning benchmarks demonstrate that FAST achieves state-of-the-art accuracy with over 10% relative improvement compared to the base model, while reducing token usage by 32.7% to 67.3% compared to previous slow-thinking approaches, effectively balancing reasoning length and accuracy.
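
The two headline figures in the abstract are ratios: a relative accuracy improvement over the base model and a token-usage reduction relative to slow-thinking approaches. The sketch below shows how such ratios are conventionally computed. The function names and all input numbers are hypothetical and chosen only for illustration; they are not results or code from the paper.

```python
def relative_improvement(base_acc: float, new_acc: float) -> float:
    """Relative accuracy gain of new_acc over base_acc, as a fraction."""
    if base_acc <= 0:
        raise ValueError("base_acc must be positive")
    return (new_acc - base_acc) / base_acc


def token_reduction(baseline_tokens: float, new_tokens: float) -> float:
    """Fraction of tokens saved relative to a baseline response length."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (baseline_tokens - new_tokens) / baseline_tokens


# Hypothetical values, NOT taken from the paper.
base_acc, new_acc = 50.0, 55.5            # accuracy in percentage points
slow_tokens, fast_tokens = 600.0, 300.0   # mean response length in tokens

print(f"relative improvement: {relative_improvement(base_acc, new_acc):.1%}")
print(f"token reduction: {token_reduction(slow_tokens, fast_tokens):.1%}")
```

Note that a relative improvement is measured against the base model's own accuracy, so it is larger than the corresponding absolute gain in percentage points whenever the base accuracy is below 100%.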
