Learning to Inference Adaptively for Multimodal Large Language Models
Abstract
Multimodal Large Language Models (MLLMs) have shown impressive capabilities in reasoning, yet come with substantial computational cost, limiting their deployment in resource-constrained settings. Despite recent efforts on improving the efficiency of MLLMs, prior solutions fall short in responding to varying runtime conditions, in particular changing resource availability (e.g., contention due to the execution of other programs on the device). To bridge this gap, we introduce AdaLLaVA, an adaptive inference framework that learns to dynamically reconfigure operations in an MLLM during inference, accounting for the input data and a latency budget. We conduct extensive experiments across benchmarks involving question-answering, reasoning, and hallucination. Our results show that AdaLLaVA effectively adheres to the input latency budget, achieving varying accuracy and latency tradeoffs at runtime. Further, we demonstrate that AdaLLaVA adapts to both the latency budget and the input content, can be integrated with token selection for enhanced efficiency, and generalizes across MLLMs. Our project webpage with code release is at https://zhuoyan-xu.github.io/ada-llava/.
Community
Project: https://zhuoyan-xu.github.io/ada-llava/
Paper: https://arxiv.org/abs/2503.10905
Code: https://github.com/zhuoyan-xu/AdaLLaVA
We present AdaLLaVA, a learning-based framework for adaptive inference in MLLMs. Given an input image, a text query, and a latency budget, AdaLLaVA enables an MLLM to answer the query about the image while adhering to the specified budget, a capability the base MLLM does not offer.
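To make the idea concrete, below is a minimal PyTorch sketch of a latency-conditioned scheduler in the spirit of what the abstract describes: a learned module that looks at the input and a latency budget, then decides which operations (here, transformer blocks) to execute. This is not the released implementation; all module names, shapes, and the simple thresholding gate are illustrative assumptions, and the actual code is in the repository linked above.

```python
# Illustrative sketch only -- not the AdaLLaVA implementation.
import torch
import torch.nn as nn


class LatencyScheduler(nn.Module):
    """Predicts per-block execution probabilities from pooled input features
    and a scalar latency budget (hypothetical design)."""

    def __init__(self, hidden_dim: int, num_blocks: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim + 1, hidden_dim),  # +1 for the latency budget
            nn.GELU(),
            nn.Linear(hidden_dim, num_blocks),
        )

    def forward(self, features: torch.Tensor, budget: torch.Tensor) -> torch.Tensor:
        # features: (batch, hidden_dim) pooled multimodal features
        # budget:   (batch, 1) latency budget, e.g. a fraction of full-model latency
        logits = self.mlp(torch.cat([features, budget], dim=-1))
        return torch.sigmoid(logits)  # per-block keep probabilities


def adaptive_forward(blocks, hidden, scheduler, budget, threshold=0.5):
    """Run only the blocks the scheduler selects for the given budget."""
    pooled = hidden.mean(dim=1)            # (batch, hidden_dim)
    keep_prob = scheduler(pooled, budget)  # (batch, num_blocks)
    for i, block in enumerate(blocks):
        if keep_prob[:, i].mean() > threshold:  # simple batch-level gate
            hidden = block(hidden)
        # skipped blocks contribute no compute, reducing latency
    return hidden


# Toy usage with made-up dimensions (illustrative only):
blocks = nn.ModuleList([nn.Linear(64, 64) for _ in range(4)])
scheduler = LatencyScheduler(hidden_dim=64, num_blocks=4)
hidden = torch.randn(2, 16, 64)      # (batch, tokens, hidden_dim)
budget = torch.full((2, 1), 0.5)     # spend roughly half of the full-model latency
out = adaptive_forward(blocks, hidden, scheduler, budget)
```

Conditioning the scheduler on both the pooled input features and the budget is what lets a single model trade accuracy for latency at runtime, matching the adaptive behavior described in the abstract.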
Similar papers recommended by the Semantic Scholar API:
- Balcony: A Lightweight Approach to Dynamic Inference of Generative Language Models (2025)
- LVPruning: An Effective yet Simple Language-Guided Vision Token Pruning Approach for Multi-modal Large Language Models (2025)
- OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models (2025)
- TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models (2025)
- Towards Fast, Memory-based and Data-Efficient Vision-Language Policy (2025)
- Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success (2025)
- Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference (2025)