ProactiveBench: A Comprehensive Benchmark Evaluating Proactive Interactions in Video Large Language Models
Abstract
ProactiveBench, a new benchmark, and PAUC, a new metric, are introduced to evaluate proactive interaction in multimodal dialogue systems, providing a more accurate assessment of user experience than traditional metrics.
With the growing research focus on multimodal dialogue systems, the capability for proactive interaction is gradually gaining recognition. As an alternative to conventional turn-by-turn dialogue, users increasingly expect multimodal systems to take more initiative, for example, by autonomously determining the timing of multi-turn responses in real time during video playback. To facilitate progress in this emerging area, we introduce ProactiveBench, the first comprehensive benchmark to evaluate a system's ability to engage in proactive interaction. Since model responses are generated at varying timestamps, we further propose PAUC, the first metric that accounts for the temporal dynamics of model responses. This enables a more accurate evaluation of systems operating in proactive settings. Through extensive benchmarking of various baseline systems on ProactiveBench and a user study of human preferences, we show that PAUC agrees with human preferences better than traditional evaluation metrics, which typically consider only the textual content of responses. These findings demonstrate that PAUC provides a more faithful assessment of user experience in proactive interaction scenarios. Project homepage: https://github.com/yellow-binary-tree/ProactiveBench
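The abstract does not spell out how PAUC is computed, but the core idea it describes, crediting responses by *when* they arrive rather than only by their textual content, can be illustrated with a minimal sketch. The `Response` type, the `temporal_auc` function, and the scoring rule below are all hypothetical assumptions for illustration, not the paper's actual PAUC definition:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Response:
    timestamp: float  # seconds into the video when the response is emitted
    correct: bool     # whether a judge deemed the response content correct


def temporal_auc(responses: List[Response], video_duration: float) -> float:
    """Illustrative timestamp-aware score (NOT the paper's PAUC): the
    fraction of video time during which the most recent model response
    was correct. A correct response emitted early thus earns more credit
    than the same response emitted late, capturing the intuition that
    proactive timing matters to user experience."""
    if not responses:
        return 0.0
    ordered = sorted(responses, key=lambda r: r.timestamp)
    covered = 0.0
    for i, resp in enumerate(ordered):
        # Each response "covers" the interval until the next response
        # (or the end of the video).
        end = ordered[i + 1].timestamp if i + 1 < len(ordered) else video_duration
        if resp.correct:
            covered += end - resp.timestamp
    return covered / video_duration


# Example: one correct response at t=10s in a 60s video covers
# the remaining 50s, scoring 50/60 ~= 0.833.
print(temporal_auc([Response(10.0, True)], 60.0))
```

A purely text-based metric would score the example response identically whether it arrived at 10s or 55s; a timestamp-aware score like the one sketched above distinguishes the two, which is the gap PAUC is designed to close.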