arxiv:2210.05423

Learning to Locate Visual Answer in Video Corpus Using Question

Published on Oct 11, 2022

Authors:

Abstract

A new task, VCVAL, is introduced for locating visual answers in instructional videos using natural language questions, with a proposed CCGS method that outperforms existing approaches.

AI-generated summary

We introduce a new task, named video corpus visual answer localization (VCVAL), which aims to locate the visual answer in a large collection of untrimmed instructional videos using a natural language question. This task requires a range of skills - the interaction between vision and language, video retrieval, passage comprehension, and visual answer localization. In this paper, we propose a cross-modal contrastive global-span (CCGS) method for the VCVAL, jointly training the video corpus retrieval and visual answer localization subtasks with the global-span matrix. We have reconstructed a dataset named MedVidCQA, on which the VCVAL task is benchmarked. Experimental results show that the proposed method outperforms other competitive methods both in the video corpus retrieval and visual answer localization subtasks. Most importantly, we perform detailed analyses on extensive experiments, paving a new path for understanding the instructional videos, which ushers in further research.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2210.05423 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2210.05423 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2210.05423 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.