
Unifying the Video and Question Attentions for Open-Ended Video Question Answering


Abstract:

Video question answering is an important task toward scene understanding and visual data retrieval. However, current visual question answering work focuses mainly on a single static image, which differs from the dynamic and sequential visual data of the real world, so those approaches cannot exploit the temporal information in videos. In this paper, we introduce the task of free-form open-ended video question answering. Open-ended answers enable wider applications than the common multiple-choice tasks in Visual-QA. We first build a data set for open-ended Video-QA using automatic question generation approaches. We then propose sequential video attention and temporal question attention models, which apply the attention mechanism to videos and questions while preserving the sequential and temporal structures of the attention guides. The two models are integrated into a unified attention model. After the video and the question are encoded, a decoder generates the answer word by word. Finally, we evaluate our models on the proposed data set, and the experimental results demonstrate the effectiveness of the proposed model.
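As a rough illustration of the guided-attention idea described above (one modality attended over under the guidance of the other), the following is a minimal sketch, not the authors' implementation; the module names, dimensions, and the additive scoring function are assumptions made for the example.

```python
# Illustrative sketch (not the paper's code): additive attention that pools a
# sequence of per-frame video features under the guidance of an encoded
# question vector. The same module could be reused in the other direction,
# attending over question words guided by a video encoding.
import torch
import torch.nn as nn


class GuidedAttention(nn.Module):
    """Attend over a feature sequence using a single guide vector (assumed design)."""

    def __init__(self, feat_dim: int, guide_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.proj_feat = nn.Linear(feat_dim, hidden_dim)
        self.proj_guide = nn.Linear(guide_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, feats: torch.Tensor, guide: torch.Tensor) -> torch.Tensor:
        # feats: (batch, seq_len, feat_dim), guide: (batch, guide_dim)
        energy = torch.tanh(self.proj_feat(feats) + self.proj_guide(guide).unsqueeze(1))
        weights = torch.softmax(self.score(energy).squeeze(-1), dim=1)  # (batch, seq_len)
        # Weighted sum over the sequence -> one attended vector per example.
        return torch.bmm(weights.unsqueeze(1), feats).squeeze(1)


if __name__ == "__main__":
    batch, n_frames = 2, 20
    frame_feats = torch.randn(batch, n_frames, 512)  # e.g., per-frame CNN features
    question_vec = torch.randn(batch, 300)           # e.g., encoded question
    video_att = GuidedAttention(feat_dim=512, guide_dim=300)
    attended_video = video_att(frame_feats, question_vec)
    print(attended_video.shape)  # torch.Size([2, 512])
```

In the paper, the video and question attentions are applied while preserving the sequential and temporal structure of the guides, and the attended representations are fused in the unified attention model before a decoder emits the answer word by word; the block above only illustrates the basic attention computation.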
Published in: IEEE Transactions on Image Processing (Volume: 26, Issue: 12, December 2017)
Page(s): 5656-5666
Date of Publication: 29 August 2017

PubMed ID: 28866494
