
Cross-Attentional Spatio-Temporal Semantic Graph Networks for Video Question Answering


Abstract:

Due to its rich spatio-temporal visual content and complex multimodal relations, Video Question Answering (VideoQA) is a challenging task that has attracted increasing attention. Current methods usually leverage visual attention, linguistic attention, or self-attention to uncover latent correlations between video content and question semantics. Although these methods exploit interactive information between different modalities to improve comprehension, they cannot effectively integrate inter- and intra-modality correlations in a unified model. To address this problem, we propose a novel VideoQA model called Cross-Attentional Spatio-Temporal Semantic Graph Networks (CASSG). Specifically, a multi-head multi-hop attention module with diversity and progressivity is first proposed to explore fine-grained interactions between different modalities in a crossing manner. Then, heterogeneous graphs are constructed from the cross-attended video frames, clips, and question words, in which multi-stream spatio-temporal semantic graphs are designed to synchronously reason over inter- and intra-modality correlations. Finally, a global and local information fusion method is proposed to coalesce the local reasoning vector learned from the multi-stream spatio-temporal semantic graphs with the global vector learned from another branch to infer the answer. Experimental results on three public VideoQA datasets confirm the effectiveness and superiority of our model compared with state-of-the-art methods.
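To make the first stage of the abstract more concrete, the sketch below illustrates one plausible reading of the "multi-head multi-hop attention in a crossing manner": video features repeatedly attend to question-word features over several hops, each hop refining the attended representation. This is only an illustrative approximation; the layer sizes, hop count, residual/normalization scheme, and class name are assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class CrossModalMultiHopAttention(nn.Module):
    """Hedged sketch of multi-head, multi-hop cross-attention between
    video features (frames/clips) and question-word features.
    Dimensions, hop count, and residual+norm scheme are illustrative
    assumptions, not the CASSG paper's exact design."""

    def __init__(self, dim=512, num_heads=8, num_hops=2):
        super().__init__()
        # one multi-head attention layer per hop, applied progressively
        self.hops = nn.ModuleList(
            [nn.MultiheadAttention(dim, num_heads, batch_first=True)
             for _ in range(num_hops)]
        )
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(num_hops)])

    def forward(self, video_feats, question_feats):
        # video_feats:    (batch, num_frames, dim)
        # question_feats: (batch, num_words,  dim)
        attended = video_feats
        for attn, norm in zip(self.hops, self.norms):
            # video features attend to question words ("crossing manner")
            out, _ = attn(query=attended, key=question_feats, value=question_feats)
            attended = norm(attended + out)  # residual connection + layer norm per hop
        return attended


# Toy usage with random features.
video = torch.randn(2, 16, 512)      # 2 videos, 16 frames each
question = torch.randn(2, 10, 512)   # 2 questions, 10 words each
cross_attended = CrossModalMultiHopAttention()(video, question)
print(cross_attended.shape)           # torch.Size([2, 16, 512])
```

In the full model as described, the cross-attended frame, clip, and word features would then serve as nodes of the heterogeneous spatio-temporal semantic graphs before the global-local fusion step.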
Published in: IEEE Transactions on Image Processing ( Volume: 31)
Page(s): 1684 - 1696
Date of Publication: 19 January 2022

PubMed ID: 35044914
