
Adversarial Learning With Multi-Modal Attention for Visual Question Answering


Abstract:


Visual question answering (VQA) is a challenging task that has attracted extensive research attention. It aims to learn a joint representation of the question–image pair for answer inference. Most existing methods focus on exploring the multi-modal correlation between the question and the image to learn the joint representation. However, these methods do not fully capture the answer-related information, so the learned representation does not effectively reflect the answer to the question. To tackle this problem, we propose a novel model for VQA, adversarial learning with multi-modal attention (ALMA). ALMA uses an adversarial learning-based framework to learn a joint representation that effectively reflects the answer-related information. Specifically, multi-modal attention with the Siamese similarity learning method is designed to build two embedding generators, one for question–image embedding and one for question–answer embedding. Adversarial learning is then conducted as an interplay between the two embedding generators and an embedding discriminator. The generators aim to produce two modality-invariant representations for the question–image and question–answer pairs, whereas the embedding discriminator aims to tell the two representations apart. Both the multi-modal attention module and the adversarial networks are integrated into an end-to-end unified framework to infer the answer. Experiments performed on three benchmark data sets confirm the favorable performance of ALMA compared with state-of-the-art approaches.
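The interplay between the two embedding generators and the embedding discriminator can be illustrated with a generic adversarial objective. The sketch below is not taken from the paper: the linear discriminator, the binary cross-entropy loss, and every function and parameter name are assumptions chosen for illustration, and the attention and Siamese similarity components are omitted. It only shows the opposing goals: the discriminator tries to tell a question–image embedding from a question–answer embedding, while the generators are rewarded when it cannot.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def discriminator_prob(weights, bias, embedding):
    """Probability (under a hypothetical linear discriminator) that
    `embedding` came from the question-image generator."""
    score = sum(w * e for w, e in zip(weights, embedding)) + bias
    return sigmoid(score)


def discriminator_loss(weights, bias, qi_embedding, qa_embedding):
    """Binary cross-entropy with label 1 for the question-image embedding
    and label 0 for the question-answer embedding (an assumed labeling)."""
    eps = 1e-12  # guards against log(0)
    p_qi = discriminator_prob(weights, bias, qi_embedding)
    p_qa = discriminator_prob(weights, bias, qa_embedding)
    return -(math.log(p_qi + eps) + math.log(1.0 - p_qa + eps))


def generator_loss(weights, bias, qi_embedding, qa_embedding):
    """The generators play the opposite game: they lower their loss by
    raising the discriminator's loss, i.e. by making the two embeddings
    hard to tell apart."""
    return -discriminator_loss(weights, bias, qi_embedding, qa_embedding)


if __name__ == "__main__":
    w, b = [5.0, 0.0], 0.0
    # Easily separable embeddings: the discriminator loss is small.
    print(discriminator_loss(w, b, [1.0, 0.0], [-1.0, 0.0]))
    # Identical embeddings: the discriminator cannot do better than chance.
    print(discriminator_loss(w, b, [1.0, 0.0], [1.0, 0.0]))
```

Under these assumptions, when the two embeddings are identical the discriminator loss can never fall below 2 ln 2, the value obtained by guessing 0.5 for both inputs. That lower bound is one way to read "modality-invariant" in this toy setting.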

Published in: IEEE Transactions on Neural Networks and Learning Systems (Volume: 32, Issue: 9, September 2021)

Page(s): 3894 - 3908

Date of Publication: 24 August 2020


PubMed ID: 32833656

DOI: 10.1109/TNNLS.2020.3016083
