Fusing attention with visual question answering

Abstract:

Visual Question Answering is a complex problem that fuses natural language processing and image processing to answer a question based on information from the image. The basic architecture uses a CNN to extract features from the image and an RNN to process the question, then combines the two in an MLP to produce an answer. These architectures perform well at identifying content but fail at higher-level reasoning such as spatial awareness and combining objects. To help remedy this, we propose using attention to divide the image into separate objects, then using the extracted features along with each object's location and size information to train the MLP.
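
As a rough illustration of the fusion step described in the abstract, the following sketch shows one way per-object features, location, and size could be concatenated with a question encoding before being passed to an MLP. The abstract does not give the actual feature layout, so everything here is an assumption made for illustration: the function names (box_geometry, fuse_inputs, mlp_forward), the normalized center/width/height encoding, and the one-hidden-layer ReLU network are hypothetical. The CNN, RNN, and attention stages are taken as given and represented only by their outputs.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def box_geometry(box: Box, image_w: float, image_h: float) -> List[float]:
    """Return normalized [center_x, center_y, width, height] for one attended object."""
    x_min, y_min, x_max, y_max = box
    center_x = (x_min + x_max) / (2.0 * image_w)
    center_y = (y_min + y_max) / (2.0 * image_h)
    width = (x_max - x_min) / image_w
    height = (y_max - y_min) / image_h
    return [center_x, center_y, width, height]


def fuse_inputs(
    question_vec: Sequence[float],
    object_feats: Sequence[Sequence[float]],
    boxes: Sequence[Box],
    image_w: float,
    image_h: float,
) -> List[float]:
    """Concatenate the question encoding with each object's features and geometry.

    Assumed layout: [question | obj_1 feats | obj_1 geometry | obj_2 feats | ...].
    """
    fused = list(question_vec)
    for feats, box in zip(object_feats, boxes):
        fused.extend(feats)
        fused.extend(box_geometry(box, image_w, image_h))
    return fused


def mlp_forward(
    x: Sequence[float],
    w1: Sequence[Sequence[float]],
    b1: Sequence[float],
    w2: Sequence[Sequence[float]],
    b2: Sequence[float],
) -> List[float]:
    """One hidden ReLU layer followed by a linear output layer (answer scores)."""
    hidden = [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]
```

In this layout the geometry entries give the MLP direct access to where each object sits and how large it is, which is the kind of information the abstract says plain CNN features lack for spatial questions. A real system would substitute learned CNN and RNN outputs for the placeholder vectors and would train the MLP weights rather than supply them by hand.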

Published in: 2017 International Joint Conference on Neural Networks (IJCNN)

Date of Conference: 14-19 May 2017

Date Added to IEEE Xplore: 03 July 2017

ISBN Information:

Electronic ISSN: 2161-4407

DOI: 10.1109/IJCNN.2017.7965954

Conference Location: Anchorage, AK, USA
