Vision And Text Transformer For Predicting Answerability On Visual Question Answering | IEEE Conference Publication | IEEE Xplore

Abstract:

Answerability on Visual Question Answering is a novel and attractive task that predicts answerability scores for image-question pairs in multi-modal data. Existing works often derive Answerability through a binary mapping from visual question answering systems, which does not reflect the essence of the problem. Treating Answerability as a regression task, we propose VT-Transformer, which exploits visual and textual features through a Transformer architecture. Experimental results on the VizWiz 2020 dataset show the effectiveness and robustness of VT-Transformer for Answerability on Visual Question Answering compared with competitive baselines.
Date of Conference: 19-22 September 2021
Date Added to IEEE Xplore: 23 August 2021
Conference Location: Anchorage, AK, USA
