Attend, Correct And Focus: A Bidirectional Correct Attention Network For Image-Text Matching | IEEE Conference Publication | IEEE Xplore

Attend, Correct And Focus: A Bidirectional Correct Attention Network For Image-Text Matching


Abstract:

Image-text matching task aims to learn the fine-grained correspondences between images and sentences. Existing methods use attention mechanism to learn the correspondence...Show More

Abstract:

Image-text matching task aims to learn the fine-grained correspondences between images and sentences. Existing methods use attention mechanism to learn the correspondences by attending to all fragments without considering the relationship between fragments and global semantics, which inevitably lead to semantic misalignment among irrelevant fragments. To this end, we propose a Bidirectional Correct Attention Network (BCAN), which leverages global similarities and local similarities to reassign the attention weight, to avoid such semantic misalignment. Specifically, we introduce a global correct unit to correct the attention focused on relevant fragments in irrelevant semantics. A local correct unit is used to correct the attention focused on irrelevant fragments in relevant semantics. Experiments on Flickr30K and MSCOCO datasets verify the effectiveness of our proposed BCAN by outperforming both previous attention-based methods and state-of-the-art methods. Code can be found at: https://github.com/liuyyy111/BCAN.
Date of Conference: 19-22 September 2021
Date Added to IEEE Xplore: 23 August 2021
ISBN Information:

ISSN Information:

Conference Location: Anchorage, AK, USA

Contact IEEE to Subscribe

References

References is not available for this document.