
Multi-Branch Distance-Sensitive Self-Attention Network for Image Captioning


Abstract:

Self-attention (SA) based networks have achieved great success in image captioning, constantly dominating the leaderboards of online benchmarks. However, existing SA networks still suffer from distance insensitivity and a low-rank bottleneck. In this paper, we optimize SA in two respects to address these issues. First, we introduce a Distance-sensitive Self-Attention (DSA), which incorporates the raw geometric distances between query-key pairs in the 2D image into SA modeling. Second, we present a simple yet effective approach, named Multi-branch Self-Attention (MSA), to mitigate the low-rank bottleneck. MSA treats a multi-head self-attention layer as a branch and duplicates it multiple times to increase the expressive power of SA. To validate the effectiveness of the two designs, we apply them to the standard self-attention network and conduct extensive experiments on the highly competitive MS-COCO dataset. We achieve new state-of-the-art performance on both the local and online test sets, i.e., 135.1% CIDEr on the Karpathy split and 135.4% CIDEr on the official online split.
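The two ideas in the abstract can be illustrated with a toy NumPy sketch. This is not the paper's exact formulation: the additive bias `-lam * dist`, the per-branch projection matrices, and the averaging of branch outputs are illustrative assumptions standing in for the learned components described in the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def distance_sensitive_attention(Q, K, V, coords, lam=0.1):
    """Scaled dot-product attention with a geometric-distance bias.

    coords: (n, 2) array of 2D grid positions for the n visual tokens.
    lam: strength of the distance penalty (illustrative scalar; the
         paper learns how distance modulates attention).
    """
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)
    # Pairwise Euclidean distances between token positions in the 2D image.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Bias attention toward spatially close query-key pairs.
    logits = logits - lam * dist
    return softmax(logits) @ V

def multi_branch_attention(Q, K, V, coords, n_branches=3, lam=0.1, seed=0):
    """Duplicate the attention layer into several branches and average.

    Each branch gets its own random query/key projection here, mimicking
    (very loosely) independently parameterized branches that raise the
    effective rank of the attention map.
    """
    rng = np.random.default_rng(seed)
    d = Q.shape[-1]
    outs = []
    for _ in range(n_branches):
        W = rng.normal(scale=1.0 / np.sqrt(d), size=(d, d))
        outs.append(distance_sensitive_attention(Q @ W, K @ W, V, coords, lam))
    return np.mean(outs, axis=0)
```

In a real captioning model the projections are learned and the branch outputs are combined inside each Transformer layer; this sketch only shows where the distance bias enters the logits and how branch duplication composes.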
Published in: IEEE Transactions on Multimedia ( Volume: 25)
Page(s): 3962 - 3974
Date of Publication: 22 April 2022

