
Selective Depth Attention Networks for Adaptive Multiscale Feature Representation


Impact Statement:
Multiscale technology has been receiving much attention in computer vision, yet it remains challenging to construct adaptive neural networks that recognize objects at various scales. A new depthwise attention network is proposed to dynamically capture multiscale features. Our depth-based attention method combines with multiscale networks and attention networks as a lightweight and efficient plug-in to achieve better performance. We purposely choose SENet, the convolutional block attention module (CBAM), EPSANet, PVTv2, and Res2Net as typical channel-attention, spatial-attention, branch-attention, self-attention, and multiscale xNets, respectively, and significantly improve their recognition performance. Specifically, our SDA-xNets achieve 0.66%, 0.69%, 1.09%, 0.2%, and 0.73% higher top-1 accuracy than the original SENet, CBAM, EPSANet, PVTv2, and Res2Net, respectively. On downstream tasks, our SDA-xNets also outperform the original xNets.

Abstract:

Existing multiscale methods risk merely enlarging receptive field sizes while neglecting small receptive fields. It is therefore challenging to construct adaptive neural networks that effectively recognize objects at various spatial scales. To tackle this issue, we first introduce a new attention dimension, i.e., depth, in addition to existing attention mechanisms such as channel-attention, spatial-attention, branch-attention, and self-attention. We present a novel selective depth attention network to treat multiscale objects symmetrically in various vision tasks. Specifically, the blocks within each stage of neural networks, including convolutional neural networks (CNNs), e.g., ResNet, SENet, and Res2Net, and vision transformers (ViTs), e.g., PVTv2, output hierarchical feature maps with the same resolution but different receptive field sizes. Based on this structural property, we design a depthwise building module, namely a selective depth attention (SDA) module, includi...
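
The abstract describes the core idea only at a high level: the blocks within one stage emit feature maps of equal resolution but different receptive field sizes, and an attention mechanism over the depth dimension selects among them. The following is a minimal sketch of that idea, not the authors' actual SDA module; the class name `DepthAttentionFusion`, the SE-style squeeze, the `reduction` parameter, and the softmax over block depth are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class DepthAttentionFusion(nn.Module):
    """Illustrative depth-wise attention over the outputs of the blocks in
    one stage. The block outputs share the same spatial resolution but carry
    different receptive field sizes; attention weights over the depth axis
    decide which of them to emphasize. This is a reading of the abstract,
    not the paper's exact SDA design."""

    def __init__(self, channels: int, num_blocks: int, reduction: int = 16):
        super().__init__()
        self.num_blocks = num_blocks
        hidden = max(channels // reduction, 8)
        # Squeeze the fused features, then predict one weight per block depth.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, num_blocks),
        )

    def forward(self, block_outputs):
        # block_outputs: list of num_blocks tensors, each of shape (B, C, H, W)
        stacked = torch.stack(block_outputs, dim=1)         # (B, D, C, H, W)
        summed = stacked.sum(dim=1)                         # (B, C, H, W)
        squeezed = self.pool(summed).flatten(1)             # (B, C)
        weights = torch.softmax(self.fc(squeezed), dim=1)   # (B, D)
        weights = weights.view(-1, self.num_blocks, 1, 1, 1)
        return (stacked * weights).sum(dim=1)               # (B, C, H, W)


if __name__ == "__main__":
    # Toy usage: fuse the outputs of a hypothetical 3-block, 64-channel stage.
    fuse = DepthAttentionFusion(channels=64, num_blocks=3)
    feats = [torch.randn(2, 64, 56, 56) for _ in range(3)]
    out = fuse(feats)
    print(out.shape)  # torch.Size([2, 64, 56, 56])
```

Because the fused output keeps the stage's original shape, a module of this kind can in principle be dropped into ResNet-, SENet-, Res2Net-, or PVTv2-style stages as a lightweight plug-in, which matches how the impact statement describes SDA being combined with existing xNets.
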
Published in: IEEE Transactions on Artificial Intelligence (Volume: 5, Issue: 10, October 2024)
Page(s): 5064 - 5074
Date of Publication: 15 May 2024
Electronic ISSN: 2691-4581

