Abstract
Attention mechanisms, which model feature inter-dependencies among channels or spatial locations, have demonstrated great potential for improving the performance of deep convolutional neural networks. However, most existing methods develop increasingly intricate channel attention or spatial attention modules in isolation to achieve good performance, which inevitably loses important cross-domain information and increases model overhead. To alleviate this dilemma, in this paper we propose a novel architectural unit called the lightweight mixed-domain attention (LMA) module. First, LMA aggregates spatial features using two direction-aware 1D average pooling operations, which not only capture contextual long-range dependencies but also retain accurate positional information. Subsequently, it adaptively models inter-channel relationships with our proposed nonlinear local cross-channel interaction strategy, substantially decreasing model overhead while maintaining competitive performance. LMA is lightweight yet effective and can be flexibly plugged into various classic backbones, including the lightweight MobileNetV2 and heavyweight ResNets, as a plug-and-play module. Extensive experiments on ImageNet-1K image classification and on MS COCO object detection and instance segmentation demonstrate the superiority of our method over state-of-the-art (SOTA) counterparts. Furthermore, we validate the design philosophy of LMA through Grad-CAM++ visualization results.
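The two-stage idea described above can be illustrated with a minimal NumPy sketch. Note that this is an illustrative approximation, not the authors' implementation: the function name `lma_sketch`, the uniform neighbour-averaging kernel (standing in for the learned local cross-channel interaction weights), and the plain sigmoid gating are all assumptions made for clarity; the paper's nonlinear interaction strategy and learned parameters may differ.

```python
import numpy as np

def lma_sketch(x, k=3):
    """Illustrative sketch of a mixed-domain attention block.

    x : feature map of shape (C, H, W).
    k : neighbourhood size for the local cross-channel interaction.
    """
    C, H, W = x.shape
    # Step 1: direction-aware 1D average pooling. Pooling along one
    # spatial axis at a time keeps positional information in the other,
    # while still summarizing long-range context along the pooled axis.
    pool_h = x.mean(axis=2)   # (C, H): averaged over width
    pool_w = x.mean(axis=1)   # (C, W): averaged over height

    def cross_channel(desc):
        # Step 2: local cross-channel interaction. Each channel attends
        # only to its k nearest channel neighbours via a shared kernel
        # (uniform placeholder weights here; learned in the real module).
        pad = k // 2
        padded = np.pad(desc, ((pad, pad), (0, 0)), mode="edge")
        out = sum(padded[i:i + C] for i in range(k)) / k
        return 1.0 / (1.0 + np.exp(-out))   # sigmoid gate in (0, 1)

    a_h = cross_channel(pool_h)   # (C, H) attention along height
    a_w = cross_channel(pool_w)   # (C, W) attention along width
    # Recalibrate the input with both directional attention maps.
    return x * a_h[:, :, None] * a_w[:, None, :]
```

Because only 1D pooling and a k-tap channel-local filter are involved, the added cost is negligible compared with a full spatial self-attention map, which is the property that makes such a module attractive for lightweight backbones.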








Acknowledgements
The authors would like to thank the editors for their rigorous work and the anonymous reviewers for their comments and suggestions. This work was supported in part by the National Natural Science Foundation of China under Grants 61801394, 61803310, 62171735, 62173276, and 62101458; in part by the Fundamental Research Funds for the Central Universities under Grants 3102019HHZY030013 and G2019KY05206; in part by the Natural Science Basic Research Plan in Shaanxi Province of China under Grants 2020JQ-202 and 2021JQ-122; and in part by the China Postdoctoral Science Foundation under Grants 2020M673482 and 2020M673485.
Ethics declarations
Competing interests
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yu, Y., Zhang, Y., Song, Z. et al. LMA: lightweight mixed-domain attention for efficient network design. Appl Intell 53, 13432–13451 (2023). https://doi.org/10.1007/s10489-022-04170-3