Abstract
Attention mechanisms, which model feature inter-dependencies among channels or spatial locations, have demonstrated great potential for improving the performance of deep convolutional neural networks. However, most existing methods develop increasingly intricate channel attention or spatial attention modules in isolation to achieve good performance, which inevitably loses important cross-domain information and increases model overhead. To alleviate this dilemma, in this paper we propose a novel architectural unit called the lightweight mixed-domain attention (LMA) module. First, LMA aggregates spatial features using two direction-aware 1D average pooling operations, which not only capture contextual long-range dependencies but also retain accurate positional information. Subsequently, it adaptively models inter-channel relationships with our proposed nonlinear local cross-channel interaction strategy, substantially decreasing model overhead while maintaining competitive performance. LMA is lightweight yet effective and can be flexibly plugged into various classic backbones, including the lightweight MobileNetV2 and heavyweight ResNets, as a plug-and-play module. Extensive experiments on ImageNet-1K image classification and on MS COCO object detection and instance segmentation demonstrate the superiority of our method over state-of-the-art (SOTA) counterparts. Furthermore, we validate the design philosophy of LMA through Grad-CAM++ visualization results.
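The two-stage idea described above can be illustrated with a minimal NumPy sketch. Note that this is an illustrative approximation, not the authors' implementation: the function name `lma_sketch`, the uniform neighbour-averaging kernel (standing in for the learned local cross-channel interaction weights), and the plain sigmoid gating are all assumptions made for clarity; the paper's nonlinear interaction strategy and learned parameters may differ.

```python
import numpy as np

def lma_sketch(x, k=3):
    """Illustrative sketch of a mixed-domain attention block.

    x : feature map of shape (C, H, W).
    k : neighbourhood size for the local cross-channel interaction.
    """
    C, H, W = x.shape
    # Step 1: direction-aware 1D average pooling. Pooling along one
    # spatial axis at a time keeps positional information in the other,
    # while still summarizing long-range context along the pooled axis.
    pool_h = x.mean(axis=2)   # (C, H): averaged over width
    pool_w = x.mean(axis=1)   # (C, W): averaged over height

    def cross_channel(desc):
        # Step 2: local cross-channel interaction. Each channel attends
        # only to its k nearest channel neighbours via a shared kernel
        # (uniform placeholder weights here; learned in the real module).
        pad = k // 2
        padded = np.pad(desc, ((pad, pad), (0, 0)), mode="edge")
        out = sum(padded[i:i + C] for i in range(k)) / k
        return 1.0 / (1.0 + np.exp(-out))   # sigmoid gate in (0, 1)

    a_h = cross_channel(pool_h)   # (C, H) attention along height
    a_w = cross_channel(pool_w)   # (C, W) attention along width
    # Recalibrate the input with both directional attention maps.
    return x * a_h[:, :, None] * a_w[:, None, :]
```

Because only 1D pooling and a k-tap channel-local filter are involved, the added cost is negligible compared with a full spatial self-attention map, which is the property that makes such a module attractive for lightweight backbones.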








Acknowledgements
The authors would like to thank the editors for their rigorous work and the anonymous reviewers for their comments and suggestions. This work was supported in part by the National Natural Science Foundation of China under Grants 61801394, 61803310, 62171735, 62173276, and 62101458; in part by the Fundamental Research Funds for the Central Universities under Grants 3102019HHZY030013 and G2019KY05206; in part by the Natural Science Basic Research Plan in Shaanxi Province of China under Grants 2020JQ-202 and 2021JQ-122; and in part by the China Postdoctoral Science Foundation under Grants 2020M673482 and 2020M673485.
Ethics declarations
Competing interests
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yu, Y., Zhang, Y., Song, Z. et al. LMA: lightweight mixed-domain attention for efficient network design. Appl Intell 53, 13432–13451 (2023). https://doi.org/10.1007/s10489-022-04170-3