
LMA: lightweight mixed-domain attention for efficient network design

Published in: Applied Intelligence

Abstract

Attention mechanisms, which model feature interdependencies among channels or spatial locations, have demonstrated great potential for improving the performance of deep convolutional neural networks. However, most existing methods separately develop ever more intricate channel attention or spatial attention modules, which inevitably loses important information and increases model overhead. To alleviate this dilemma, in this paper we propose a novel architectural unit called the lightweight mixed-domain attention (LMA) module. First, LMA aggregates spatial features using two direction-aware 1D average pooling operations, which capture long-range contextual dependencies while retaining accurate positional information. It then adaptively models inter-channel relationships with our proposed nonlinear local cross-channel interaction strategy, substantially decreasing model overhead while maintaining competitive performance. LMA is lightweight yet effective and can be flexibly plugged into classic backbones, including the lightweight MobileNetV2 and the heavyweight ResNets, as a plug-and-play module. Extensive experiments on ImageNet-1K image classification and on MS COCO object detection and instance segmentation demonstrate the superiority of our method over state-of-the-art (SOTA) counterparts. Furthermore, Grad-CAM++ visualization results corroborate our design philosophy.
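The two-stage mechanism the abstract describes (direction-aware 1D average pooling followed by local cross-channel interaction and sigmoid gating) can be sketched as follows. This is an illustrative approximation, not the authors' implementation: the learnable transforms of the actual module are replaced here by a fixed k-tap cross-channel moving average, and the function name `lma_attention` is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lma_attention(x, k=3):
    """Sketch of a mixed-domain attention forward pass on a (C, H, W) map.

    Assumption: the trainable cross-channel transform is stood in for by a
    fixed k-tap moving average across channels.
    """
    C, H, W = x.shape
    # 1) Direction-aware 1D average pooling: pool along W and along H, so
    #    each descriptor keeps positional information in the other axis.
    pool_h = x.mean(axis=2)                          # (C, H)
    pool_w = x.mean(axis=1)                          # (C, W)
    desc = np.concatenate([pool_h, pool_w], axis=1)  # (C, H + W)
    # 2) Local cross-channel interaction: at each position, every channel
    #    descriptor mixes with its k channel neighbors (a learnable 1D
    #    convolution in a trained module; a moving average here).
    kernel = np.ones(k) / k
    mixed = np.stack(
        [np.convolve(desc[:, j], kernel, mode="same")
         for j in range(desc.shape[1])],
        axis=1,
    )                                                # (C, H + W)
    # 3) Split back into the two directions and gate the input with
    #    sigmoid attention maps, broadcast over the missing axis.
    a_h = sigmoid(mixed[:, :H])[:, :, None]          # (C, H, 1)
    a_w = sigmoid(mixed[:, H:])[:, None, :]          # (C, 1, W)
    return x * a_h * a_w
```

Because both attention maps lie in (0, 1), the module rescales features without changing their sign, and its cost is linear in C, H, and W rather than quadratic as in full self-attention.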




References

  1. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Neural Information Processing Systems (NIPS), pp 1097–1105

  2. Howard A, Sandler M, Chu G et al (2019) Searching for mobilenetv3. In: IEEE International Conference on Computer Vision (ICCV), pp 1314–1324

  3. Tan M, Le Q (2019) Efficientnet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML), pp 6105–6114

  4. Han K, Wang Y, Tian Q et al (2020) Ghostnet: more features from cheap operations. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1580–1589

  5. Ding X, Zhang X, Ma N et al (2021) Repvgg: making vgg-style convnets great again. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 13733–13742

  6. Ding X, Zhang X, Han J et al (2021) Diverse branch block: building a convolution as an inception-like unit. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 10886–10895

  7. Sun Z, Cao S, Yang Y et al (2021) Rethinking transformer-based set prediction for object detection. In: IEEE International Conference on Computer Vision (ICCV), pp 3611–3620

  8. Wang J, Song L, Li Z et al (2021) End-to-end object detection with fully convolutional network. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 15849–15858

  9. Wang Y, Xu Z, Wang X et al (2021) End-to-end video instance segmentation with transformers. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 8741–8750

  10. Zhang R, Tian Z, Shen C et al (2020) Mask encoding for single shot instance segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 10226–10235

  11. Hou Q, Zhang L, Cheng MM et al (2020) Strip pooling: rethinking spatial pooling for scene parsing. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4003–4012

  12. Song Q, Mei K, Huang R (2021) AttaNet: attention-augmented network for fast and accurate scene parsing. In: The AAAI Conference on Artificial Intelligence, pp 2567–2575

  13. Qin Z, Zhang P, Wu F et al (2021) Fcanet: frequency channel attention networks. In: IEEE International Conference on Computer Vision (ICCV), pp 783–792

  14. Shen Z, Zhang M, Zhao H et al (2021) Efficient attention: attention with linear complexities. In: IEEE Winter Conference on Applications of Computer Vision (WACV), pp 3531–3539

  15. Zhao H, Jia J, Koltun V (2020) Exploring self-attention for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 10076–10085

  16. Li X, Wang W, Hu X et al (2019) Selective kernel networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 510–519

  17. Hu J, Shen L, Albanie S et al (2020) Squeeze-and-excitation networks. IEEE Trans Pattern Anal Mach Intell, 2011–2023

  18. Wang QL, Wu BG, Zhu PF et al (2020) ECA-net: efficient channel attention for deep convolutional neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 11531–11539

  19. Woo S, Park J, Lee JY et al (2018) CBAM: convolutional block attention module. In: European Conference on Computer Vision (ECCV), pp 3–19

  20. Park J, Woo S, Lee JY et al (2018) BAM: bottleneck attention module. In: British Machine Vision Conference (BMVC)

  21. Misra D, Nalamada T, Arasanipalai AU et al (2021) Rotate to attend: convolutional triplet attention module. In: IEEE Winter Conference on Applications of Computer Vision (WACV), pp 3139–3148

  22. Sandler M, Howard A, Zhu M et al (2018) Mobilenetv2: inverted residuals and linear bottlenecks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4510–4520

  23. He K, Zhang X, Ren S et al (2016) Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 770–778

  24. Hu J, Shen L, Albanie S et al (2018) Gather-excite: exploiting feature context in convolutional neural networks. In: Neural Information Processing Systems (NIPS), pp 9401–9411

  25. Howard AG, Zhu M, Chen B et al (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861

  26. Gao Z, Xie J, Wang Q et al (2019) Global second-order pooling convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 3024–3033

  27. Bello I, Zoph B, Vaswani A et al (2019) Attention augmented convolutional networks. In: IEEE International Conference on Computer Vision (ICCV), pp 3286–3295

  28. Roy AG, Navab N, Wachinger C (2018) Recalibrating fully convolutional networks with spatial and channel “squeeze and excitation” blocks. IEEE Trans Med Imaging, 540–549

  29. Linsley D, Shiebler D, Eberhardt S et al (2019) Learning what and where to attend. In: International Conference on Learning Representations (ICLR)

  30. Wang X, Girshick R, Gupta A et al (2018) Non-local neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 7794–7803

  31. Chen Y, Kalantidis Y, Li J et al (2018) A2-Nets: double attention networks. In: Neural Information Processing Systems (NIPS), pp 352–361

  32. Cao Y, Xu J, Lin S et al (2019) Gcnet: non-local networks meet squeeze-excitation networks and beyond. In: IEEE International Conference on Computer Vision (ICCV), pp 1971–1980

  33. Fu J, Liu J, Jiang J et al (2020) Scene segmentation with dual relation-aware attention network. IEEE Trans Neural Netw Learn Syst, 2547–2560

  34. Liu JJ, Hou Q, Cheng MM et al (2020) Improving convolutional networks with self-calibrated convolutions. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 10096–10105

  35. Huang Z, Wang X, Wei Y et al (2020) CCNet: criss-cross attention for semantic segmentation. IEEE Trans Pattern Anal Mach Intell, 1–14

  36. Zhang QL, Yang YB (2021) SA-net: shuffle attention for deep convolutional neural networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 2235–2239

  37. Nair V, Hinton GE (2010) Rectified linear units improve restricted boltzmann machines. In: International Conference on Machine Learning (ICML), pp 807–814

  38. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning (ICML), pp 448–456

  39. Russakovsky O, Deng J, Su H et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis, 211–252

  40. Lin TY, Maire M, Belongie S et al (2014) Microsoft coco: common objects in context. In: European Conference on Computer Vision (ECCV), pp 740–755

  41. Chattopadhay A, Sarkar A, Howlader P et al (2018) Grad-cam++: generalized gradient-based visual explanations for deep convolutional networks. In: IEEE Winter Conference on Applications of Computer Vision (WACV), pp 839–847

  42. Paszke A, Gross S, Massa F et al (2019) Pytorch: an imperative style, high-performance deep learning library. In: Neural Information Processing Systems (NIPS), pp 8026–8037

  43. Liu W, Anguelov D, Erhan D et al (2016) SSD: single shot multibox detector. In: European Conference on Computer Vision (ECCV), pp 21–37

  44. Lin TY, Dollár P, Girshick R et al (2017) Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 2117–2125

  45. Ren S, He K, Girshick R et al (2016) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell, 1137–1149

  46. Lin TY, Goyal P, Girshick R et al (2020) Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell, 318–327

  47. He K, Gkioxari G, Dollár P et al (2018) Mask R-CNN. IEEE Trans Pattern Anal Mach Intell, 386–397

  48. Chen K, Wang J, Pang J et al (2019) MMDetection: open mmlab detection toolbox and benchmark. arXiv:1906.07155

  49. Yang S, Tan J, Chen B (2022) Robust spike-based continual meta-learning improved by restricted minimum error entropy criterion. Entropy

  50. Yang S, Gao T, Wang J et al (2022) SAM: a unified self-adaptive multicompartmental spiking neuron model for learning with working memory. Front Neurosci

  51. Yang S, Deng B, Wang J et al (2019) Scalable digital neuromorphic architecture for large-scale biophysically meaningful neural network with multi-compartment neurons. IEEE Trans Neural Netw Learn Syst, 148–162

  52. Yang S, Wang J, Deng B et al (2021) Neuromorphic context-dependent learning framework with fault-tolerant spike routing. IEEE Trans Neural Netw Learn Syst, 1–15


Acknowledgements

The authors would like to thank the editors for their rigorous work and the anonymous reviewers for their comments and suggestions. This work was supported in part by the National Natural Science Foundation of China under Grants 61801394, 61803310, 62171735, 62173276, and 62101458; in part by the Fundamental Research Funds for the Central Universities under Grants 3102019HHZY030013 and G2019KY05206; in part by the Natural Science Basic Research Plan in Shaanxi Province of China under Grants 2020JQ-202 and 2021JQ-122; and in part by the China Postdoctoral Science Foundation under Grants 2020M673482 and 2020M673485.

Author information


Corresponding author

Correspondence to Cheng-Kai Tang.

Ethics declarations

Competing interests

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yu, Y., Zhang, Y., Song, Z. et al. LMA: lightweight mixed-domain attention for efficient network design. Appl Intell 53, 13432–13451 (2023). https://doi.org/10.1007/s10489-022-04170-3

