MSANet: Multi-scale attention networks for image classification

  • Special issue: Deep Pattern Discovery for Big Multimedia Data
  • Published in: Multimedia Tools and Applications

Abstract

Classifying images according to the principles of human vision is a central task in computer vision. Multi-scale information and attention mechanisms are both widely used to improve classification performance: multi-scale methods obtain a more accurate feature description by fusing information from different levels, while attention methods let deep learning models focus on the most valuable information in an image. However, existing methods usually treat the extraction of multi-scale feature maps and the computation of attention weights as two separate, sequential steps. Since the human eye applies both strategies simultaneously when observing an object, we propose a multi-scale attention (MSA) module that extracts attention information at different scales directly from a single feature map, so that the multi-scale and attention operations are completed in one step. Within the MSA module, channel and spatial attention at different scales are obtained by controlling the kernel size of the convolutions that perform cross-channel and cross-spatial information interaction. The module can be easily integrated into different convolutional neural networks to form multi-scale attention network (MSANet) architectures. We evaluate MSANet on the CIFAR-10 and CIFAR-100 data sets. In particular, our ResNet-110-based model reaches 94.39% accuracy on CIFAR-10, and compared with the benchmark convolutional models, the proposed MSA module brings a roughly 3% accuracy improvement on CIFAR-100. These results show that the proposed multi-scale attention module is effective for image classification.
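The abstract does not include implementation details of the MSA module, so the following is only a toy NumPy sketch of the general idea it describes: computing channel and spatial attention at several scales from one feature map, where the convolution kernel size sets the scale of cross-channel and cross-spatial interaction. Fixed averaging kernels stand in for the learned convolutions of the real module, and all function names here are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(fmap, k):
    """Per-channel weights from cross-channel interaction at scale k."""
    pooled = fmap.mean(axis=(1, 2))        # global average pool: (C,)
    kern = np.ones(k) / k                  # fixed averaging kernel (stand-in for a learned one)
    return sigmoid(np.convolve(pooled, kern, mode="same"))

def spatial_attention(fmap, k):
    """Per-location weights from cross-spatial interaction at scale k."""
    m = fmap.mean(axis=0)                  # channel-wise mean map: (H, W)
    kern = np.ones(k) / k
    # separable k x k box filter applied along rows, then columns
    m = np.apply_along_axis(lambda r: np.convolve(r, kern, mode="same"), 1, m)
    m = np.apply_along_axis(lambda c: np.convolve(c, kern, mode="same"), 0, m)
    return sigmoid(m)

def msa(fmap, scales=(3, 5, 7)):
    """Fuse channel and spatial attention computed at several kernel sizes."""
    cw = np.mean([channel_attention(fmap, k) for k in scales], axis=0)  # (C,)
    sw = np.mean([spatial_attention(fmap, k) for k in scales], axis=0)  # (H, W)
    return fmap * cw[:, None, None] * sw[None, :, :]

fmap = np.random.default_rng(0).standard_normal((16, 8, 8))  # toy (C, H, W) feature map
out = msa(fmap)
assert out.shape == fmap.shape  # attention reweights the map, never reshapes it
```

Because the reweighted map keeps the shape of its input, a module like this can be dropped between convolutional stages of an existing network, which matches the abstract's claim that the MSA module integrates easily into different CNN backbones.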


Fig. 1, Fig. 2, Fig. 3


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (61836016, 61672177).

Author information

Corresponding authors

Correspondence to Zuping Zhang or Jianfeng Zhang.

Ethics declarations

Conflict of Interests

Cao, P., Xie, F., Zhang, S., Zhang, Z., and Zhang, J. declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Cao, P., Xie, F., Zhang, S. et al. MSANet: Multi-scale attention networks for image classification. Multimed Tools Appl 81, 34325–34344 (2022). https://doi.org/10.1007/s11042-022-12792-5
