MGSNet: A multi-scale and gated spatial attention network for crowd counting

Shi, Ying; Sang, Jun; Wu, Zhongyuan; Wang, Fusen; Liu, Xinyue; Xia, Xiaofeng; Sang, Nong

doi:10.1007/s10489-022-03263-3

Published: 16 March 2022

Volume 52, pages 15436–15446, (2022)

Applied Intelligence

Ying Shi^1,2,
Jun Sang^1,2 (ORCID: orcid.org/0000-0002-8703-7310),
Zhongyuan Wu^1,2,
Fusen Wang^1,2,
Xinyue Liu^1,2,
Xiaofeng Xia^1,2 &
Nong Sang^3

Abstract

Crowd counting by estimating a density map has been widely studied in recent years. However, several issues remain, including large scale variation of heads, complex background noise, and perspective distortion. Large scale variation of heads limits the performance of crowd counting approaches, and complex background noise causes background regions, such as leaves and mesh, to be incorrectly recognized as heads. To handle large scale variation and generate a high-quality estimated density map, we propose a novel multi-scale fusion, scale-aware attention network called the multi-scale and gated spatial attention network (MGSNet). In MGSNet, the first 10 layers of VGG16 with Batch Normalization (BN) are used as the backbone. Two branches follow the backbone: a large-scale branch and a scale-aware attention branch. The large-scale branch addresses the large scale variation of heads in crowd images; it employs a Scale Information Aggregation Block (SIAB) that extracts multi-scale features using dilated convolutions with different receptive fields. The scale-aware attention branch addresses complex background noise in crowd scenes; it employs a Gated Spatial Attention Block (GSAB), inspired by Long Short-Term Memory (LSTM) networks, to fuse previous information at different scales and retain the appropriate scale information of crowds. We evaluate the proposed method on the ShanghaiTech (Part A and Part B), UCF-CC-50 and UCF-QNRF datasets. The experimental results show its effectiveness compared with state-of-the-art methods.
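Two ideas in the abstract can be illustrated with a little arithmetic: a density map yields a count by summing its values, and a dilated convolution widens the receptive field without adding weights. The sketch below is only an illustration under stated assumptions. The function names, the dilation rates (1, 2, 3) and the toy density map are hypothetical and are not taken from the paper; the actual SIAB configuration is not given in the abstract.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a single dilated convolution kernel.

    A k x k kernel with dilation d spans k + (k - 1) * (d - 1) pixels per side.
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(layers):
    """Receptive field of a stack of stride-1 convolutions.

    `layers` is a sequence of (kernel_size, dilation) pairs; each layer
    grows the receptive field by (effective kernel size - 1).
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += effective_kernel_size(kernel_size, dilation) - 1
    return rf


def count_from_density(density_map):
    """Estimated crowd count: the sum of all values in the density map."""
    return sum(sum(row) for row in density_map)


# Hypothetical dilation rates, chosen only to show how the field grows.
example_rates = [1, 2, 3]
example_extents = [effective_kernel_size(3, d) for d in example_rates]

# Toy 2 x 2 density map (assumed values) whose mass corresponds to one head.
toy_density = [[0.0, 0.25], [0.5, 0.25]]
toy_count = count_from_density(toy_density)
```

With these assumed rates, a 3 x 3 kernel covers 3, 5 and 7 pixels per side, which is the sense in which parallel dilated convolutions see heads at different scales while keeping the same number of parameters.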

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61971073).

Author information

Authors and Affiliations

Key Laboratory of Dependable Service Computing in Cyber Physical Society of Ministry of Education, Chongqing University, Chongqing, 400044, China
Ying Shi, Jun Sang, Zhongyuan Wu, Fusen Wang, Xinyue Liu & Xiaofeng Xia
School of Big Data, Software Engineering, Chongqing University, Chongqing, 401331, China
Ying Shi, Jun Sang, Zhongyuan Wu, Fusen Wang, Xinyue Liu & Xiaofeng Xia
School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China
Nong Sang

Corresponding author

Correspondence to Jun Sang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shi, Y., Sang, J., Wu, Z. et al. MGSNet: A multi-scale and gated spatial attention network for crowd counting. Appl Intell 52, 15436–15446 (2022). https://doi.org/10.1007/s10489-022-03263-3

Accepted: 18 January 2022
Published: 16 March 2022
Issue Date: October 2022
DOI: https://doi.org/10.1007/s10489-022-03263-3
