Abstract
Crowd counting by estimating a density map has recently been widely studied. However, it still faces several challenges, such as large variation in head scale, complex background noise, and perspective distortion. Large scale variation among heads limits the performance of crowd counting approaches, and complex background noise causes background regions, such as leaves and mesh, to be incorrectly recognized as heads. To handle large scale variation and generate a high-quality estimated density map, we propose a novel multi-scale fusion, scale-aware attention network called the multi-scale and gated spatial attention network (MGSNet). In MGSNet, the first 10 layers of VGG16 with Batch Normalization (BN) serve as the backbone. The backbone is followed by two branches: a large-scale branch and a scale-aware attention branch. The large-scale branch addresses the large scale variation of heads in crowd images; it employs a Scale Information Aggregation Block (SIAB) that extracts multi-scale features using dilated convolutions with different receptive fields. The scale-aware attention branch addresses complex background noise in crowd scenes; it employs a Gated Spatial Attention Block (GSAB), inspired by Long Short-Term Memory (LSTM) networks, that fuses previous information at different scales and retains the appropriate scale information of crowds. We evaluate the proposed method on the ShanghaiTech (Part A and Part B), UCF-CC-50 and UCF-QNRF datasets. The experimental results show that it is effective compared with state-of-the-art methods.
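As a rough illustration of two ideas named in the abstract, the sketch below computes the effective kernel size of a dilated convolution and blends two feature vectors with an LSTM-style sigmoid gate. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the dilation rates used in SIAB or the gating equations of GSAB, so the rates (1, 2, 3), the 3x3 kernel, and the helper names `effective_kernel_size` and `gated_fuse` are all hypothetical.

```python
import math


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels, covered by one dilated convolution kernel.

    Standard relation: k_eff = k + (k - 1) * (d - 1).
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(previous, current, gate_logits):
    """Blend two feature vectors element-wise with a sigmoid gate.

    Hypothetical LSTM-style gate for illustration only; this is not
    the GSAB definition from the paper.
    """
    fused = []
    for p, c, z in zip(previous, current, gate_logits):
        g = sigmoid(z)  # g near 1 keeps the current value, near 0 the previous one
        fused.append(g * c + (1.0 - g) * p)
    return fused


if __name__ == "__main__":
    # Assumed dilation rates; the abstract does not state the actual ones.
    for rate in (1, 2, 3):
        print(f"dilation={rate}: effective kernel {effective_kernel_size(3, rate)}")
    print(gated_fuse([0.0, 1.0], [2.0, 3.0], [0.0, 0.0]))
```

With a fixed 3x3 kernel, raising the dilation rate widens the area each output pixel sees without adding parameters, which is why parallel dilated convolutions are a common way to gather features at several scales.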






Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61971073).
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Shi, Y., Sang, J., Wu, Z. et al. MGSNet: A multi-scale and gated spatial attention network for crowd counting. Appl Intell 52, 15436–15446 (2022). https://doi.org/10.1007/s10489-022-03263-3
DOI: https://doi.org/10.1007/s10489-022-03263-3