Abstract
Recently, visual attention mechanisms have been employed in CNN-based crowd counting methods to overcome the interference of background noise, and they have achieved good performance. However, existing methods usually focus on designing complex attention structures and extracting pixel-level contextual information, while ignoring the extraction of global contextual information at different scales. In this paper, to overcome scale variation and complex background noise, we propose a novel scale-aware and global contextual network (SGCNet) that employs multi-scale attention mechanisms to selectively strengthen features at different network scales. The key component of SGCNet is a multi-scale global contextual block that consists of multi-scale feature selection and global contextual information extraction, where global contextual information is adopted as guidance to weight features at different scales. Compared with previous methods, which do not inject scale information into the attention mechanism, SGCNet achieves better counting performance via multi-scale contextual information extraction. Extensive experiments on four crowd counting datasets (ShanghaiTech, UCF_CC_50, UCF-QNRF, UCSD) demonstrate the effectiveness and superiority of the proposed method in highly congested, noisy crowd scenes.
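The idea of using global context as guidance to weight features at different scales can be illustrated with a minimal NumPy sketch. This is not the SGCNet implementation: the function name `global_context_scale_fusion`, the projection vector `w`, the use of global average pooling as the context extractor, and the softmax over scales are all assumptions made here for illustration only.

```python
import numpy as np


def softmax(x, axis=0):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def global_context_scale_fusion(features, w):
    """Fuse same-shaped feature maps from several scales (hypothetical helper).

    features: list of S arrays, each shaped (C, H, W)
    w:        array shaped (C,), an assumed projection that turns a
              per-scale context vector into one scalar score
    Returns the fused (C, H, W) map and the S scale weights.
    """
    stacked = np.stack(features)                    # (S, C, H, W)
    # Global context per scale: average over all spatial positions.
    context = stacked.mean(axis=(2, 3))             # (S, C)
    # One score per scale, normalised so the weights sum to 1.
    scores = context @ w                            # (S,)
    weights = softmax(scores, axis=0)               # (S,)
    # Weighted sum of the scale branches.
    fused = np.tensordot(weights, stacked, axes=1)  # (C, H, W)
    return fused, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = [rng.standard_normal((4, 8, 8)) for _ in range(3)]
    w = rng.standard_normal(4)
    fused, weights = global_context_scale_fusion(feats, w)
    print(fused.shape, weights.sum())
```

In this sketch each scale receives a single scalar weight. A channel-wise or spatial weighting would be an equally plausible reading of the abstract, which does not specify the granularity.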
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61971073).
About this article
Cite this article
Liu, Q., Guo, Y., Sang, J. et al. SGCNet: Scale-aware and global contextual network for crowd counting. Appl Intell 52, 12091–12102 (2022). https://doi.org/10.1007/s10489-022-03230-y
DOI: https://doi.org/10.1007/s10489-022-03230-y