Abstract
Estimating crowd density in surveillance video is an active topic in computer vision and underpins data processing and analysis in public transport services, commercial passenger flow analysis, public security and other industries. In practical applications, however, pedestrian occlusion and scale changes make it difficult for existing methods to capture human heads reliably, which reduces counting accuracy. To address this problem, a crowd counting method based on a self-attention residual network is proposed. First, a multiscale convolution module composed of dilated convolution and deformable convolution is used. To avoid losing image resolution, the module offsets its sampling points so that some sampling positions move onto the occluded crowd, which addresses the problem of crowd occlusion. Then, a self-attention residual module is designed to score and classify every pixel of the feature map. It generates a corresponding weight for each pixel, and the crowd scale is determined from these weights, which addresses the problem of crowd scale changes. The algorithm is evaluated on the ShanghaiTech, UCF_CC_50 and WorldExpo’10 datasets. The experimental results show that its mean absolute error (MAE) and mean square error (MSE) are significantly lower than those of the comparison algorithms.
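As a rough illustration of the two evaluation metrics named in the abstract, the sketch below computes MAE and the squared-error metric over per-image counts. It assumes the convention common in the crowd counting literature, where the figure reported as "MSE" is the square root of the mean squared count error; whether this paper follows that convention is an assumption, and the function names and sample counts are hypothetical.

```python
from math import sqrt
from typing import Sequence


def _check(predicted: Sequence[float], actual: Sequence[float]) -> None:
    # Both sequences must describe the same, non-empty set of test images.
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error between predicted and ground-truth counts."""
    _check(predicted, actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root of the mean squared count error (assumed meaning of "MSE")."""
    _check(predicted, actual)
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical per-image counts, for illustration only (not results from the paper).
predicted_counts = [105.0, 98.0, 310.0, 47.0]
true_counts = [100.0, 102.0, 300.0, 50.0]

print("MAE:", mae(predicted_counts, true_counts))
print("MSE (root form):", rmse(predicted_counts, true_counts))
```

MAE reflects the average counting accuracy, while the squared-error metric penalizes large per-image errors more heavily and is therefore often read as a measure of robustness.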
Acknowledgements
The authors are grateful for collaborative funding support from the Natural Science Foundation of Shandong Province, China (ZR2018MEE008), the National Natural Science Foundation of China (51904173) and, in part, the Project of Shandong Province High Educational Science and Technology Program (J18KA307).
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Liu, YB., Jia, RS., Liu, QM. et al. Crowd counting method based on the self-attention residual network. Appl Intell 51, 427–440 (2021). https://doi.org/10.1007/s10489-020-01842-w
DOI: https://doi.org/10.1007/s10489-020-01842-w