Abstract
Recent state-of-the-art crowd counting methods predict a density map and then aggregate it into a final count. CSRNet, a density map-based network for congested scene recognition proposed in 2018, achieved better crowd counting performance than previous methods despite its simple architecture. It uses the first 10 layers of VGG-16 as the front end and dilated convolutional layers as the back end to generate high-quality density maps. CSRNet was evaluated on four datasets (the ShanghaiTech dataset, the UCF_CC_50 dataset, the World Expo’10 dataset, and the UCSD dataset) and delivered strong performance. To improve on it, in this paper we propose a small network, added as a new component, that estimates a counting residual, and we combine this component with CSRNet. We evaluate the combined network on three datasets (the ShanghaiTech dataset, the UCF_CC_50 dataset, and the World Expo’10 dataset) and compare the results with those of CSRNet. The results show that our method significantly improves on CSRNet. A series of experiments, including ablation and control experiments, demonstrates the effectiveness of our method. In future work, we will apply our method to other networks to achieve better results.
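
The idea of aggregating a density map into a count and then correcting it with a residual can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state how the residual is combined with the density-map count, so the additive combination below is an assumption, and the function names and the toy values are hypothetical.

```python
from typing import List


def count_from_density_map(density_map: List[List[float]]) -> float:
    """Aggregate a density map into a count by summing all of its values."""
    return sum(sum(row) for row in density_map)


def corrected_count(density_map: List[List[float]], residual: float) -> float:
    """Combine the density-map count with a predicted counting residual.

    ASSUMPTION: the residual is simply added to the aggregated count.
    The abstract only says the two components are combined.
    """
    return count_from_density_map(density_map) + residual


# Toy 3x3 density map (hypothetical values, not from any dataset).
density_map = [
    [0.0, 0.25, 0.25],
    [0.5, 1.0, 0.5],
    [0.25, 0.25, 0.0],
]

base = count_from_density_map(density_map)
final = corrected_count(density_map, residual=0.5)
print(base, final)
```

In this sketch the density map stands in for the output of a network such as CSRNet, and `residual` stands in for the output of the proposed small network.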

Acknowledgments
This work was supported by National Natural Science Foundation of China (No. 61971073).
Cite this article
Yang, L., Guo, Y., Sang, J. et al. A crowd counting method via density map and counting residual estimation. Multimed Tools Appl 81, 43503–43512 (2022). https://doi.org/10.1007/s11042-022-13220-4
DOI: https://doi.org/10.1007/s11042-022-13220-4