Abstract
Crowd statistics in crowded scenes reflect the density level of a crowd and can provide safety warnings. Gathering them manually is laborious. In recent years, automated crowd counting has received extensive attention in the computer vision field. However, the task remains challenging, mainly because of severe occlusion within crowds and large appearance variations caused by camera viewing angles. To overcome these difficulties, this paper proposes PDD-CNN, a pyramid-dilated deep convolutional neural network for accurate crowd counting. PDD-CNN uses a VGG-16 backbone to generate dense attribute feature maps from an image of arbitrary size or resolution. Two pyramid dilated modules then capture multiscale features; each consists of four parallel dilated convolutional layers with different dilation rates and a parallel average pooling layer. Finally, three cascaded dilated convolutions regress the density map, from which the count is estimated. In addition, a novel training loss that combines the Euclidean loss with the structural similarity (SSIM) loss is employed to reduce blurring in the estimated density maps. Experimental results on three datasets (ShanghaiTech, UCF_CC_50, and UCF-QNRF) demonstrate that PDD-CNN produces high-quality density maps and achieves good counting performance.
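The pyramid dilated module can be illustrated with a minimal single-channel sketch in plain Python. It assumes stride 1, zero "same" padding, one shared 3x3 kernel across branches, dilation rates (1, 2, 3, 4), and a global-average-pooling branch in the style of DeepLab's image-level features; none of these settings is stated in the abstract, and the names `dilated_conv2d`, `pyramid_dilated_module`, and `cascade_receptive_field` are hypothetical. Real layers are multi-channel, learn separate weights per branch, and concatenate the branch outputs along the channel axis.

```python
from typing import List, Sequence

Grid = List[List[float]]


def effective_kernel_size(k: int, rate: int) -> int:
    """Span covered by a k x k kernel whose taps are `rate` pixels apart."""
    return k + (k - 1) * (rate - 1)


def dilated_conv2d(x: Grid, kernel: Grid, rate: int) -> Grid:
    """Single-channel dilated convolution (cross-correlation, as in CNN
    frameworks), stride 1, zero 'same' padding. Assumes a square, odd kernel."""
    h, w = len(x), len(x[0])
    k = len(kernel)
    pad = rate * (k // 2)  # keeps the output the same size as the input
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    r, c = i - pad + u * rate, j - pad + v * rate
                    if 0 <= r < h and 0 <= c < w:  # zero padding outside
                        acc += kernel[u][v] * x[r][c]
            out[i][j] = acc
    return out


def avg_pool_branch(x: Grid) -> Grid:
    """Assumed pooling branch: global mean broadcast back to the input size."""
    mean = sum(map(sum, x)) / (len(x) * len(x[0]))
    return [[mean] * len(x[0]) for _ in x]


def pyramid_dilated_module(x: Grid, kernel: Grid,
                           rates: Sequence[int] = (1, 2, 3, 4)) -> List[Grid]:
    """Four parallel dilated branches plus a pooling branch. Returning the
    list stands in for channel-wise concatenation; `rates` is hypothetical."""
    branches = [dilated_conv2d(x, kernel, r) for r in rates]
    branches.append(avg_pool_branch(x))
    return branches


def cascade_receptive_field(rates: Sequence[int], k: int = 3) -> int:
    """Receptive field of stacked stride-1 dilated convs: 1 + sum(k_eff - 1)."""
    return 1 + sum(effective_kernel_size(k, r) - 1 for r in rates)


# Usage: an impulse input reveals where each branch samples the image.
x = [[0.0] * 9 for _ in range(9)]
x[4][4] = 1.0
ones = [[1.0] * 3 for _ in range(3)]
outs = pyramid_dilated_module(x, ones)
print([effective_kernel_size(3, r) for r in (1, 2, 3, 4)])  # [3, 5, 7, 9]
print(cascade_receptive_field((2, 2, 2)))                    # 13
```

A dilated 3x3 kernel with rate r spans 2r + 1 pixels while keeping only nine weights, so parallel branches with different rates see context at several scales for the same parameter cost, and cascading dilated layers enlarge the receptive field without further pooling.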
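The combined training loss can be sketched in the same way. This sketch assumes the Euclidean term takes the common half-sum-of-squares form used in earlier density-map work (shown per image, without batch averaging) and computes SSIM from whole-map statistics instead of the usual local sliding windows. The weight `alpha` and all function names are hypothetical, not values or code from the paper. It also reads the predicted count as the sum of the density map, the usual convention in density-based counting.

```python
from typing import List

Grid = List[List[float]]


def _flat(m: Grid) -> List[float]:
    return [v for row in m for v in row]


def euclidean_loss(pred: Grid, gt: Grid) -> float:
    """Half the sum of squared pixel errors for one image (no batch averaging)."""
    return 0.5 * sum((p - g) ** 2 for p, g in zip(_flat(pred), _flat(gt)))


def global_ssim(pred: Grid, gt: Grid, data_range: float = 1.0) -> float:
    """SSIM from whole-map statistics; standard SSIM averages local windows."""
    x, y = _flat(pred), _flat(gt)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2  # stabilizers
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def combined_loss(pred: Grid, gt: Grid, alpha: float = 1e-3) -> float:
    """Euclidean loss plus alpha * (1 - SSIM); alpha is a hypothetical weight."""
    return euclidean_loss(pred, gt) + alpha * (1.0 - global_ssim(pred, gt))


def estimated_count(density: Grid) -> float:
    """The predicted count is the integral (sum) of the density map."""
    return sum(_flat(density))


gt = [[0.0, 0.5], [0.25, 0.25]]
flat_pred = [[0.25, 0.25], [0.25, 0.25]]  # same count, no spatial structure
print(estimated_count(gt), estimated_count(flat_pred))  # 1.0 1.0
```

Both maps in the usage example integrate to the same count, yet the flat prediction scores an SSIM near zero against the peaked ground truth, while its Euclidean loss is small. This illustrates how an SSIM term can push a model away from blurry density maps that a pixel-wise objective alone penalizes only mildly.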
Acknowledgements
This research was supported by the National Natural Science Foundation of China (61773085).
About this article
Cite this article
Wang, W., Liu, Q. & Wang, W. Pyramid-dilated deep convolutional neural network for crowd counting. Appl Intell 52, 1825–1837 (2022). https://doi.org/10.1007/s10489-021-02537-6
DOI: https://doi.org/10.1007/s10489-021-02537-6