Pyramid-dilated deep convolutional neural network for crowd counting

Applied Intelligence

Abstract

Crowd statistics in congested scenes reflect the density level of a crowd and can provide safety warnings, but collecting them manually is laborious. In recent years, automated crowd counting has received extensive attention in the computer vision field. The task remains challenging, mainly because of severe occlusion within crowds and large appearance variations caused by camera viewing angles. To overcome these difficulties, a pyramid-dilated deep convolutional neural network for accurate crowd counting, called PDD-CNN, is proposed. PDD-CNN is based on a VGG-16 network that is designed to generate dense attribute feature maps from an image of arbitrary size or resolution. Two pyramid dilated modules are then adopted to capture multiscale features, each consisting of four parallel dilated convolutional layers with different rates and a parallel average pooling layer. Finally, three cascading dilated convolutions are used to regress the density map and perform accurate count estimation. In addition, a novel training loss that combines the Euclidean loss with the structural similarity loss is employed to attenuate the blurry effects of density map estimation. Experimental results on three datasets (ShanghaiTech, UCF_CC_50, and UCF-QNRF) demonstrate that the proposed PDD-CNN produces high-quality density maps and achieves good counting performance.
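To make the combined training loss more concrete, the following is a minimal sketch of a Euclidean term plus a structural similarity (SSIM) term on a pair of density maps. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the weighting between the two terms or the SSIM window, so the `weight` value, the single whole-map SSIM window, and all function names here are hypothetical. The constants `c1` and `c2` follow the common SSIM defaults for a dynamic range of 1.

```python
def _flatten(density_map):
    """Turn a 2-D list of pixel values into a flat list."""
    return [value for row in density_map for value in row]


def _mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def euclidean_loss(pred, target):
    """Half the sum of squared pixel differences between two density maps."""
    p, t = _flatten(pred), _flatten(target)
    return 0.5 * sum((a - b) ** 2 for a, b in zip(p, t))


def global_ssim(pred, target, c1=1e-4, c2=9e-4):
    """SSIM over the whole map as one window.

    Assumption: a single global window is a simplification; SSIM is usually
    evaluated over local (often Gaussian-weighted) windows and averaged.
    """
    p, t = _flatten(pred), _flatten(target)
    mu_p, mu_t = _mean(p), _mean(t)
    var_p = _mean((a - mu_p) ** 2 for a in p)
    var_t = _mean((b - mu_t) ** 2 for b in t)
    cov = _mean((a - mu_p) * (b - mu_t) for a, b in zip(p, t))
    numerator = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    denominator = (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    return numerator / denominator


def combined_loss(pred, target, weight=1e-3):
    """Euclidean loss plus a weighted (1 - SSIM) term.

    Assumption: `weight` is a hypothetical balancing factor chosen for
    illustration only.
    """
    return euclidean_loss(pred, target) + weight * (1.0 - global_ssim(pred, target))


# Usage with two tiny hypothetical density maps.
ground_truth = [[0.0, 0.1, 0.0], [0.2, 0.4, 0.1], [0.0, 0.1, 0.0]]
estimate = [[0.0, 0.2, 0.0], [0.1, 0.3, 0.1], [0.0, 0.1, 0.1]]
loss = combined_loss(estimate, ground_truth)
estimated_count = sum(_flatten(estimate))  # the count is the sum of the density map
```

The Euclidean term penalizes per-pixel error, while the SSIM term rewards agreement in local mean, variance, and covariance, which is the property the abstract credits with reducing blur in the estimated density maps.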

Acknowledgements

This research was supported by the National Natural Science Foundation of China (61773085).

Author information

Corresponding author

Correspondence to Quanli Liu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, W., Liu, Q. & Wang, W. Pyramid-dilated deep convolutional neural network for crowd counting. Appl Intell 52, 1825–1837 (2022). https://doi.org/10.1007/s10489-021-02537-6

  • DOI: https://doi.org/10.1007/s10489-021-02537-6
