Deeper multi-column dilated convolutional network for congested crowd understanding

  • Original Article
Neural Computing and Applications

Abstract

In highly congested crowd scenes, it is hard to generate high-quality density maps because the crowd and the background are so heavily mixed that they are difficult to distinguish. To alleviate this issue, this paper presents a deeper multi-column dilated convolutional network (DMDCNet), which is capable of extracting sufficient semantic features for crowd understanding in highly congested scenes. DMDCNet consists of two modules: a feature extractor and a density map estimator. The feature extractor is a VGG-16-based convolutional neural network (CNN) that extracts the low-level features contained in crowd images. The density map estimator is designed as a multi-column structure of dilated convolutional neural networks (DCNNs) that extracts deeper information and captures multi-scale contextual information, so that high-quality density maps can be generated from the input images. Furthermore, the multi-column DCNNs in DMDCNet can effectively alleviate the “gridding” problem caused by the dilated convolution framework. Extensive experiments on several commonly used benchmark datasets show that DMDCNet is comparable with recent state-of-the-art methods.
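
The “gridding” problem mentioned above can be illustrated with a small one-dimensional sketch. It is not the DMDCNet implementation: the kernel size (3), the number of stacked layers (3), and the dilation rates (2 for the single column; 1, 2 and 3 for the parallel columns) are assumptions chosen only for illustration, and the helper names are hypothetical. The code tracks which input offsets can influence one output position. A single column that repeats one dilation rate reads only a sparse grid of positions, while several columns with different rates read a denser set.

```python
from itertools import product


def taps(rate, kernel_size=3):
    """Input offsets read by one 1-D dilated convolution layer."""
    half = kernel_size // 2
    return [rate * k for k in range(-half, half + 1)]


def stacked_footprint(rates, kernel_size=3):
    """Input offsets that reach one output position after stacking layers."""
    offsets = {0}
    for rate in rates:
        offsets = {o + t for o, t in product(offsets, taps(rate, kernel_size))}
    return offsets


def coverage(offsets):
    """Fraction of positions inside the receptive-field span that are read."""
    span = max(offsets) - min(offsets) + 1
    return len(offsets) / span


# One column, three layers, all with the assumed dilation rate 2.
single_column = stacked_footprint([2, 2, 2])

# Three parallel columns with assumed rates 1, 2 and 3; their outputs are
# fused, so the combined footprint is the union of the per-column footprints.
multi_column = set().union(*(stacked_footprint([r, r, r]) for r in (1, 2, 3)))

print("single column:", sorted(single_column), round(coverage(single_column), 3))
print("multi column: ", sorted(multi_column), round(coverage(multi_column), 3))
```

In this toy setting the single column never reads an odd offset, which is the checkerboard-like sampling pattern that “gridding” refers to, whereas the union over columns fills in neighbouring positions around the centre.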

Acknowledgment

This work was supported in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 19KJA550002, by the Six Talent Peak Project of Jiangsu Province of China under Grant No. XYDXX-054, by the Priority Academic Program Development of Jiangsu Higher Education Institutions, and by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Author information

Corresponding author

Correspondence to Li Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal rights statement

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yan, L., Zhang, L., Zheng, X. et al. Deeper multi-column dilated convolutional network for congested crowd understanding. Neural Comput & Applic 34, 1407–1422 (2022). https://doi.org/10.1007/s00521-021-06458-w

  • DOI: https://doi.org/10.1007/s00521-021-06458-w
