Deeper multi-column dilated convolutional network for congested crowd understanding

Yan, Leilei; Zhang, Li; Zheng, Xiaohan; Li, Fanzhang

doi:10.1007/s00521-021-06458-w

Original Article
Published: 08 September 2021

Neural Computing and Applications, volume 34, pages 1407–1422 (2022)

Leilei Yan^1,2,
Li Zhang ORCID: orcid.org/0000-0001-7914-0679^1,2,
Xiaohan Zheng^1,2 &
…
Fanzhang Li^1,2


Abstract

In highly congested crowd scenes, it is hard to generate high-quality density maps because the crowd and the background are so intermixed that they are difficult to tell apart. To alleviate this issue, this paper presents a deeper multi-column dilated convolutional network (DMDCNet), which can extract sufficient semantic features for crowd understanding in such scenes. DMDCNet consists of two modules: a feature extractor and a density map estimator. The feature extractor is a VGG-16-based convolutional neural network (CNN) that extracts low-level features from crowd images. The density map estimator is a multi-column structure of dilated convolutional neural networks (DCNNs) that extracts deeper information and captures multi-scale contextual information, enabling high-quality density maps to be generated from the input images. Furthermore, the multi-column DCNNs in DMDCNet can effectively alleviate the “gridding” problem caused by dilated convolution. Extensive experiments on several commonly used benchmark datasets show that DMDCNet is comparable with recent state-of-the-art methods.
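
The "gridding" problem mentioned in the abstract can be illustrated with a small one-dimensional sketch. It is not the authors' implementation: the 3-tap kernels, the three-layer depth, the column dilation rates (1, 2, 3) and the helper names `reachable_offsets` and `coverage` are all assumptions chosen for illustration. The sketch tracks which input offsets can influence one output position. A single stack with a fixed dilation rate of 2 only ever touches even offsets, while fusing columns with different rates fills in some of the positions that stack skips.

```python
from itertools import product


def reachable_offsets(dilations, kernel_size=3):
    """Return the 1-D input offsets that can influence one output position
    after stacking dilated convolutions with the given dilation rates."""
    half = kernel_size // 2
    offsets = {0}
    for d in dilations:
        # A dilated kernel samples taps spaced d positions apart.
        taps = [d * t for t in range(-half, half + 1)]
        offsets = {o + t for o, t in product(offsets, taps)}
    return offsets


def coverage(offsets):
    """Fraction of positions inside the receptive-field span that are sampled."""
    span = max(offsets) - min(offsets) + 1
    return len(offsets) / span


# Single column: three stacked layers, all with dilation rate 2.
single = reachable_offsets([2, 2, 2])

# Hypothetical multi-column layout (rates are assumptions, not from the paper).
# Fusing the column outputs means the sampled offsets are the union over columns.
columns = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
multi = set().union(*(reachable_offsets(c) for c in columns))

print("single column offsets:", sorted(single))
print("multi-column offsets: ", sorted(multi))
print("coverage single vs multi:", coverage(single), coverage(multi))
```

In this toy setting the rate-2 column never samples an odd offset, which is the checkerboard-like pattern usually called gridding. The fused columns sample odd offsets as well and cover a larger share of their receptive-field span.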


Acknowledgment

This work was supported in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 19KJA550002, by the Six Talent Peak Project of Jiangsu Province of China under Grant No. XYDXX-054, by the Priority Academic Program Development of Jiangsu Higher Education Institutions, and by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Author information

Authors and Affiliations

Joint International Research Laboratory of Machine Learning and Neuromorphic Computing & School of Computer Science and Technology, Soochow University, Suzhou, 215006, Jiangsu, China
Leilei Yan, Li Zhang, Xiaohan Zheng & Fanzhang Li
Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, 215006, Jiangsu, China
Leilei Yan, Li Zhang, Xiaohan Zheng & Fanzhang Li

Corresponding author

Correspondence to Li Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal rights statement

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yan, L., Zhang, L., Zheng, X. et al. Deeper multi-column dilated convolutional network for congested crowd understanding. Neural Comput & Applic 34, 1407–1422 (2022). https://doi.org/10.1007/s00521-021-06458-w

Received: 12 October 2020
Accepted: 26 August 2021
Published: 08 September 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s00521-021-06458-w
