Multi-scale and multi-column convolutional neural network for crowd density estimation

Chen, Lei; Wang, Guodong; Hou, Guojia

doi:10.1007/s11042-020-10002-8

Published: 20 October 2020

Volume 80, pages 6661–6674, (2021)

Multimedia Tools and Applications

Lei Chen¹,
Guodong Wang¹ &
Guojia Hou¹


Abstract

To accurately identify objects of different sizes, we propose an efficient Multi-Scale and Multi-Column Convolutional Neural Network (MSMC) for crowd density estimation. The ground truth is generated from the existing label information, and the image is fed into our model, which is trained to relate its predicted density map to that ground truth. The network is composed of three components: feature extraction, feature fusion, and feature regression. First, VGG16 is used for faster feature extraction. Second, VGG16 layers of different sizes are fused, which helps detect objects of different sizes. Third, multi-channel convolution is applied to further address the multi-size problem. After the fusion block, dilated convolution is employed to enlarge the receptive field without increasing the number of parameters. In crowd density estimation, the combination of multiple sizes and multiple channels enhances the network's ability to capture information, improves the mapping from the original image to the density map, and raises the accuracy of the estimate. Test results on the ShanghaiTech and UCF_CC_50 datasets are provided in the Experiment section and show that the proposed method performs well in both accuracy and robustness.
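
The claim that dilated convolution enlarges the receptive field without adding parameters can be checked with simple arithmetic. The sketch below is illustrative only: the 3×3 kernel, the 512-channel width, and the dilation rate of 2 are assumptions chosen for the example, not values stated in the abstract, and the helper names are hypothetical.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv2d_param_count(in_channels: int, out_channels: int,
                       kernel_size: int, bias: bool = True) -> int:
    """Learnable parameters of a 2D convolution; dilation does not appear."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def stacked_receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field of stride-1 convolutions stacked one after another."""
    field = 1
    for dilation in dilations:
        field += (kernel_size - 1) * dilation
    return field


# Assumed configuration: three 3x3 layers with 512 channels each.
plain = stacked_receptive_field(3, [1, 1, 1])
dilated = stacked_receptive_field(3, [2, 2, 2])
params = conv2d_param_count(512, 512, 3)

print(f"receptive field, dilation 1: {plain}x{plain}")
print(f"receptive field, dilation 2: {dilated}x{dilated}")
print(f"parameters per layer (either case): {params}")
```

Because the dilation rate only changes the spacing between kernel taps, the parameter count depends on the kernel size and channel widths alone, while the receptive field of the stack grows with the dilation rate.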

Acknowledgements

This research was supported by the Natural Science Foundation of Shandong Province, China (Nos. ZR2019MF050 and ZR2019BF042) and the National Natural Science Foundation of China (No. 61901240).

Author information

Authors and Affiliations

College of Computer Science and Technology, Qingdao University, Qingdao, People’s Republic of China, 266071
Lei Chen, Guodong Wang & Guojia Hou

Corresponding author

Correspondence to Guodong Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, L., Wang, G. & Hou, G. Multi-scale and multi-column convolutional neural network for crowd density estimation. Multimed Tools Appl 80, 6661–6674 (2021). https://doi.org/10.1007/s11042-020-10002-8

Received: 07 January 2020
Revised: 21 August 2020
Accepted: 29 September 2020
Published: 20 October 2020
Issue Date: February 2021
DOI: https://doi.org/10.1007/s11042-020-10002-8
