Multi-scale and multi-column convolutional neural network for crowd density estimation

Multimedia Tools and Applications

Abstract

To accurately identify objects of different sizes, we propose an efficient Multi-Scale and Multi-Column Convolutional Neural Network (MSMC) for crowd density estimation. Ground-truth density maps are generated from the existing annotations. Each image is then fed into the model, which learns to map it to a predicted density map that matches its ground truth. The network consists of three components: feature extraction, feature fusion and feature regression. First, VGG16 is used as the backbone for fast feature extraction. Second, feature maps of different sizes taken from different VGG16 layers are fused, which helps detect objects at different scales. Third, multi-column convolution is applied to further address scale variation. After the fusion block, dilated convolution is employed to enlarge the receptive field without increasing the number of parameters. Combining multiple scales and multiple columns increases the information the network can capture and improves the mapping from the original image to the density map. This raises the accuracy of crowd density estimation. Experimental results on the ShanghaiTech and UCF_CC_50 datasets, reported in the Experiment section, show that the proposed method performs well in both accuracy and robustness.
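The abstract only says that the ground truth is generated from the existing labels. A common choice in crowd counting places a normalized Gaussian at each annotated head, so that the density map sums to the head count. This choice is an assumption here; the abstract does not confirm it. The hypothetical helper `density_map` below uses a fixed `sigma` of 4 pixels, an arbitrary illustrative value. It is written in plain Python for clarity rather than speed.

```python
import math
from typing import List, Tuple


def density_map(height: int, width: int,
                heads: List[Tuple[int, int]],
                sigma: float = 4.0) -> List[List[float]]:
    """Ground-truth density map from (row, col) head annotations.

    Assumption: one fixed-sigma Gaussian per head, truncated at 3 sigma.
    """
    dmap = [[0.0] * width for _ in range(height)]
    radius = int(3 * sigma)
    for r0, c0 in heads:
        if not (0 <= r0 < height and 0 <= c0 < width):
            continue  # ignore annotations that fall outside the image
        # Gaussian weights for the pixels of the kernel window inside the image.
        cells = []
        for r in range(max(0, r0 - radius), min(height, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(width, c0 + radius + 1)):
                d2 = (r - r0) ** 2 + (c - c0) ** 2
                cells.append((r, c, math.exp(-d2 / (2 * sigma ** 2))))
        total = sum(w for _, _, w in cells)
        # Renormalize so each head adds exactly 1 to the map, even at borders.
        for r, c, w in cells:
            dmap[r][c] += w / total
    return dmap


if __name__ == "__main__":
    heads = [(10, 10), (30, 40), (63, 0)]  # toy annotations
    dmap = density_map(64, 64, heads)
    print(round(sum(map(sum, dmap)), 6))  # estimated count → 3.0
```

Renormalizing the truncated kernel for each head keeps the count exact, even for heads near the image border such as `(63, 0)` in the toy example. A geometry-adaptive sigma is another common option; this sketch does not implement it.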
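The fusion step combines VGG16 feature maps of different sizes. The abstract does not say which layers the paper fuses or how. The sketch below hypothetically assumes that a deeper, lower-resolution map is upsampled by nearest-neighbour interpolation and then concatenated channel-wise with a shallower map. `VGG16_STAGES` follows the standard VGG16 configuration: 3×3 convolutions with padding 1, and each stage ending in 2×2 max pooling with stride 2. The helper names are illustrative.

```python
from typing import List, Tuple

Channel = List[List[float]]   # one H x W feature channel
FeatureMap = List[Channel]    # C channels

# Standard VGG16 convolutional stages: (number of 3x3 conv layers, output channels).
# 3x3 convs with padding 1 keep H and W; each stage ends with 2x2 max pooling, stride 2.
VGG16_STAGES = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def vgg16_stage_shapes(h: int, w: int) -> List[Tuple[int, int, int]]:
    """(channels, height, width) after each pooled VGG16 stage."""
    shapes = []
    for _, channels in VGG16_STAGES:
        h, w = h // 2, w // 2  # max pooling halves the resolution
        shapes.append((channels, h, w))
    return shapes


def upsample_nearest(ch: Channel, factor: int) -> Channel:
    """Repeat every pixel factor x factor times."""
    return [[v for v in row for _ in range(factor)]
            for row in ch for _ in range(factor)]


def fuse(shallow: FeatureMap, deep: FeatureMap, factor: int) -> FeatureMap:
    """Hypothetical fusion: upsample the deep map, then concatenate channels."""
    up = [upsample_nearest(ch, factor) for ch in deep]
    if (len(up[0]), len(up[0][0])) != (len(shallow[0]), len(shallow[0][0])):
        raise ValueError("spatial sizes differ after upsampling")
    return shallow + up


if __name__ == "__main__":
    print(vgg16_stage_shapes(224, 224))
    shallow = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]  # 2 x 4 x 4
    deep = [[[2.0, 3.0], [4.0, 5.0]] for _ in range(3)]          # 3 x 2 x 2
    fused = fuse(shallow, deep, 2)
    print(len(fused), len(fused[0]), len(fused[0][0]))  # 5 4 4
```

For a 224×224 input, the five stages produce maps of 112, 56, 28, 14 and 7 pixels per side. Fusing the stage-3 and stage-4 outputs, for example, would therefore yield 256 + 512 = 768 channels at 28×28. Nearest-neighbour upsampling is used here only because it needs no parameters. Bilinear interpolation and transposed convolution are common alternatives.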
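The multi-column idea can be illustrated with parallel branches that see the same input through kernels of different sizes, so that each branch responds to a different object scale. Two choices here are assumptions for illustration only, because the abstract does not give the paper's column configuration. The kernel sizes are (3, 5, 7), and fixed averaging kernels stand in for learned weights. Zero padding keeps every column's output the same size as the input, so the columns can be stacked channel-wise.

```python
from typing import List, Sequence

Channel = List[List[float]]


def conv2d_same(x: Channel, k: Channel) -> Channel:
    """2-D cross-correlation (what CNN layers compute) with zero padding.

    For odd kernel sizes the output has the same height and width as x.
    """
    h, w, kh, kw = len(x), len(x[0]), len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a - ph, j + b - pw
                    if 0 <= r < h and 0 <= c < w:  # zero padding outside
                        s += x[r][c] * k[a][b]
            out[i][j] = s
    return out


def box_kernel(size: int) -> Channel:
    """Averaging kernel used as a stand-in for learned weights."""
    return [[1.0 / (size * size)] * size for _ in range(size)]


def multi_column(x: Channel, kernel_sizes: Sequence[int] = (3, 5, 7)) -> List[Channel]:
    """Run parallel columns with different kernel sizes; stack outputs as channels."""
    return [conv2d_same(x, box_kernel(s)) for s in kernel_sizes]


if __name__ == "__main__":
    x = [[1.0] * 9 for _ in range(9)]
    cols = multi_column(x)
    # Same spatial size for every column, but a larger support near the border
    # falls partly on padding, so corner responses shrink as kernels grow.
    for col in cols:
        print(len(col), len(col[0]), round(col[0][0], 3))
```

Each column sees a different neighbourhood around every pixel. That is the intuition behind using several columns for heads of different sizes. In a trained network the kernels are learned rather than fixed averages.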
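Dilation spaces the taps of a kernel d pixels apart. A k×k kernel therefore covers k + (k - 1)(d - 1) pixels per side while keeping only k×k weights. The small helpers below check this arithmetic and show a 1-D dilated convolution. The dilation rates and the 512-channel width used in the example are assumptions, since the abstract does not give them.

```python
from typing import List, Sequence, Tuple


def effective_kernel(k: int, d: int) -> int:
    """Samples (pixels per side in 2-D) spanned by a k-tap kernel with dilation d."""
    return k + (k - 1) * (d - 1)


def conv_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Weights in a k x k conv layer; note that dilation does not appear."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def stacked_receptive_field(layers: Sequence[Tuple[int, int]]) -> int:
    """Receptive field of stride-1 layers given as (kernel, dilation) pairs."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf


def dilated_conv1d(x: Sequence[float], w: Sequence[float], d: int) -> List[float]:
    """'Valid' 1-D dilated cross-correlation: taps are d samples apart."""
    span = effective_kernel(len(w), d)
    return [sum(w[t] * x[i + t * d] for t in range(len(w)))
            for i in range(len(x) - span + 1)]


if __name__ == "__main__":
    print(effective_kernel(3, 1), effective_kernel(3, 2), effective_kernel(3, 4))  # 3 5 9
    print(stacked_receptive_field([(3, 1)] * 6), stacked_receptive_field([(3, 2)] * 6))  # 13 25
    print(conv_params(3, 512, 512))  # 2359808, for any dilation
    print(dilated_conv1d(list(range(10)), [1, 1, 1], 2))  # [6, 9, 12, 15, 18, 21]
```

For example, six stacked 3×3 layers with stride 1 have a receptive field of 13 pixels with d = 1 and 25 pixels with d = 2. In both cases `conv_params(3, 512, 512)` is 2,359,808, which is the sense in which dilation enlarges the receptive field without adding parameters.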

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Acknowledgements

This research was supported by the Natural Science Foundation of Shandong Province, China (Nos. ZR2019MF050 and ZR2019BF042) and the National Natural Science Foundation of China (No. 61901240).

Author information

Corresponding author

Correspondence to Guodong Wang.

About this article

Cite this article

Chen, L., Wang, G. & Hou, G. Multi-scale and multi-column convolutional neural network for crowd density estimation. Multimed Tools Appl 80, 6661–6674 (2021). https://doi.org/10.1007/s11042-020-10002-8

  • DOI: https://doi.org/10.1007/s11042-020-10002-8
