Effective use of convolutional neural networks and diverse deep supervision for better crowd counting

Applied Intelligence

Abstract

In this paper, we focus on estimating crowd counts and high-quality crowd density maps. Among crowd counting methods, density map estimation is especially promising because it preserves spatial information, which makes it useful for both counting and localization (detection and tracking). Convolutional neural networks (CNNs) have recently enabled significant progress in crowd density estimation, but open questions remain regarding suitable architectures. We revisit CNN design and point out key adaptations that enable a plain single-column CNN to obtain high-resolution, high-quality density maps on all major dense crowd counting datasets. Regular deep supervision uses the same general ground truth to guide every intermediate prediction. Instead, we build hierarchical supervisory signals with additional multi-scale labels to account for the diversity within deep neural networks. We first obtain the multi-scale labels by generating ground-truth density maps with different Gaussian kernels. These multi-scale labels act as diverse representations in the supervision and lead to higher-quality crowd density map estimation. Extensive experiments demonstrate that our approach achieves state-of-the-art performance on the ShanghaiTech, UCF_CC_50 and UCSD datasets.
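As a rough illustration of how multi-scale labels could be derived from point annotations with different Gaussian kernels, the following minimal sketch builds one density map per kernel width. It is not the authors' implementation: the function names, the sigma values `(1.0, 2.0, 4.0)`, the 3-sigma kernel radius, and the border renormalization are all assumptions made for this example.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Return a 2-D Gaussian kernel (list of rows) normalized to sum to 1."""
    if radius is None:
        radius = max(1, int(math.ceil(3 * sigma)))  # assumed 3-sigma support
    coords = range(-radius, radius + 1)
    kernel = [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
               for dx in coords] for dy in coords]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def density_map(points, height, width, sigma):
    """Place one Gaussian per annotated head position (row, col)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    dmap = [[0.0] * width for _ in range(height)]
    for r, c in points:
        # Keep only the kernel cells that fall inside the image.
        cells = []
        for i, row in enumerate(kernel):
            for j, v in enumerate(row):
                rr, cc = r + i - radius, c + j - radius
                if 0 <= rr < height and 0 <= cc < width:
                    cells.append((rr, cc, v))
        mass = sum(v for _, _, v in cells)
        for rr, cc, v in cells:
            # Assumption: renormalize so each head still contributes exactly 1.
            dmap[rr][cc] += v / mass
    return dmap

def multi_scale_labels(points, height, width, sigmas=(1.0, 2.0, 4.0)):
    """Hypothetical helper: one ground-truth density map per kernel width."""
    return {s: density_map(points, height, width, s) for s in sigmas}

def count(dmap):
    """The crowd count is the integral (sum) of the density map."""
    return sum(sum(row) for row in dmap)

if __name__ == "__main__":
    heads = [(5, 5), (0, 0), (10, 12)]
    for sigma, label in multi_scale_labels(heads, 16, 16).items():
        print(sigma, round(count(label), 6))
```

Each label integrates to the same head count, so labels at different scales differ only in how sharply the density is localized. In a deeply supervised network, one plausible arrangement is to pair each label with a different intermediate prediction, but that pairing is not specified in the text shown here.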

References

  1. Leibe B, Seemann E, Schiele B (2005) Pedestrian detection in crowded scenes. IEEE Conf Comput Vis Pattern Recogn 1:875–885

  2. Zhao T, Nevatia R (2003) Bayesian human segmentation in crowded situations. IEEE Conf Comput Vis Pattern Recogn 2:459–466

  3. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. IEEE Comput Soc Conf Comput Vis Pattern Recogn 1:886–893

  4. Hou Y-L, Pang GK (2011) People counting and human detection in a challenging situation. IEEE Trans Syst Man Cybern Part A Syst Hum 41(1):24–33

  5. Ryan D, Denman S, Fookes C, Sridharan S (2009) Crowd counting using multiple local features. Digital Image Computing: Techniques and Applications(DICTA), pp 81–88

  6. Chan AB, Liang Z-SJ, Vasconcelos N (2008) Privacy preserving crowd monitoring: Counting people without people models or tracking. The IEEE conference on computer vision and pattern recognition(CVPR), pp 1–7

  7. Marana A, da Fontoura Costa L, Lotufo R, Velastin S (1999) Estimating crowd density with Minkowski fractal dimension. Proc IEEE Int Conf Acoust Speech Signal Process 6:3521–3524

  8. Davies AC, Yin JH, Velastin S (1995) Crowd monitoring using image processing. Electron Commun Eng J 7(1):37–47

  9. Paragios N, Ramesh V (2001) A MRF-based approach for real-time subway monitoring. IEEE Comput Soc Conf Comput Vis Pattern Recogn(CVPR) 1:I–1034

  10. Rahmalan H, Nixon MS, Carter JN (2006) On crowd density estimation for surveillance. The Institution of Engineering and Technology Conference on Crime and Security, pp 540–545

  11. Kong D, Gray D, Tao H (2005) Counting pedestrians in crowds using view point invariant training. In: Proceedings of British Machine Vision Conference(BMVC)

  12. Lempitsky V, Zisserman A (2010) Learning to count objects in images. In: Advances in Neural Information Processing Systems, pp 1324–1332

  13. Fiaschi L, Nair R, Koethe U, Hamprecht FA (2012) Learning to count with regression forest and structured labels. In: ICPR, pp 2685–2688

  14. Pham VQ, Kozakaya T, Yamaguchi O, Okada R (2015) Count forest: Co-voting uncertain number of targets using random forest for crowd density estimation. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp 3253–3261

  15. Wang Y, Zou Y (2016) Fast visual object counting via example-based density estimation. In: IEEE international conference on image processing (ICIP), pp 3653–3657. https://doi.org/10.1109/ICIP.2016.7533041

  16. Wang C, Zhang H, Yang L, Liu S, Cao X (2015) Deep people counting in extremely dense crowds. In: Proceedings of the 23rd ACM international conference on Multimedia, pp 1299–1302

  17. Fu M, Xu P, Li X, Liu Q, Ye M, Zhu C (2015) Fast crowd density estimation with convolutional neural networks. Eng Appl Artif Intell 43:81–88

  18. Zhang Y, Zhou D, Chen S, Gao S, Ma Y (2016) Single-image crowd counting via multi-column convolutional neural network. IEEE conference on computer vision and pattern recognition(CVPR)

  19. Shang C, Ai H, Bai B (2016) End-to-end crowd counting via joint learning local and global count. 2016 IEEE International Conference on Image Processing(ICIP), pp 1215–1219

  20. Sam DB, Surya S, Babu RV (2017) Switching convolutional neural network for crowd counting. In: Proceedings of the IEEE Conference on Computer Vision and Pattern recognition(CVPR)

  21. Onoro-Rubio D, Lopez-Sastre RJ (2016) Towards perspective-free object counting with deep learning. In: European Conference on Computer Vision(ECCV), pp 615–629

  22. Walach E, Wolf L (2016) Learning to count with cnn boosting. In: European Conference on Computer Vision(ECCV), pp 660–676

  23. Hu P, Ramanan D (2016) Finding Tiny Faces. arXiv:1612.04402

  24. Yu F, Koltun V (2016) Multi-Scale Context aggregation by dilated convolutions. ICLR

  25. Badrinarayanan V, Handa A, Cipolla R (2017) SegNet: A deep convolutional encoder-decoder architecture for robust semantic pixelwise labelling. IEEE Trans Pattern Anal Mach Intell 39:2481–2495

  26. Long J, Shelhamer E, Darrell T (2015) Fully Convolutional Networks for Semantic Segmentation. In: the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 3431–3440

  27. Ronneberger O, Fischer P, Brox T (2015) U-Net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention(MICCAI), pp 234–241

  28. Lee C, Xie S, Gallagher P, Zhang Z, Tu Z (2015) Deeply supervised nets. In: AISTATS

  29. Kong D, Gray D, Tao H (2006) A Viewpoint Invariant Approach for Crowd Counting. In: The 18th International Conference on Pattern Recognition(ICPR), pp 1187–1190

  30. Chan AB, Morrow M, Vasconcelos N (2009) Analysis of crowded scenes using holistic properties. In: Performance Evaluation of Tracking and Surveillance Workshop at CVPR, pp 31–37

  31. Shimosaka M, Masuda S, Fukui R, Mori T, Sato T (2011) Counting pedestrians in crowded scenes with efficient sparse learning. In: First Asian Conference on Pattern Recognition (ACPR), pp 27–31

  32. Khan U, Klette R (2016) Logarithmically improved property regression for crowd counting. Pacific-Rim Symposium on Image and Video Technology:Image and Video Technology, pp 123–135

  33. Chan AB, Vasconcelos N (2009) Bayesian poisson regression for crowd counting. In: 2009 IEEE 12th International Conference on Computer Vision, pp 545–551

  34. Chen K, Loy CC, Gong S, Xiang T (2012) Feature mining for localised crowd counting. In: Proceedings of the British Machine Vision Conference, pp 21.1–21.11

  35. Marana A, Costa LdF, Lotufo R, Velastin S (1998) On the Efficacy of Texture Analysis for Crowd Monitoring. In: Proceedings of SIBGRAPI’98, International Symposium on Computer Graphics, Image Processing, and Vision, pp 354–361

  36. Fradi H, Dugelay JL (2012) People counting system in crowded scenes based on feature regression. In: Proceedings of European Signal Processing Conference, pp 27–31

  37. Kumagai S, Hotta K, Kurita T (2017) Mixture of counting cnns: Adaptive integration of cnns specialized to specific appearance for crowd counting. arXiv:1703.09393

  38. Marsden M, McGuinness K, Little S, O’Connor NE (2016) Fully convolutional crowd counting on highly congested scenes. arXiv:1612.00220

  39. Sheng B, Shen C, Lin G, Li J, Yang W, Sun C (2016) Crowd counting via weighted VLAD on dense attribute feature maps. IEEE Transactions on Circuits and Systems for Video Technology

  40. Kang D, Ma Z, Chan AB (2017) Beyond Counting: Comparisons of Density Maps for Crowd Analysis Tasks-Counting, Detection, and Tracking. arXiv:1705.10118

  41. Arteta C, Lempitsky V, Zisserman A (2016) Counting in the wild. In: European Conference on Computer Vision. Springer, pp 483–498

  42. Zhao Z, Li H, Zhao R, Wang X (2016) Crossing-line crowd counting with two-phase deep neural networks. In: European Conference on Computer Vision. Springer, pp 712–726

  43. Sindagi VA, Patel VM (2017) Cnn-based cascaded multitask learning of high-level prior and density estimation for crowd counting. IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)

  44. Zhang C, Li H, Wang X, Yang X (2015) Cross-scene crowd counting via deep convolutional neural networks. IEEE conference on computer vision and pattern recognition(CVPR), pp 833–841

  45. Sindagi VA, Patel VM (2017) Generating High-Quality Crowd Density Maps using Contextual Pyramid CNNs. IEEE International Conference on Computer Vision (ICCV)

  46. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the Inception Architecture for Computer Vision. IEEE conference on computer vision and pattern recognition(CVPR)

  47. Boominathan L, Kruthiventi SS, Babu RV (2016) Crowdnet: A deep convolutional network for dense crowd counting. In: Proceedings of the 2016 ACM on Multimedia Conference, ACM, pp 640–644

  48. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: ICLR

  49. Girshick R (2015) Fast R-CNN. In: IEEE ICCV, pp 1440–1448

  50. Yang J, Price B, Cohen S, Lee H, Yang M-H (2016) Object contour detection with a fully convolutional encoder-decoder network. arXiv:1603.04530

  51. Shi M, Caesar H, Ferrari V (2018) Crowd counting via scale-adaptive convolutional neural network. IEEE Winter Conference on Applications of Computer Vision (WACV)

  52. Dubrovina A, Kisilev P, Ginsburg B, Hashoul S, Kimmel R (2016) Computational mammography using deep neural networks. In: Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization, pp 1–5

  53. Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2016) DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Preprint: arXiv:1606.00915

  54. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: Convolutional architecture for fast feature embedding. In: ACM MM, pp 675–678

  55. Idrees H, Saleemi I, Seibert C, Shah M (2013) Multisource multi-scale counting in extremely dense crowd images. IEEE conference on computer vision and pattern recognition (CVPR), pp 2547–2554

  56. Casella G, Berger R (1990) Statistical inference, 2nd edn. Duxbury Press, p 686

  57. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: From error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612

Author information

Corresponding author

Correspondence to Haiying Jiang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Supplementary Material

This section presents additional results of MPC on the three datasets: ShanghaiTech [18], UCF_CC_50 [55] and UCSD [6]. The PSNR (peak signal-to-noise ratio) and the SSIM (structural similarity index) are used to evaluate the quality of the generated density maps. Results on sample images from these datasets, which cover a variety of density levels, are shown in Figs. 6, 7, 8 and 9.
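For readers unfamiliar with these two quality measures, the sketch below shows one way they could be computed between a ground-truth and an estimated density map. It is only an illustration under stated assumptions: the peak value defaults to the maximum of the ground truth, and the SSIM here is a simplified single-window (global) variant with the constants proposed in [57], whereas the usual SSIM averages over local windows. The function names are hypothetical, and the paper's exact evaluation protocol is not given in this section.

```python
import math

def _flatten(dmap):
    """Turn a list-of-rows density map into a flat list of floats."""
    return [float(v) for row in dmap for v in row]

def psnr(gt, est, peak=None):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    x, y = _flatten(gt), _flatten(est)
    if peak is None:
        peak = max(x)  # assumption: dynamic range taken from the ground truth
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    if mse == 0:
        return math.inf  # identical maps
    return 10.0 * math.log10(peak * peak / mse)

def global_ssim(gt, est, peak=None):
    """Simplified SSIM over the whole map as a single window (assumption)."""
    x, y = _flatten(gt), _flatten(est)
    if peak is None:
        peak = max(x)
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * peak) ** 2  # stabilizing constants from [57]
    c2 = (0.03 * peak) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2))

if __name__ == "__main__":
    gt = [[0.0, 1.0], [0.0, 0.0]]
    est = [[0.0, 0.5], [0.0, 0.0]]
    print(psnr(gt, est))         # MSE = 0.0625, peak = 1
    print(global_ssim(gt, est))
```

Higher values of both measures indicate that the estimated map is closer to the ground truth. PSNR is sensitive only to pixel-wise error, while SSIM also compares the means, variances and covariance of the two maps.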

Fig. 6

Results of our MPC model on the ShanghaiTech Part A dataset [18]. Left column: Input images. Middle column: Ground truth density maps. Right column: Estimated density maps

Fig. 7

Results of our MPC model on the ShanghaiTech Part B dataset [18]. Left column: Input images. Middle column: Ground truth density maps. Right column: Estimated density maps

Fig. 8

Results of our MPC model on the UCF_CC_50 dataset [55]. Left column: Input images. Middle column: Ground truth density maps. Right column: Estimated density maps

Fig. 9

Results of our MPC model on the UCSD dataset [6]. Left column: Input images. Middle column: Ground truth density maps. Right column: Estimated density maps


Cite this article

Jiang, H., Jin, W. Effective use of convolutional neural networks and diverse deep supervision for better crowd counting. Appl Intell 49, 2415–2433 (2019). https://doi.org/10.1007/s10489-018-1394-9

  • DOI: https://doi.org/10.1007/s10489-018-1394-9
