
A hybrid model of convolutional neural networks and deep regression forests for crowd counting

Applied Intelligence

Abstract

Real-time monitoring of crowd variation via video surveillance plays a significant role in new-generation smart-city technology. We propose a crowd counting algorithm based on deep regression forests, named CountForest. First, exploiting the correlation among frames, we transform the crowd counting problem into a label distribution learning problem. We then combine convolutional neural networks (CNNs) and deep regression forests into a hybrid model: the CNN performs feature learning, and the deep decision forest is extended to address the label distribution learning problem in crowd counting. Specifically, the proposed network replaces the CNN's softmax layer with this probabilistic decision forest to better map image features to crowd counts, yielding an end-to-end hybrid model for crowd counting. Experiments on selected public datasets show that our method not only achieves high counting accuracy but also offers comparable robustness and real-time performance.
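
The hybrid design can be illustrated with a small NumPy sketch. It is not the authors' implementation and rests on stated assumptions: (i) each ground-truth count is softened into a discrete Gaussian label distribution, a common way to encode that neighbouring counts are similar (the exact construction the paper derives from inter-frame correlation is not given in the abstract); (ii) split nodes are sigmoid gates on individual feature units and leaves store label distributions, following the differentiable routing of deep neural decision forests; (iii) the forest is trained with a KL-divergence loss. Random vectors stand in for CNN features, and all names (`gaussian_label_distribution`, `SoftTree`, `forest_predict`, `kl_loss`, `sigma`, `depth`) are hypothetical. Only the forward pass and the loss are shown; leaf and CNN parameter updates are omitted.

```python
import numpy as np


def gaussian_label_distribution(count, max_count, sigma=1.0):
    """Hypothetical helper: soften a ground-truth count into a discrete
    distribution over the labels 0..max_count (Gaussian shape is an assumption)."""
    labels = np.arange(max_count + 1)
    d = np.exp(-((labels - count) ** 2) / (2.0 * sigma ** 2))
    return d / d.sum()


class SoftTree:
    """One differentiable decision tree, forward pass only (hypothetical class)."""

    def __init__(self, n_features, depth, n_labels, rng):
        self.depth = depth
        n_split, n_leaf = 2 ** depth - 1, 2 ** depth
        # Assumption: each split node gates on one randomly chosen feature unit.
        self.feature_idx = rng.integers(0, n_features, size=n_split)
        # Each leaf holds a label distribution; start uniform (untrained).
        self.leaf_dist = np.full((n_leaf, n_labels), 1.0 / n_labels)

    def routing(self, f):
        """f: (batch, n_features) features. Returns (batch, n_leaf) probabilities
        of reaching each leaf; every row sums to 1."""
        d = 1.0 / (1.0 + np.exp(-f[:, self.feature_idx]))  # P(go left) per split node
        batch = f.shape[0]
        mu = np.ones((batch, 1))  # the root is reached with probability 1
        begin = 0  # split nodes are stored in breadth-first order
        for level in range(self.depth):
            n_nodes = 2 ** level
            d_lvl = d[:, begin:begin + n_nodes]
            # Node i on this level sends mass to children 2i (left) and 2i+1 (right).
            mu = np.stack([mu * d_lvl, mu * (1.0 - d_lvl)], axis=2).reshape(batch, 2 * n_nodes)
            begin += n_nodes
        return mu

    def predict(self, f):
        # Mixture of leaf distributions weighted by routing probabilities;
        # this plays the role that a softmax layer would play in a plain CNN.
        return self.routing(f) @ self.leaf_dist


def forest_predict(trees, f):
    """Average the label distributions predicted by all trees."""
    return np.mean([t.predict(f) for t in trees], axis=0)


def kl_loss(target, pred, eps=1e-12):
    """Mean KL(target || pred) over the batch, a common LDL training objective."""
    return float(np.mean(np.sum(target * np.log((target + eps) / (pred + eps)), axis=1)))


rng = np.random.default_rng(0)
max_count, n_features = 50, 64
features = rng.normal(size=(4, n_features))  # stand-in for CNN features of 4 frames
trees = [SoftTree(n_features, depth=3, n_labels=max_count + 1, rng=rng) for _ in range(5)]
pred = forest_predict(trees, features)
target = np.stack([gaussian_label_distribution(c, max_count, sigma=2.0) for c in (10, 12, 30, 31)])
counts = pred @ np.arange(max_count + 1)  # expected count per frame
print(pred.shape, counts.round(1), round(kl_loss(target, pred), 3))
```

Because the leaves start uniform, this untrained forest predicts the same flat distribution for every frame; training would sharpen the leaf distributions so that the expected count tracks the ground truth. The expected value is used here as the count estimate; taking the arg-max of the predicted distribution would be an equally plausible choice.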


Figs. 1–11



Acknowledgements

This work was supported by the Natural Science Foundation of Guangdong Province, China (No. 2016A030313288).

Author information


Corresponding author

Correspondence to Qingge Ji.



About this article


Cite this article

Ji, Q., Zhu, T. & Bao, D. A hybrid model of convolutional neural networks and deep regression forests for crowd counting. Appl Intell 50, 2818–2832 (2020). https://doi.org/10.1007/s10489-020-01688-2


  • DOI: https://doi.org/10.1007/s10489-020-01688-2
