A real-time two-input stream multi-column multi-stage convolution neural network (TIS-MCMS-CNN) for efficient crowd congestion-level analysis

  • Regular Paper
Multimedia Systems

Abstract

Crowd congestion-level analysis (CCA) is one of the most important tasks in crowd analysis and helps to control crowd disasters. Existing state-of-the-art approaches use either spatial features or spatial–temporal texture features to implement CCA. State-of-the-art deep-learning approaches use a single-column convolutional neural network (CNN) to extract deep spatial features to solve the objective function, and they perform better than traditional approaches. However, their performance still needs to be improved, as these models cannot capture features invariant to perspective change. The proposed work rests on two intuitions. First, both deep spatial and temporal features are required to improve the performance of the model. Second, a multi-column CNN with different kernel sizes can capture features invariant to perspective and scene change. Based on these intuitions, we propose a two-input stream multi-column multi-stage CNN with parallel end-to-end training to solve CCA. The streams extract spatial and temporal features from the scene and are followed by a fusion layer that enhances the discriminative power of the model. We conducted experiments on publicly available datasets, namely PETS-2009, UCSD, and UMN. We manually annotated 22 K frames, assigning each to one of five crowd congestion levels: Very Low, Low, Medium, High, and Very High. The proposed model achieves accuracies of 96.97%, 97.21%, 98.52%, 98.55%, and 97.01% on PETS-2009, UCSD-Ped1, UCSD-Ped2, UMN-Plaza1, and UMN-Plaza2, respectively. The model processes nearly 30 test frames per second and is hence applicable to real-time applications. The proposed model outperforms some of the existing state-of-the-art techniques.
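
As a rough illustration of the data flow the abstract describes (two input streams, several columns with different kernel sizes per stream, a fusion step, and a five-way congestion classifier), a toy pure-Python sketch might look like the following. It is not the authors' network. The kernel sizes (3, 5, 7), the use of frame differencing as the temporal input, the split into one spatial and one temporal stream, concatenation as the fusion step, the random untrained weights, and all function names are assumptions made for illustration. The multi-stage structure and the training procedure are omitted.

```python
import random

# The five congestion levels named in the abstract.
LEVELS = ["Very Low", "Low", "Medium", "High", "Very High"]


def conv2d_same(img, kernel):
    """Naive 2-D cross-correlation with zero padding, followed by ReLU."""
    k = len(kernel)
    pad = k // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - pad, x + j - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx] * kernel[i][j]
            out[y][x] = max(acc, 0.0)  # ReLU
    return out


def global_avg(fmap):
    """Global average pooling: one scalar per feature map."""
    return sum(map(sum, fmap)) / (len(fmap) * len(fmap[0]))


def random_kernel(k, rng):
    """Untrained k x k kernel (placeholder for learned weights)."""
    return [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(k)]


def init_params(rng, kernel_sizes=(3, 5, 7), n_classes=len(LEVELS)):
    """Hypothetical parameters: one kernel per column, per stream."""
    n_feat = 2 * len(kernel_sizes)  # columns from both streams, concatenated
    return {
        "spatial": [random_kernel(k, rng) for k in kernel_sizes],
        "temporal": [random_kernel(k, rng) for k in kernel_sizes],
        "W": [[rng.uniform(-1, 1) for _ in range(n_feat)] for _ in range(n_classes)],
        "b": [0.0] * n_classes,
    }


def multi_column_features(img, kernels):
    """One stream: each column uses a different kernel size; pool each to a scalar."""
    return [global_avg(conv2d_same(img, kern)) for kern in kernels]


def two_stream_scores(prev_frame, frame, params):
    """Fuse spatial and temporal stream features, then apply a linear classifier."""
    # Assumption: absolute frame difference stands in for the temporal input.
    temporal_in = [[abs(a - b) for a, b in zip(r1, r0)]
                   for r1, r0 in zip(frame, prev_frame)]
    fused = (multi_column_features(frame, params["spatial"])
             + multi_column_features(temporal_in, params["temporal"]))  # concatenation
    return [sum(w * f for w, f in zip(row, fused)) + b
            for row, b in zip(params["W"], params["b"])]


def predict_level(prev_frame, frame, params):
    """Return the congestion level with the highest score."""
    scores = two_stream_scores(prev_frame, frame, params)
    return LEVELS[max(range(len(scores)), key=scores.__getitem__)]


if __name__ == "__main__":
    rng = random.Random(0)
    params = init_params(rng)
    # Two synthetic 8 x 8 grayscale frames with values in [0, 1).
    frame0 = [[rng.random() for _ in range(8)] for _ in range(8)]
    frame1 = [[rng.random() for _ in range(8)] for _ in range(8)]
    print(predict_level(frame0, frame1, params))
```

Because the weights here are random, the predicted label is meaningless. The sketch only shows how per-column features from the two streams could be pooled and concatenated before classification.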

Author information

Corresponding author

Correspondence to Santosh Kumar Tripathy.

Additional information

Communicated by C. Xu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tripathy, S.K., Srivastava, R. A real-time two-input stream multi-column multi-stage convolution neural network (TIS-MCMS-CNN) for efficient crowd congestion-level analysis. Multimedia Systems 26, 585–605 (2020). https://doi.org/10.1007/s00530-020-00667-4

  • DOI: https://doi.org/10.1007/s00530-020-00667-4
