Abstract
Crowd congestion-level analysis (CCA) is one of the most important tasks in crowd analysis and helps to control crowd disasters. Existing state-of-the-art approaches use either spatial features or spatial–temporal texture features to perform CCA. State-of-the-art deep-learning approaches use a single-column convolutional neural network (CNN) to extract deep spatial features and perform better than traditional approaches. However, their performance still needs improvement because these models cannot capture features that are invariant to perspective change. The proposed work rests on two intuitions. First, both deep spatial and deep temporal features are required to improve the performance of the model. Second, a multi-column CNN with different kernel sizes is capable of capturing features invariant to perspective and scene change. Based on these intuitions, we propose a two-input stream multi-column multi-stage CNN with parallel end-to-end training to solve CCA. The streams extract spatial and temporal features from the scene and are followed by a fusion layer that enhances the discriminative power of the model. We conducted experiments on the publicly available PETS-2009, UCSD, and UMN datasets. We manually annotated 22K frames into one of five crowd congestion levels: Very Low, Low, Medium, High, and Very High. The proposed model achieves accuracies of 96.97%, 97.21%, 98.52%, 98.55%, and 97.01% on PETS-2009, UCSD-Ped1, UCSD-Ped2, UMN-Plaza1, and UMN-Plaza2, respectively. The model processes nearly 30 test frames per second and is hence applicable in real-time applications. The proposed model outperforms some of the existing state-of-the-art techniques.
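As a rough illustration of the idea in the abstract (two input streams, several columns with different kernel sizes per stream, a fusion step, and a five-way congestion-level output), the following toy sketch may help. It is not the authors' network. The kernel sizes (3, 5, 7), the use of absolute frame differences as the temporal input, fusion by concatenation, the single convolution per column, and all function names are assumptions made for illustration, and the weights are random and untrained.

```python
import math
import random

LEVELS = ["Very Low", "Low", "Medium", "High", "Very High"]
KERNEL_SIZES = (3, 5, 7)  # assumed; the abstract only says "different kernel sizes"


def make_kernel(k, rng):
    """Random k x k kernel (untrained placeholder weights)."""
    return [[rng.uniform(-1, 1) / (k * k) for _ in range(k)] for _ in range(k)]


def column_feature(img, kernel):
    """One column: valid cross-correlation -> ReLU -> global average pooling."""
    k = len(kernel)
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            s = sum(kernel[a][b] * img[i + a][j + b]
                    for a in range(k) for b in range(k))
            total += max(0.0, s)  # ReLU
            count += 1
    return total / count


def stream_features(img, kernels):
    """Multi-column stream: one pooled feature per kernel size."""
    return [column_feature(img, kern) for kern in kernels]


def frame_difference(prev, curr):
    """Assumed temporal input: absolute difference of consecutive frames."""
    return [[abs(c - p) for p, c in zip(rp, rc)] for rp, rc in zip(prev, curr)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init_params(seed=0):
    """Hypothetical parameter set: kernels per stream plus a linear classifier."""
    rng = random.Random(seed)
    n_feat = 2 * len(KERNEL_SIZES)
    return {
        "spatial": [make_kernel(k, rng) for k in KERNEL_SIZES],
        "temporal": [make_kernel(k, rng) for k in KERNEL_SIZES],
        "W": [[rng.uniform(-1, 1) for _ in range(n_feat)] for _ in LEVELS],
        "b": [0.0] * len(LEVELS),
    }


def predict(prev, curr, params):
    """Run both streams, fuse by concatenation (assumed), classify into 5 levels."""
    spatial = stream_features(curr, params["spatial"])
    temporal = stream_features(frame_difference(prev, curr), params["temporal"])
    fused = spatial + temporal
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(params["W"], params["b"])]
    probs = softmax(logits)
    return LEVELS[probs.index(max(probs))], probs


# Two synthetic 12 x 12 grayscale "frames" stand in for real video frames.
rng = random.Random(1)
prev_frame = [[rng.random() for _ in range(12)] for _ in range(12)]
curr_frame = [[rng.random() for _ in range(12)] for _ in range(12)]
level, probs = predict(prev_frame, curr_frame, init_params())
print(level, [round(p, 3) for p in probs])
```

Because the weights are untrained, the printed level carries no meaning. The sketch only shows how per-column features from a spatial and a temporal input can be concatenated before a single classifier; a real model would stack several convolutional stages per column and learn all weights end to end.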
Additional information
Communicated by C. Xu.
About this article
Cite this article
Tripathy, S.K., Srivastava, R. A real-time two-input stream multi-column multi-stage convolution neural network (TIS-MCMS-CNN) for efficient crowd congestion-level analysis. Multimedia Systems 26, 585–605 (2020). https://doi.org/10.1007/s00530-020-00667-4
DOI: https://doi.org/10.1007/s00530-020-00667-4