A real-time two-input stream multi-column multi-stage convolution neural network (TIS-MCMS-CNN) for efficient crowd congestion-level analysis

Tripathy, Santosh Kumar; Srivastava, Rajeev

doi:10.1007/s00530-020-00667-4

Regular Paper
Published: 08 July 2020

Volume 26, pages 585–605, (2020)

Multimedia Systems

Santosh Kumar Tripathy¹ &
Rajeev Srivastava¹

Abstract

Crowd congestion-level analysis (CCA) is one of the most important tasks in crowd analysis and helps to prevent crowd disasters. Existing state-of-the-art approaches use either spatial features or spatial–temporal texture features to implement CCA. State-of-the-art deep-learning approaches use a single-column convolutional neural network (CNN) to extract deep spatial features, and they perform better than traditional approaches. Their performance still needs improvement, however, because these models cannot capture features that are invariant to perspective change. The proposed work rests on two intuitions. First, both deep spatial and deep temporal features are required to improve the performance of the model. Second, a multi-column CNN with different kernel sizes can capture features that are invariant to perspective and scene change. Based on these intuitions, we propose a two-input stream multi-column multi-stage CNN with parallel end-to-end training to solve the CCA problem. Each stream extracts spatial and temporal features from the scene, and a fusion layer follows to enhance the discriminative power of the model. We conducted experiments on the publicly available PETS-2009, UCSD, and UMN datasets. We manually annotated 22K frames with one of five crowd congestion levels: Very Low, Low, Medium, High, and Very High. The proposed model achieves accuracies of 96.97%, 97.21%, 98.52%, 98.55%, and 97.01% on PETS-2009, UCSD-Ped1, UCSD-Ped2, UMN-Plaza1, and UMN-Plaza2, respectively. The model processes nearly 30 test frames per second and is therefore suitable for real-time applications. It also outperforms some of the existing state-of-the-art techniques.
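The two-stream, multi-column idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' network: it uses a single convolution per column, omits the multiple stages and the training procedure, and runs with random, untrained weights. The kernel sizes (3, 5, 7), the use of an absolute frame difference as the temporal input, concatenation as the fusion step, and all function names are assumptions made only for illustration.

```python
import numpy as np

# Five congestion levels named in the abstract.
LEVELS = ["Very Low", "Low", "Medium", "High", "Very High"]


def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def stream_features(image, kernels):
    """One input stream: one column per kernel size, each reduced to a scalar."""
    # Per column: convolution -> ReLU -> global average pooling.
    return np.array([np.maximum(conv2d_valid(image, k), 0.0).mean() for k in kernels])


def init_params(rng, kernel_sizes=(3, 5, 7)):
    """Random (untrained) weights; the kernel sizes are assumed, not from the paper."""
    n = len(kernel_sizes)
    return {
        "spatial_kernels": [rng.standard_normal((s, s)) for s in kernel_sizes],
        "temporal_kernels": [rng.standard_normal((s, s)) for s in kernel_sizes],
        "W": rng.standard_normal((len(LEVELS), 2 * n)),
        "b": np.zeros(len(LEVELS)),
    }


def predict_level(prev_frame, frame, params):
    """Classify one frame pair into a congestion level (illustrative only)."""
    # Stream 1 (assumed spatial input): the current frame.
    spatial = stream_features(frame, params["spatial_kernels"])
    # Stream 2 (assumed temporal input): absolute frame difference as a motion proxy.
    motion = np.abs(frame - prev_frame)
    temporal = stream_features(motion, params["temporal_kernels"])
    # Fusion step (assumed): concatenate both feature vectors.
    fused = np.concatenate([spatial, temporal])
    # Linear classifier with a numerically stable softmax over the five levels.
    logits = params["W"] @ fused + params["b"]
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    return LEVELS[int(np.argmax(probs))], probs


rng = np.random.default_rng(0)
params = init_params(rng)
prev_frame, frame = rng.random((32, 32)), rng.random((32, 32))
level, probs = predict_level(prev_frame, frame, params)
print(level, probs.round(3))
```

Because the weights are random, the predicted level is meaningless here; the sketch only shows how columns with different kernel sizes feed two streams whose features are fused before a five-way classification.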

Author information

Authors and Affiliations

Computing and Vision Lab, Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, India
Santosh Kumar Tripathy & Rajeev Srivastava

Corresponding author

Correspondence to Santosh Kumar Tripathy.

Additional information

Communicated by C. Xu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tripathy, S.K., Srivastava, R. A real-time two-input stream multi-column multi-stage convolution neural network (TIS-MCMS-CNN) for efficient crowd congestion-level analysis. Multimedia Systems 26, 585–605 (2020). https://doi.org/10.1007/s00530-020-00667-4

Received: 02 January 2020
Accepted: 20 June 2020
Published: 08 July 2020
Issue Date: October 2020
DOI: https://doi.org/10.1007/s00530-020-00667-4
