Abstract
Counting people in video or images captured by a moving camera is difficult because both the people and the camera move, so positions shift and overlap from frame to frame. Traditional object-recognition techniques rely on feature matching or optical-flow mechanisms, but they are unable to perform reliable people estimation because of limited viewpoints, scale variations, object-appearance variability, and dynamic environments. This research aims to advance object counting beyond these traditional techniques. With the aid of density map generation, this study proposes a transfer-learning-based PeopleNet model that accurately counts people in moving camera videos. The proposed model was developed by appending and fine-tuning two fully connected (FC) layers at the end of the standard pre-trained VGG16 model; the pre-trained top layers are kept frozen. No public or benchmark datasets are available to evaluate the model's results, so we introduce a new dataset captured by a moving camera to address this scarcity of resources. PeopleNet has achieved strong results in dense, average, and sparse crowd scenarios and establishes a baseline for future people-counting studies. The proposed model for counting people from moving camera videos has exhibited favorable results and could be useful in real-time applications.
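The density-map formulation the abstract refers to can be illustrated with a minimal sketch. In the standard approach used by counting networks of this kind, each annotated head position contributes a unit impulse that is blurred with a Gaussian kernel, so the map integrates to the person count; the network then regresses this map from the input frame. The kernel width and helper names below are illustrative assumptions, not details taken from the paper.

```python
# Sketch of density-map ground-truth generation (assumption: the standard
# Gaussian-kernel formulation; the paper's exact kernel parameters are
# not given here).
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(shape, head_points, sigma=4.0):
    """Place a unit impulse at each annotated head location and blur it,
    so that the resulting map integrates to the number of people."""
    impulses = np.zeros(shape, dtype=np.float64)
    for y, x in head_points:
        impulses[int(y), int(x)] = 1.0
    # Gaussian blurring preserves the total mass (away from the borders),
    # so density.sum() still equals the person count.
    return gaussian_filter(impulses, sigma=sigma, mode="constant")

heads = [(30, 40), (60, 80), (50, 50)]  # three annotated head positions
density = density_map((100, 120), heads)
print(round(density.sum(), 2))  # integrates to ~3.0, the person count
```

At inference time, a model such as PeopleNet would predict a map like this from the frame and report its integral as the count; training only the two appended FC layers while the pre-trained VGG16 layers stay frozen is what makes the transfer-learning setup feasible on a small, newly collected dataset.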
Ethics declarations
Conflict of interest
We, the authors of the paper, certify that we have no affiliations with or involvement in any organization or entity with any financial interest (such as honoraria or educational grants) in the materials discussed in this manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tomar, A., Kumar, S. & Verma, K.K. People Counting from Moving Camera Videos through PeopleNet Framework. SN COMPUT. SCI. 5, 985 (2024). https://doi.org/10.1007/s42979-024-03298-y