Abstract
A huge number of cameras record scenes everywhere, generating enormous volumes of video. Processing this video and detecting abnormal object activities demand substantial resources such as time, manpower, and hardware storage. To cope with these challenges, our proposed model for automatic video summarization of abnormal events supports well-organized storage, quick browsing, and retrieval of large video collections without losing important content, owing to its lightweight design. In this research, abnormal object activity detection and summary generation are performed in two stages: 1) a machine learning technique for key event detection, and 2) a deep learning algorithm that removes extra frames to generate the summarized video. First, silhouette images are formed, and two feature descriptors, Zernike Moments and the R-Transform, are used to create a combined feature vector. The combined feature vector provides more informative features from the images and keeps our model lightweight by retaining only relevant features. Next, K Nearest Neighbor (KNN) clustering is applied to the combined feature vectors to extract keyframes sequentially. Finally, to improve performance, a deep learning algorithm, AlexNet, is trained on preprocessed frames from the dataset. This deep learning classifier eliminates the non-key frames and generates surveillance video summaries that show abnormal object activities. Extensive experimentation shows that the proposed algorithm attains approximately 99% accuracy.
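The feature-extraction stage described above can be illustrated with a minimal NumPy sketch. It assumes the silhouette is obtained by simple background subtraction against a static background frame (the hypothetical helper `silhouette`, since the exact segmentation step is not given here), computes rotation-invariant Zernike moment magnitudes |Z_nm| up to a chosen order, computes the R-transform R(θ) = Σ_ρ T(ρ, θ)² from a pixel-binned Radon projection normalized by its maximum, and concatenates the two into one vector. All function names, the order `max_order = 8`, the 180 projection angles, and the threshold are illustrative assumptions rather than the authors' settings. The KNN keyframe step and the AlexNet classifier are not covered.

```python
import math

import numpy as np


def silhouette(frame: np.ndarray, background: np.ndarray, thresh: float = 30.0) -> np.ndarray:
    """Hypothetical silhouette step: binary mask of pixels that differ from a static background."""
    diff = np.abs(frame.astype(float) - background.astype(float))
    return (diff > thresh).astype(float)


def zernike_magnitudes(img: np.ndarray, max_order: int = 8) -> np.ndarray:
    """|Z_nm| for 0 <= m <= n <= max_order with n - m even, over the inscribed unit disk."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Map pixel centres onto [-1, 1] x [-1, 1]; pixels outside the unit disk are ignored.
    x = (2 * xs - (w - 1)) / (w - 1)
    y = (2 * ys - (h - 1)) / (h - 1)
    rho = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    f = img.astype(float) * (rho <= 1.0)
    d_area = (2.0 / (w - 1)) * (2.0 / (h - 1))  # area of one pixel in disk coordinates
    feats = []
    for n in range(max_order + 1):
        for m in range(n % 2, n + 1, 2):  # same parity as n, so n - m is even
            radial = np.zeros_like(rho)
            for s in range((n - m) // 2 + 1):
                coeff = ((-1) ** s * math.factorial(n - s)
                         / (math.factorial(s)
                            * math.factorial((n + m) // 2 - s)
                            * math.factorial((n - m) // 2 - s)))
                radial += coeff * rho ** (n - 2 * s)
            # Z_nm = (n + 1) / pi * sum f(x, y) * conj(V_nm), with V_nm = R_nm(rho) exp(i m theta)
            z = (n + 1) / math.pi * np.sum(f * radial * np.exp(-1j * m * theta)) * d_area
            feats.append(abs(z))  # the magnitude is invariant to rotation about the disk centre
    return np.array(feats)


def r_transform(img: np.ndarray, n_angles: int = 180) -> np.ndarray:
    """R(theta) = sum over rho of T(rho, theta)^2, scaled so that max R = 1."""
    h, w = img.shape
    ys, xs = np.nonzero(img)
    weights = img[ys, xs].astype(float)
    x = xs - (w - 1) / 2.0  # coordinates relative to the image centre
    y = ys - (h - 1) / 2.0
    offset = math.ceil(math.hypot(h, w))  # shifts rho bins to non-negative indices
    r = np.empty(n_angles)
    for k, t in enumerate(np.linspace(0.0, math.pi, n_angles, endpoint=False)):
        # Pixel-binned Radon projection T(rho, t): accumulate pixel mass at rho = x cos t + y sin t.
        bins = np.round(x * math.cos(t) + y * math.sin(t)).astype(int) + offset
        projection = np.bincount(bins, weights=weights, minlength=2 * offset + 1)
        r[k] = np.sum(projection ** 2)
    return r / r.max() if r.max() > 0 else r


def combined_descriptor(sil: np.ndarray, max_order: int = 8, n_angles: int = 180) -> np.ndarray:
    """Concatenate Zernike magnitudes and the R-transform into one feature vector."""
    return np.concatenate([zernike_magnitudes(sil, max_order), r_transform(sil, n_angles)])


if __name__ == "__main__":
    background = np.zeros((64, 64))
    frame = background.copy()
    frame[20:44, 26:38] = 255.0  # synthetic bright blob standing in for a moving object
    sil = silhouette(frame, background)
    print(combined_descriptor(sil).shape)  # (205,)
```

With these settings the vector holds 25 Zernike magnitudes and 180 R-transform values, which keeps each frame's descriptor small. Zernike magnitudes computed this way are not scale-invariant, so cropping or resizing each silhouette to a fixed box before extraction is a common companion step.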
Abbreviations
- OCR: Optical Character Recognition
- SI: Silhouette Image
- K-NN: K Nearest Neighbour
- KARD: Kinetic Activity Recognition Dataset
Acknowledgements
The authors extend their appreciation to King Saud University, Riyadh, Saudi Arabia and UET Taxila for supporting this work.
Funding
The authors extend their appreciation to King Saud University, Riyadh, Saudi Arabia, for funding this work through Researchers Supporting Project number RSP-2021/164.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mahum, R., Irtaza, A., Nawaz, M. et al. A robust framework to generate surveillance video summaries using combination of zernike moments and r-transform and deep neural network. Multimed Tools Appl 82, 13811–13835 (2023). https://doi.org/10.1007/s11042-022-13773-4
DOI: https://doi.org/10.1007/s11042-022-13773-4