research-article

Robust, efficient and privacy-preserving violent activity recognition in videos

Authors:

Balasubramanian Raman,

Amitesh Singh Rajput

SAC '20: Proceedings of the 35th Annual ACM Symposium on Applied Computing

Pages 2081 - 2088

https://doi.org/10.1145/3341105.3373942

Published: 30 March 2020

Abstract

Human activity recognition is an extensively researched topic in computer vision. However, specific events such as aggressive behavior or fights have received comparatively little attention. Automatically recognizing such events is particularly important in video surveillance settings such as prisons, railway stations, and psychiatric wards, as well as for filtering violent content online. In this paper, we develop a deep-learning-based violent activity recognition system that is not only more accurate but can also be deployed in real-time video surveillance systems. First, multiple approximate dynamic images (ADIs) are computed from the input video sequence. An efficient convolutional neural network (CNN), MobileNet, is then used to extract short-term spatio-temporal features from these ADIs. These features are stacked and fed to a gated recurrent unit (GRU) network, which models the long-term dynamics of the video sequence. In addition, we introduce a privacy protection scheme based on randomization of pixel values. The proposed framework is evaluated on three violence recognition benchmark datasets, and the results show that our method outperforms the current state of the art in both accuracy and memory requirement.
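The abstract does not state how the ADIs are computed. As a minimal sketch, the code below assumes the closed-form approximate rank pooling weights from Bilen et al.'s dynamic image networks, where frame t of a T-frame clip is weighted by alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}) and H_t is the t-th harmonic number. Frames are represented as flat lists of pixel intensities to keep the example dependency-free. The function names and the `window` and `stride` parameters are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Frame = List[float]  # one frame, flattened to a list of pixel intensities


def adi_coefficients(num_frames: int) -> List[float]:
    """Approximate rank pooling weights alpha_1..alpha_T (assumed formula)."""
    T = num_frames
    # harmonic[t] = 1/1 + 1/2 + ... + 1/t, with harmonic[0] = 0
    harmonic = [0.0] * (T + 1)
    for t in range(1, T + 1):
        harmonic[t] = harmonic[t - 1] + 1.0 / t
    return [
        2 * (T - t + 1) - (T + 1) * (harmonic[T] - harmonic[t - 1])
        for t in range(1, T + 1)
    ]


def approximate_dynamic_image(frames: Sequence[Frame]) -> Frame:
    """Collapse a clip into one image as a weighted sum of its frames."""
    alphas = adi_coefficients(len(frames))
    num_pixels = len(frames[0])
    return [
        sum(a * frame[p] for a, frame in zip(alphas, frames))
        for p in range(num_pixels)
    ]


def multiple_adis(frames: Sequence[Frame], window: int, stride: int) -> List[Frame]:
    """Compute one ADI per sliding window (window/stride are illustrative)."""
    return [
        approximate_dynamic_image(frames[start:start + window])
        for start in range(0, len(frames) - window + 1, stride)
    ]
```

Under this assumption the weights sum to zero, so a static clip yields an all-zero ADI and only temporal change survives. In the pipeline described above, each resulting ADI would then be passed through the CNN, and the per-ADI feature vectors would form the input sequence for the GRU.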


Published In


SAC '20: Proceedings of the 35th Annual ACM Symposium on Applied Computing

March 2020

2348 pages

ISBN:9781450368667

DOI:10.1145/3341105

Conference Chairs: Chih-Cheng Hung (Kennesaw State University), Tomas Cerny (Baylor University)

Program Chairs: Dongwan Shin (New Mexico Tech), Alessio Bechini (University of Pisa, Italy)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGAPP: ACM Special Interest Group on Applied Computing

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

SAC '20

Sponsor:

SIGAPP

SAC '20: The 35th ACM/SIGAPP Symposium on Applied Computing

March 30 - April 3, 2020

Brno, Czech Republic

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
