DOI: 10.1145/3341105.3373942
research-article

Robust, efficient and privacy-preserving violent activity recognition in videos

Published: 30 March 2020

Abstract

Human activity recognition is an extensively researched topic in computer vision. However, some specific events, such as aggressive behavior or fights, have received comparatively little attention. The automatic recognition of such events is particularly important in video surveillance scenarios such as prisons, railway stations, and psychiatric wards, as well as for filtering violent content online. In this paper, we attempt to build a violent activity recognition system using the deep learning paradigm that is not only more accurate but can also be deployed in real-time video surveillance systems. First, multiple approximate dynamic images (ADIs) are computed from the input video sequence. An efficient convolutional neural network (CNN) called MobileNet is then used to extract short-term spatio-temporal features from these ADIs. These features are stacked together and fed to a gated recurrent unit (GRU) network, which models the long-term dynamics of the video sequence. In addition, we introduce a privacy protection scheme based on randomization of pixel values. The proposed framework is evaluated on three violence recognition benchmark datasets, and the results show that our method outperforms the current state of the art in terms of both accuracy and memory requirements.
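
The first stage of the pipeline, computing multiple approximate dynamic images from a video, can be illustrated with a minimal NumPy sketch. It uses the approximate rank pooling weights from Bilen et al. (2016), "Dynamic image networks for action recognition", where frame t of a T-frame clip receives the weight α_t = 2(T − t + 1) − (T + 1)(H_T − H_{t−1}) and H_t is the t-th harmonic number. The abstract does not say how the video is segmented, so the non-overlapping split, the `segment_length` parameter, and all function names below are assumptions made for illustration only.

```python
import numpy as np


def adi_coefficients(num_frames: int) -> np.ndarray:
    """Approximate rank pooling weights alpha_t for t = 1..T."""
    T = num_frames
    # harmonic[t] holds H_t = sum_{i=1}^{t} 1/i, with H_0 = 0.
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, T + 1))))
    t = np.arange(1, T + 1)
    return 2.0 * (T - t + 1) - (T + 1) * (harmonic[T] - harmonic[t - 1])


def approximate_dynamic_image(frames: np.ndarray) -> np.ndarray:
    """Collapse a clip of shape (T, H, W, C) into one image of shape (H, W, C)."""
    alphas = adi_coefficients(frames.shape[0])
    # Weighted sum of the frames along the time axis.
    return np.tensordot(alphas, frames.astype(np.float64), axes=([0], [0]))


def multiple_adis(video: np.ndarray, segment_length: int) -> np.ndarray:
    """Compute one ADI per non-overlapping segment (assumed segmentation)."""
    num_segments = video.shape[0] // segment_length  # trailing frames are dropped
    return np.stack([
        approximate_dynamic_image(
            video[i * segment_length:(i + 1) * segment_length]
        )
        for i in range(num_segments)
    ])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 40-frame RGB video at 32x32 resolution.
    video = rng.integers(0, 256, size=(40, 32, 32, 3), dtype=np.uint8)
    adis = multiple_adis(video, segment_length=10)
    print(adis.shape)  # (4, 32, 32, 3)
```

The weights sum to zero, so a static clip maps to an all-zero image and only temporal change survives, which is why each ADI can serve as a short-term motion summary. In a full system the resulting stack would typically be rescaled to the input range of the CNN before feature extraction; that step is omitted here.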

Published In

SAC '20: Proceedings of the 35th Annual ACM Symposium on Applied Computing
March 2020
2348 pages
ISBN: 9781450368667
DOI: 10.1145/3341105
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep learning
  2. dynamic image
  3. privacy protection
  4. video surveillance
  5. violent activity recognition

Qualifiers

  • Research-article

Conference

SAC '20: The 35th ACM/SIGAPP Symposium on Applied Computing
March 30 - April 3, 2020
Brno, Czech Republic

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Cited By

  • (2025) BGRU-MTRA: bilinear GRU networks with multi-path temporal residual attention for suspicious activity recognition. Neural Computing and Applications 37:1 (185-212). DOI: 10.1007/s00521-024-10416-7. Online publication date: 1-Jan-2025
  • (2024) Secrets in Motion: Privacy-Preserving Video Classification with Built-In Access Control. 2024 9th International Conference on Smart and Sustainable Technologies (SpliTech) (1-6). DOI: 10.23919/SpliTech61897.2024.10612492. Online publication date: 25-Jun-2024
  • (2024) HARGAN: Generative Adversarial Network Based Deep Learning Framework for Efficient Recognition of Human Actions from Surveillance Videos. International Journal of Computational and Experimental Science and Engineering 10:4. DOI: 10.22399/ijcesen.587. Online publication date: 12-Dec-2024
  • (2024) Exploring the World of Smart Prisons: Barriers, Trends, and Sustainable Solutions. Human Behavior and Emerging Technologies 2024:1. DOI: 10.1155/2024/6158154. Online publication date: 9-Sep-2024
  • (2024) Resstanet: deep residual spatio-temporal attention network for violent action recognition. International Journal of Information Technology 16:5 (2891-2900). DOI: 10.1007/s41870-024-01799-w. Online publication date: 25-Mar-2024
  • (2024) Human Violence Detection in Videos Using Key Frame Identification and 3D CNN with Convolutional Block Attention Module. Circuits, Systems, and Signal Processing 43:12 (7924-7950). DOI: 10.1007/s00034-024-02824-w. Online publication date: 13-Aug-2024
  • (2023) A Deep Learning-Based Real-Time Video Object Contextualizing and Archiving System. 2023 25th International Conference on Advanced Communication Technology (ICACT) (137-144). DOI: 10.23919/ICACT56868.2023.10079454. Online publication date: 19-Feb-2023
  • (2023) Frame importance and temporal memory effect-based fast video quality assessment for user-generated content. Applied Intelligence 53:19 (21517-21531). DOI: 10.1007/s10489-023-04624-2. Online publication date: 5-Jun-2023
  • (2023) Anomalous-Aggressive Event Detection Techniques. Proceedings of Eighth International Congress on Information and Communication Technology (77-95). DOI: 10.1007/978-981-99-3043-2_7. Online publication date: 1-Sep-2023
  • (2022) A Generalized Model for Crowd Violence Detection Focusing on Human Contour and Dynamic Features. 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid) (327-335). DOI: 10.1109/CCGrid54584.2022.00042. Online publication date: May-2022
