research-article

An Empirical Study of CNN-LSTM on Class Imbalance Datasets for Violence Video Detection

Authors:
Moch Arief Soeleman, University of Dian Nuswantoro, Indonesia
Catur Supriyanto, University of Dian Nuswantoro, Indonesia
Dwi Puji Prabowo, University of Dian Nuswantoro, Indonesia

IC3INA '21: Proceedings of the 2021 International Conference on Computer, Control, Informatics and Its Applications, October 2021, Pages 81–85, https://doi.org/10.1145/3489088.3489126

Published: 13 February 2022

ABSTRACT

Violence detection has become an important topic in video surveillance over the last decade. Several studies of violence video detection have shown that features learned by a convolutional neural network (CNN) give higher accuracy than handcrafted features. For this reason, we evaluate several CNN architectures for detecting violent actions in video. This work compares five pretrained networks: VGG16, VGG19, ResNet50, InceptionV3, and Xception. The features extracted from each frame are then forwarded to a long short-term memory (LSTM) network. We evaluate the pretrained networks on class-imbalanced datasets, since violence video detection may suffer from class imbalance. Two public datasets are used to evaluate the model: the hockey fight dataset and the violent crowd dataset. Our experimental results show that InceptionV3 achieves better performance than the other networks in most cases.
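The abstract does not say how the class-imbalanced variants of the datasets were built. As a rough illustration only, the sketch below assumes one common approach: subsampling the violent class to a target ratio and then computing inverse-frequency class weights. The function names, the 0.2 ratio, and the 500/500 clip pool are hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def make_imbalanced(labels, minority_label, ratio, seed=0):
    """Return sorted indices of a subset in which the minority class has
    about `ratio` times as many samples as the rest (assumed scheme)."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    # Keep at least one minority sample, and never more than are available.
    keep = min(len(minority), max(1, round(ratio * len(majority))))
    return sorted(majority + rng.sample(minority, keep))


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n = len(labels)
    return {c: n / (len(counts) * k) for c, k in counts.items()}


# Hypothetical balanced pool: 500 violent (1) and 500 non-violent (0) clips.
labels = [1] * 500 + [0] * 500
subset = make_imbalanced(labels, minority_label=1, ratio=0.2)
subset_labels = [labels[i] for i in subset]
weights = class_weights(subset_labels)
```

Weights of this kind could be passed to a training loss so that errors on the rarer violent class count more. Whether the authors used weighting, resampling, or neither is not stated in the abstract.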

Index Terms

An Empirical Study of CNN-LSTM on Class Imbalance Datasets for Violence Video Detection
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
      2. Computer vision tasks
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Index terms have been assigned to the content through auto-classification.

Published in

IC3INA '21: Proceedings of the 2021 International Conference on Computer, Control, Informatics and Its Applications
October 2021
204 pages
ISBN:9781450385244
DOI:10.1145/3489088

Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
CNN
Imbalance dataset
LSTM
Violence video detection
Qualifiers
- research-article
- Research
- Refereed limited
