ABSTRACT
Violence detection has become an important topic in video surveillance over the last decade. Several studies on violence detection in video have shown that features learned by convolutional neural networks (CNNs) yield higher accuracy than handcrafted features. For this reason, we evaluate several CNN architectures for detecting violent actions in video. This work compares five pretrained networks: VGG16, VGG19, ResNet50, InceptionV3, and Xception. The features extracted from each frame by each network are then forwarded to a long short-term memory (LSTM) network. Because violence detection in video may suffer from class imbalance, we evaluate the pretrained networks on class-imbalanced datasets. Two public datasets are used to evaluate the models: the hockey fight dataset and the violent crowd dataset. Our experimental results show that InceptionV3 achieves better performance in most cases.
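The abstract does not say how the class-imbalanced datasets were built or which metrics were reported. The following is only a minimal sketch of one common setup, assuming imbalance is simulated by subsampling the violent class and that precision, recall, and F1 are reported alongside accuracy. The function names, the label counts, and the 0.2 ratio are hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def make_imbalanced(labels, minority_label, ratio, seed=0):
    """Return sorted indices that keep every majority sample and only enough
    minority samples for a minority/majority ratio of roughly `ratio`."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    keep = min(len(minority), max(1, round(len(majority) * ratio)))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return sorted(majority + rng.sample(minority, keep))


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels (hypothetical counts): 1 = violent, 0 = non-violent.
labels = [1] * 500 + [0] * 500
indices = make_imbalanced(labels, minority_label=1, ratio=0.2)
subset = [labels[i] for i in indices]
counts = Counter(subset)

# A trivial baseline that always predicts "non-violent".
always_negative = [0] * len(subset)
accuracy = sum(t == p for t, p in zip(subset, always_negative)) / len(subset)
precision, recall, f1 = precision_recall_f1(subset, always_negative)

print(counts, round(accuracy, 3), precision, recall, f1)
```

On this toy subset the trivial baseline scores high accuracy while never detecting a violent clip, which is why per-class metrics are worth reporting when classes are imbalanced.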
Index Terms
- An Empirical Study of CNN-LSTM on Class Imbalance Datasets for Violence Video Detection