Abstract:
In today's landscape of pervasive surveillance, video-based violence detection plays an important role in crime prevention and public safety. This research addresses the limitations of previous studies, which struggled to balance accuracy and computational efficiency in violence detection models. Prior research also focused predominantly on human-centric forms of violence and neglected other manifestations. This study introduces a novel deep neural network architecture for violent video classification, using the custom Violent-500 dataset. The proposed architecture achieves an accuracy of 92.40%, with a video processing time of 5 milliseconds and a computational cost of 0.57 GFLOPs. The model incorporates multiscale ConvLSTM and EvoNorm-S0 to optimize performance while using fewer parameters than existing models. The Violent-500 dataset, comprising 500 labeled videos, adds diversity to the data available for violence detection. Furthermore, Grad-CAM visualization makes the model’s decision-making process more interpretable. Evaluation extends beyond the Violent-500 dataset to two additional datasets, affirming the proposed architecture’s efficacy, efficiency, and interpretability in video-based violence detection.
Date of Conference: 18-21 June 2024
Date Added to IEEE Xplore: 19 July 2024