Abstract
This paper studies the retrieval and temporal localization of violence in long video sequences. To address the low accuracy of violence detection in long videos, we propose a two-stage temporal localization method based on a DC3D network model. In the preprocessing stage, candidate segments are generated from the video at multiple temporal scales. In the first stage, a C3D network pre-trained on a large labeled dataset generates candidate videos and filters out meaningless background segments; this candidate-generation step improves overall accuracy. In the second stage, a deconvolution method refines each candidate video to frame-level precision, so the DC3D model can pinpoint the exact time at which violence occurs. Experimental results show that the proposed method quickly retrieves and localizes violent fighting behavior in long surveillance videos, which can help surveillance personnel rapidly find target segments in large volumes of video data.
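To make the two-stage idea concrete, the PyTorch sketch below shows one plausible rendering of the pipeline's second stage: a C3D-style 3D-convolutional encoder that compresses a clip in time, followed by temporal deconvolution (transposed convolution) that restores per-frame resolution. This is a minimal illustrative sketch only; the class name `DC3DSketch`, the layer counts, channel widths, kernel sizes, and the use of `ConvTranspose3d` are our assumptions, not the authors' published DC3D architecture.

```python
# Hypothetical sketch of a DC3D-style model (not the authors' exact network):
# a C3D-like encoder downsamples time (T -> T/8), then temporal deconvolution
# upsamples back to T so every input frame receives its own violence score.
import torch
import torch.nn as nn

class DC3DSketch(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        # Stage-1-style backbone: 3D convolutions with pooling that halves
        # the temporal length three times while shrinking the spatial dims.
        self.encoder = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),               # keep time, pool space
            nn.Conv3d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=2),                       # T/2
            nn.Conv3d(128, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=2),                       # T/4
            nn.Conv3d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=2),                       # T/8
            nn.AdaptiveAvgPool3d((None, 1, 1)),                # collapse space, keep time
        )
        # Stage-2 head: each transposed convolution doubles the temporal
        # length (T/8 -> T/4 -> T/2 -> T), yielding frame-level scores.
        self.deconv = nn.Sequential(
            nn.ConvTranspose3d(256, 128, kernel_size=(4, 1, 1),
                               stride=(2, 1, 1), padding=(1, 0, 0)), nn.ReLU(),
            nn.ConvTranspose3d(128, 64, kernel_size=(4, 1, 1),
                               stride=(2, 1, 1), padding=(1, 0, 0)), nn.ReLU(),
            nn.ConvTranspose3d(64, num_classes, kernel_size=(4, 1, 1),
                               stride=(2, 1, 1), padding=(1, 0, 0)),
        )

    def forward(self, clip):                   # clip: (N, 3, T, H, W)
        feat = self.encoder(clip)              # (N, 256, T/8, 1, 1)
        scores = self.deconv(feat)             # (N, num_classes, T, 1, 1)
        return scores.squeeze(-1).squeeze(-1)  # per-frame scores: (N, C, T)

# Frame-level localization on a candidate segment that survived stage one:
model = DC3DSketch()
clip = torch.randn(1, 3, 16, 112, 112)          # 16-frame candidate, C3D input size
frame_probs = model(clip).softmax(dim=1)[0, 1]  # P(violence) for each of 16 frames
violent_frames = (frame_probs > 0.5).nonzero().flatten()
```

The design choice this illustrates is the one the abstract describes: the encoder classifies coarsely (good for filtering background candidates), while the deconvolution head expands the compressed temporal feature map back to one score per frame, so thresholding the per-frame probabilities directly yields the start and end frames of the violent interval.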
Data availability
The authors declare that all experimental data in this paper are true, valid, and available, and that they were obtained from detailed experiments.
Change history
12 July 2022
A Correction to this paper has been published: https://doi.org/10.1007/s11227-022-04699-7
Funding
This work was supported by the Special Fund for Science and Technology Innovation Strategy of Guangdong Province in 2021 (Special Fund for Climbing Plan) (pdjh2021a0944), the Special Projects in Key Fields of Colleges and Universities in Guangdong Province in 2021 (2021ZDZX1093), the Dongguan Science and Technology of Social Development Program in 2021 (20211800900252), and the Special Fund for the Electronic Information Engineering Technology Specialty Group of the National Double High Program of Dongguan Polytechnic in 2021 (ZXYYD001, ZXF002) and 2022 (ZXB202203, ZXC202201, ZXD202204).
Author information
Contributions
WQ was involved in the conceptualization, methodology, validation, investigation, writing and funding acquisition. TZ contributed to the formal analysis, software, resources and visualization. JL contributed to the software, resources, methodology, validation, writing (review and editing), supervision and funding acquisition. JL helped in the data curation, formal analysis, validation, writing and funding acquisition.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Informed consent
Any participants (or their guardians if unable to give informed consent, or next of kin, if deceased) who may be identifiable through the manuscript (such as a case report) have been given an opportunity to review the final manuscript and have provided written consent to publish.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised: the affiliation details for author Ting Zhu were incorrectly assigned.
About this article
Cite this article
Qu, W., Zhu, T., Liu, J. et al. A time sequence location method of long video violence based on improved C3D network. J Supercomput 78, 19545–19565 (2022). https://doi.org/10.1007/s11227-022-04649-3