Abstract
This paper studies the retrieval and temporal localization of violence in long video sequences. To address the low accuracy of violence detection in long videos, we propose a two-stage temporal localization method based on a DC3D network model. In the preprocessing stage, candidate segments are generated from the video at multiple temporal scales. In the first stage, a C3D network pre-trained on a large labeled dataset generates candidate videos and filters out meaningless background segments; this candidate-generation step improves overall accuracy. In the second stage, a deconvolution method refines each candidate video to frame-level precision, so the DC3D model can pinpoint the exact time at which violence occurs. Experimental results show that the proposed method quickly retrieves and localizes violent fighting behavior in long surveillance videos, which can help surveillance personnel rapidly find target segments in large volumes of video data.
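To make the two-stage idea concrete, the PyTorch sketch below shows one plausible rendering of the pipeline's second stage: a C3D-style 3D-convolutional encoder that compresses a clip in time, followed by temporal deconvolution (transposed convolution) that restores per-frame resolution. This is a minimal illustrative sketch only; the class name `DC3DSketch`, the layer counts, channel widths, kernel sizes, and the use of `ConvTranspose3d` are our assumptions, not the authors' published DC3D architecture.

```python
# Hypothetical sketch of a DC3D-style model (not the authors' exact network):
# a C3D-like encoder downsamples time (T -> T/8), then temporal deconvolution
# upsamples back to T so every input frame receives its own violence score.
import torch
import torch.nn as nn

class DC3DSketch(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        # Stage-1-style backbone: 3D convolutions with pooling that halves
        # the temporal length three times while shrinking the spatial dims.
        self.encoder = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),               # keep time, pool space
            nn.Conv3d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=2),                       # T/2
            nn.Conv3d(128, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=2),                       # T/4
            nn.Conv3d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=2),                       # T/8
            nn.AdaptiveAvgPool3d((None, 1, 1)),                # collapse space, keep time
        )
        # Stage-2 head: each transposed convolution doubles the temporal
        # length (T/8 -> T/4 -> T/2 -> T), yielding frame-level scores.
        self.deconv = nn.Sequential(
            nn.ConvTranspose3d(256, 128, kernel_size=(4, 1, 1),
                               stride=(2, 1, 1), padding=(1, 0, 0)), nn.ReLU(),
            nn.ConvTranspose3d(128, 64, kernel_size=(4, 1, 1),
                               stride=(2, 1, 1), padding=(1, 0, 0)), nn.ReLU(),
            nn.ConvTranspose3d(64, num_classes, kernel_size=(4, 1, 1),
                               stride=(2, 1, 1), padding=(1, 0, 0)),
        )

    def forward(self, clip):                   # clip: (N, 3, T, H, W)
        feat = self.encoder(clip)              # (N, 256, T/8, 1, 1)
        scores = self.deconv(feat)             # (N, num_classes, T, 1, 1)
        return scores.squeeze(-1).squeeze(-1)  # per-frame scores: (N, C, T)

# Frame-level localization on a candidate segment that survived stage one:
model = DC3DSketch()
clip = torch.randn(1, 3, 16, 112, 112)          # 16-frame candidate, C3D input size
frame_probs = model(clip).softmax(dim=1)[0, 1]  # P(violence) for each of 16 frames
violent_frames = (frame_probs > 0.5).nonzero().flatten()
```

The design choice this illustrates is the one the abstract describes: the encoder classifies coarsely (good for filtering background candidates), while the deconvolution head expands the compressed temporal feature map back to one score per frame, so thresholding the per-frame probabilities directly yields the start and end frames of the violent interval.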
Data availability
The authors declare that all experimental data in this paper are true, valid, and available, and that they were obtained from detailed experiments.
Change history
12 July 2022
A Correction to this paper has been published: https://doi.org/10.1007/s11227-022-04699-7
Funding
This work was supported by the Special Fund for Science and Technology Innovation Strategy of Guangdong Province in 2021 (Special Fund for Climbing Plan) (pdjh2021a0944), the Special Projects in Key Fields of Colleges and Universities in Guangdong Province in 2021 (2021ZDZX1093), the Dongguan Science and Technology of Social Development Program in 2021 (20211800900252), and the Special Fund for the Electronic Information Engineering Technology Specialty Group of the National Double High Program of Dongguan Polytechnic in 2021 (ZXYYD001, ZXF002) and 2022 (ZXB202203, ZXC202201, ZXD202204).
Author information
Contributions
WQ was involved in the conceptualization, methodology, validation, investigation, writing and funding acquisition. TZ contributed to the formal analysis, software, resources and visualization. JL contributed to the software, resources, methodology, validation, writing (review and editing), supervision and funding acquisition. JL helped in the data curation, formal analysis, validation, writing and funding acquisition.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Informed consent
Any participants (or their guardians if unable to give informed consent, or next of kin, if deceased) who may be identifiable through the manuscript (such as a case report) have been given an opportunity to review the final manuscript and have provided written consent to publish.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised: the affiliation details for author Ting Zhu were incorrectly assigned.
About this article
Cite this article
Qu, W., Zhu, T., Liu, J. et al. A time sequence location method of long video violence based on improved C3D network. J Supercomput 78, 19545–19565 (2022). https://doi.org/10.1007/s11227-022-04649-3