
Key frame extraction method with global information balance

Published in: Multimedia Tools and Applications

Abstract

Key frame extraction can provide evidence for traffic violation detection, which is essential to support administrative punishment. However, existing key frame extraction methods fail to model context information in complex semantic cases, such as failing to yield to pedestrians. To address this problem, we propose a key frame extraction model with global information balance (GIB), an intelligent vehicle-violation screenshot method based on balancing the global information of video frames. GIB extracts three screenshots from videos of vehicles failing to yield to pedestrians at unsignalized crosswalks. First, GIB defines the extraction of global information based on trajectories, comprising spatial structure and motion attributes as feature factors. Then, based on semantic correlation analysis of the global information, relational entity filtering is applied to avoid interference from non-key entities and improve the effectiveness of the features. Finally, a search and pruning policy that prioritizes mutual information is designed to maximize the global information entropy among the preserved nodes, ensuring an optimal prediction solution when the global search solution space is large. The policy is implemented in the key frame prediction task within an attention-based Seq2Seq model. The results of several experiments confirm the superior performance of the proposed method over conventional methods in terms of frame-time differential, perceptual hashing, and subjective scoring. For example, the perceptual hashing values of the proposed method were 10.5% and 6.7% higher than those of semantic correlation extraction and image similarity extraction, respectively, two baseline methods based on local information.
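As background for the perceptual-hashing evaluation mentioned above, the comparison of an extracted frame against a reference frame can be sketched as follows. This is a minimal illustration, not the authors' implementation: `average_hash` and `hash_similarity` are hypothetical helper names, and a simple average hash (aHash) stands in for whichever perceptual hash the paper actually uses.

```python
import numpy as np

def average_hash(gray: np.ndarray, hash_size: int = 8) -> np.ndarray:
    """Boolean average hash (aHash) of a 2-D grayscale frame."""
    h, w = gray.shape
    # Crop so the frame divides evenly into hash_size x hash_size blocks
    gray = gray[:h - h % hash_size, :w - w % hash_size]
    blocks = gray.reshape(hash_size, gray.shape[0] // hash_size,
                          hash_size, gray.shape[1] // hash_size).mean(axis=(1, 3))
    # One bit per block: is the block brighter than the mean block value?
    return (blocks > blocks.mean()).ravel()

def hash_similarity(h1: np.ndarray, h2: np.ndarray) -> float:
    """1 minus the normalized Hamming distance between two hashes."""
    return 1.0 - np.count_nonzero(h1 != h2) / h1.size
```

A higher similarity between the hash of an extracted key frame and that of a manually chosen ground-truth frame indicates a better extraction, which is the sense in which a perceptual-hashing score can be compared across methods.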




Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Code availability

Some or all data, models, or code generated or used during the study are available from the corresponding author by request.


Acknowledgments

The authors greatly appreciate the reviewers’ suggestions and the editor’s encouragement. The work is partially supported by the National Natural Science Foundation of China (Grant Number: 61976032) and the Applicable Innovation Project of the Ministry of Public Security of China (Grant Number: 2020YYCXHNST046).

Funding

This work was sponsored by the National Natural Science Foundation of China (Grant Number: 61976032) and the Applicable Innovation Project of the Ministry of Public Security of China (Grant Number: 2020YYCXHNST046).

Author information

Authors and Affiliations

Authors

Contributions

Shen Xiaohu: Conceptualization, Methodology, Software, Writing-Original draft preparation.

An Jubai: Writing-Reviewing and Editing, Supervision, Funding acquisition.

Teng Zhisong: Validation, Investigation, Resources, Data curation.

Corresponding author

Correspondence to Shen Xiaohu.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Written informed consent for the publication of this paper was obtained from all authors.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shen, X., An, J. & Teng, Z. Key frame extraction method with global information balance. Multimed Tools Appl 83, 21905–21928 (2024). https://doi.org/10.1007/s11042-023-16386-7

