
Key frame extraction method with global information balance

Published in: Multimedia Tools and Applications

Abstract

Key frame extraction can provide evidence for traffic violation detection, which is essential to support administrative punishment. However, existing key frame extraction methods fail to model context information in complex semantic cases, such as failing to yield to pedestrians. To address this problem, we propose a key frame extraction model with global information balance (GIB), an intelligent vehicle-violation screenshot method based on balancing the global information of video frames. GIB extracts three screenshots from videos of vehicles failing to yield to pedestrians at unsignalized crosswalks. First, GIB defines the extraction of global information based on trajectories, comprising spatial structure and motion attributes as feature factors. Then, based on semantic correlation analysis of the global information, relational entity filtering is applied to avoid interference from non-key entities and improve the effectiveness of the features. Finally, a search and pruning policy that prioritizes mutual information is designed to maximize the global information entropy among the preserved nodes, ensuring an optimal prediction solution when the global search solution space is large. The policy is implemented in the key frame prediction task within an attention-based Seq2Seq model. The results of several experiments confirm the superior performance of the proposed method over conventional methods in terms of frame-time differential, perceptual hashing, and subjective scoring. For example, the perceptual hashing values of the proposed method were 10.5% and 6.7% higher than those of semantic correlation extraction and image similarity extraction, respectively, two baseline methods based on local information.
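As background for the perceptual-hashing evaluation mentioned above, the comparison of an extracted frame against a reference frame can be sketched as follows. This is a minimal illustration, not the authors' implementation: `average_hash` and `hash_similarity` are hypothetical helper names, and a simple average hash (aHash) stands in for whichever perceptual hash the paper actually uses.

```python
import numpy as np

def average_hash(gray: np.ndarray, hash_size: int = 8) -> np.ndarray:
    """Boolean average hash (aHash) of a 2-D grayscale frame."""
    h, w = gray.shape
    # Crop so the frame divides evenly into hash_size x hash_size blocks
    gray = gray[:h - h % hash_size, :w - w % hash_size]
    blocks = gray.reshape(hash_size, gray.shape[0] // hash_size,
                          hash_size, gray.shape[1] // hash_size).mean(axis=(1, 3))
    # One bit per block: is the block brighter than the mean block value?
    return (blocks > blocks.mean()).ravel()

def hash_similarity(h1: np.ndarray, h2: np.ndarray) -> float:
    """1 minus the normalized Hamming distance between two hashes."""
    return 1.0 - np.count_nonzero(h1 != h2) / h1.size
```

A higher similarity between the hash of an extracted key frame and that of a manually chosen ground-truth frame indicates a better extraction, which is the sense in which a perceptual-hashing score can be compared across methods.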




Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Code availability

Some or all data, models, or code generated or used during the study are available from the corresponding author by request.


Acknowledgments

The authors greatly appreciate the reviewers’ suggestions and the editor’s encouragement. The work is partially supported by the National Natural Science Foundation of China (Grant Number: 61976032) and the Applicable Innovation Project of the Ministry of Public Security of China (Grant Number: 2020YYCXHNST046).

Funding

This work was sponsored by the National Natural Science Foundation of China (Grant Number: 61976032) and the Applicable Innovation Project of the Ministry of Public Security of China (Grant Number: 2020YYCXHNST046).

Author information

Authors and Affiliations

Authors

Contributions

Shen Xiaohu: Conceptualization, Methodology, Software, Writing-Original draft preparation.

An Jubai: Writing-Reviewing and Editing, Supervision, Funding acquisition.

Teng Zhisong: Validation, Investigation, Resources, Data curation.

Corresponding author

Correspondence to Shen Xiaohu.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Written informed consent for the publication of this paper was obtained from all authors.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shen, X., An, J. & Teng, Z. Key frame extraction method with global information balance. Multimed Tools Appl 83, 21905–21928 (2024). https://doi.org/10.1007/s11042-023-16386-7

