Query-based video summarization with multi-label classification network

Hu, Weifeng; Zhang, Yu; Li, Yujun; Zhao, Jia; Hu, Xifeng; Cui, Yan; Wang, Xuejing

doi:10.1007/s11042-023-15126-1

Published: 22 March 2023

Volume 82, pages 37529–37549 (2023)

Multimedia Tools and Applications

Weifeng Hu¹ (ORCID: 0000-0001-7821-5720), Yu Zhang¹,², Yujun Li¹, Jia Zhao¹, Xifeng Hu¹, Yan Cui³ & Xuejing Wang¹

184 Accesses
3 Citations

Abstract

Generic video summarization algorithms produce a single summary for each video, so they cannot satisfy the different summary requirements that different users have for the same video. This paper addresses query-based video summarization, which takes a user’s query and a long video as inputs and generates a summary tailored to that query. We propose a query-based video summarization algorithm built on a multi-label classification network (MLC-SUM). Specifically, we treat video summarization as a target-based multi-label classification problem: convolutional features are fed into a multi-layer perceptron to predict the correlation between video content and multiple concept labels, and the cross-correlation among the labels is then used to weight the predicted probabilities. Finally, the part of the video content most relevant to the user’s query sentence is selected as the output summary. Experiments on three common datasets verify the effectiveness and superiority of the proposed algorithm.
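
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-layer perceptron, the row-normalised co-occurrence matrix standing in for the label "cross-correlation", the multiplicative weighting rule, the shot-level granularity, and all function and parameter names are assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def mlp_concept_probs(features, w1, b1, w2, b2):
    """Hypothetical two-layer perceptron: per-shot convolutional features
    of shape (shots, dim) -> independent concept probabilities (shots, concepts)."""
    hidden = np.maximum(0.0, features @ w1 + b1)  # ReLU hidden layer
    return sigmoid(hidden @ w2 + b2)              # one sigmoid per concept label


def label_correlation(train_labels, eps=1e-8):
    """Assumed form of label correlation: concept co-occurrence counts from
    binary training labels (samples, concepts), normalised so rows sum to ~1."""
    co = train_labels.T @ train_labels
    return co / (co.sum(axis=1, keepdims=True) + eps)


def reweight(probs, corr):
    """Assumed weighting rule: scale each concept probability by the support
    it receives from correlated concepts predicted for the same shot."""
    support = probs @ corr.T
    return probs * support


def summarize(probs, query_concepts, budget):
    """Select the `budget` shots with the highest mean probability over the
    queried concept indices; return them in temporal order with all scores."""
    relevance = probs[:, query_concepts].mean(axis=1)
    top = np.argsort(-relevance, kind="stable")[:budget]
    return np.sort(top), relevance


# Toy usage with random weights (illustrative only, no trained model).
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 8))            # 6 shots, 8-dim features
w1, b1 = rng.normal(size=(8, 4)), np.zeros(4)
w2, b2 = rng.normal(size=(4, 3)), np.zeros(3)
train_labels = rng.integers(0, 2, size=(20, 3))

probs = mlp_concept_probs(features, w1, b1, w2, b2)
weighted = reweight(probs, label_correlation(train_labels))
selected, scores = summarize(weighted, query_concepts=[0, 2], budget=2)
print(selected, scores[selected])
```

Using one sigmoid output per concept, rather than a softmax over concepts, is what makes this a multi-label setup: a single shot may score highly for several concepts at once, and a query naming several concepts is handled by averaging over the corresponding columns.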

Data availability

The datasets generated during the current study are available from the corresponding author on reasonable request.

Acknowledgments

This work was supported by the National Key Research and Development Plan (2020YFC0832600).

Author information

Authors and Affiliations

School of Information Science and Engineering, Shandong University, Qingdao, 266200, China
Weifeng Hu, Yu Zhang, Yujun Li, Jia Zhao, Xifeng Hu & Xuejing Wang
State Grid of China Technology College, Jinan, 250002, China
Yu Zhang
Institute of Sociology, Chinese Academy of Social Sciences, Beijing, 100732, China
Yan Cui

Corresponding author

Correspondence to Weifeng Hu.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest regarding the publication of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hu, W., Zhang, Y., Li, Y. et al. Query-based video summarization with multi-label classification network. Multimed Tools Appl 82, 37529–37549 (2023). https://doi.org/10.1007/s11042-023-15126-1

Received: 16 December 2021
Revised: 25 November 2022
Accepted: 13 March 2023
Published: 22 March 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s11042-023-15126-1
