An iteratively reweighting algorithm for dynamic video summarization

Dong, Pei; Xia, Yong; Wang, Shanshan; Zhuo, Li; Feng, David Dagan

doi:10.1007/s11042-014-2126-8


Published: 27 June 2014

Multimedia Tools and Applications, volume 74, pages 9449–9473 (2015)

Pei Dong^1,2, Yong Xia^1,3, Shanshan Wang^1,4,5, Li Zhuo^2 & David Dagan Feng^1


Abstract

The information explosion has imposed unprecedented challenges on conventional ways of consuming video data. Providing viewers with a condensed and meaningful video summary has therefore been recognized in recent years as a beneficial and attractive research topic in the multimedia community. Analyzing both the visual and textual modalities proves essential for an automatic video summarizer to pick out the important content of a video. However, most established studies in this direction either use heuristic rules or rely on simple forms of text analysis. This paper proposes an iteratively reweighting dynamic video summarization (IRDVS) algorithm based on the joint and adaptive use of the visual modality and the accompanying subtitles. The proposed algorithm takes advantage of our SEmantic inDicator of videO seGment (SEDOG) feature to explore the most representative concepts for describing the video. Meanwhile, the iteratively reweighting scheme effectively updates the dynamic surrogate of the original video by combining the high-level features in an adaptive manner. The proposed algorithm has been compared with four state-of-the-art video summarization approaches, namely the speech transcript-based (STVS), attention model-based (AMVS), sparse dictionary selection-based (DSVS) and heterogeneity image patch index-based (HIPVS) algorithms, on different video genres, including documentary, movie and TV news. Our results show that the proposed IRDVS algorithm can produce summarized videos of better quality.
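The abstract does not give the update rule used by IRDVS, so the following is only a generic sketch of what an iterative reweighting loop over per-segment feature scores might look like. Everything in it is an assumption made for illustration: the function name `iterative_reweighting`, the top-k selection step, and the rule that sets each feature's weight proportional to its mean score on the currently selected segments. It is not the authors' algorithm and does not implement the SEDOG feature.

```python
from typing import List, Tuple


def iterative_reweighting(
    features: List[List[float]], k: int, max_iter: int = 20
) -> Tuple[List[int], List[float]]:
    """Pick k segments by alternating selection and feature reweighting.

    features[f][s] is the (hypothetical) score of segment s under feature f,
    for example one visual feature and one subtitle-based feature.
    """
    n_feat = len(features)
    n_seg = len(features[0])
    if not 1 <= k <= n_seg:
        raise ValueError("k must be between 1 and the number of segments")

    # Start from uniform feature weights.
    weights = [1.0 / n_feat] * n_feat
    selected: List[int] = []

    for _ in range(max_iter):
        # Combine the features into one score per segment.
        combined = [
            sum(w * f[s] for w, f in zip(weights, features))
            for s in range(n_seg)
        ]
        # Keep the k best segments; ties are broken by segment index.
        ranked = sorted(range(n_seg), key=lambda s: (-combined[s], s))
        new_selected = sorted(ranked[:k])

        # Assumed update rule: a feature that scores the current summary
        # highly receives a larger weight in the next round.
        support = [sum(f[s] for s in new_selected) / k for f in features]
        total = sum(support)
        if total > 0:
            weights = [v / total for v in support]

        # Stop once the selection no longer changes.
        if new_selected == selected:
            break
        selected = new_selected

    return selected, weights


# Toy input: two features scored over four segments.
visual = [0.9, 0.1, 0.8, 0.2]
textual = [0.2, 0.3, 0.9, 0.1]
summary, final_weights = iterative_reweighting([visual, textual], k=2)
```

In this toy run the loop settles on a fixed set of segments after the second pass, and the weights always sum to one because they are renormalized in every iteration.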



Acknowledgments

This work was supported in part by Australian Research Council grants, in part by the China Scholarship Council under Grant 2011623084, in part by the National Natural Science Foundation of China (No. 61372149, No. 61370189, No. 61100212), in part by the Program for New Century Excellent Talents in University (No. NCET-11-0892), in part by the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20121103110017), in part by the Natural Science Foundation of Beijing (No. 4142009), in part by the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions (No. CIT&TCD201304036, No. CIT&TCD201404043), and in part by the Science and Technology Development Program of Beijing Education Committee (No. KM201410005002). We thank the anonymous reviewers for their constructive comments. Copyrights of the images, videos and subtitles used in this work are the property of their respective owners.

Author information

Authors and Affiliations

Biomedical and Multimedia Information Technology (BMIT) Research Group, School of Information Technologies, The University of Sydney, Sydney, NSW 2006, Australia
Pei Dong, Yong Xia, Shanshan Wang & David Dagan Feng
Signal and Information Processing Laboratory, Beijing University of Technology, Beijing, 100124, China
Pei Dong & Li Zhuo
Shaanxi Key Lab of Speech & Image Information Processing, School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072, China
Yong Xia
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
Shanshan Wang
School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
Shanshan Wang


Corresponding authors

Correspondence to Pei Dong or Yong Xia.


About this article

Cite this article

Dong, P., Xia, Y., Wang, S. et al. An iteratively reweighting algorithm for dynamic video summarization. Multimed Tools Appl 74, 9449–9473 (2015). https://doi.org/10.1007/s11042-014-2126-8


Received: 03 January 2014
Revised: 07 May 2014
Accepted: 26 May 2014
Published: 27 June 2014
Issue Date: November 2015
DOI: https://doi.org/10.1007/s11042-014-2126-8
