Abstract
The information explosion has imposed unprecedented challenges on conventional ways of consuming video data. Providing viewers with condensed and meaningful video summaries has therefore been recognized as a beneficial and attractive research topic in the multimedia community in recent years. Analyzing both the visual and textual modalities is essential for an automatic video summarizer to select the important content of a video. However, most established studies in this direction either use heuristic rules or rely on simple text analysis. This paper proposes an iteratively reweighting dynamic video summarization (IRDVS) algorithm based on the joint and adaptive use of the visual modality and the accompanying subtitles. The proposed algorithm uses our SEmantic inDicator of videO seGment (SEDOG) feature to identify the most representative concepts for describing the video. Meanwhile, the iterative reweighting scheme updates the dynamic surrogate of the original video by combining the high-level features adaptively. The proposed algorithm is compared with four state-of-the-art video summarization approaches, namely the speech transcript-based (STVS) algorithm, the attention model-based (AMVS) algorithm, the sparse dictionary selection-based (DSVS) algorithm and the heterogeneity image patch index-based (HIPVS) algorithm, on different video genres, including documentary, movie and TV news. Our results show that the proposed IRDVS algorithm can produce summarized videos of better quality.
Acknowledgments
This work was supported in part by Australian Research Council grants, in part by the China Scholarship Council under Grant 2011623084, in part by the National Natural Science Foundation of China (No. 61372149, No. 61370189, No. 61100212), in part by the Program for New Century Excellent Talents in University (No. NCET-11-0892), in part by the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20121103110017), in part by the Natural Science Foundation of Beijing (No. 4142009), in part by the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions (No. CIT&TCD201304036, No. CIT&TCD201404043), and in part by the Science and Technology Development Program of Beijing Education Committee (No. KM201410005002). We thank the anonymous reviewers for their constructive comments. Copyrights of images, videos and subtitles used in this work are the property of their respective owners.
Cite this article
Dong, P., Xia, Y., Wang, S. et al. An iteratively reweighting algorithm for dynamic video summarization. Multimed Tools Appl 74, 9449–9473 (2015). https://doi.org/10.1007/s11042-014-2126-8
DOI: https://doi.org/10.1007/s11042-014-2126-8