An iteratively reweighting algorithm for dynamic video summarization

Multimedia Tools and Applications

Abstract

The information explosion has posed unprecedented challenges to conventional ways of consuming video data. Providing viewers with condensed and meaningful video summaries has therefore been recognized in recent years as a beneficial and attractive research topic in the multimedia community. Analyzing both the visual and textual modalities is essential for an automatic video summarizer to select the important content of a video. However, most established studies in this direction either use heuristic rules or rely on simple text analysis. This paper proposes an iteratively reweighting dynamic video summarization (IRDVS) algorithm based on the joint and adaptive use of the visual modality and the accompanying subtitles. The proposed algorithm uses our SEmantic inDicator of videO seGment (SEDOG) feature to explore the most representative concepts for describing the video. Meanwhile, the iteratively reweighting scheme updates the dynamic surrogate of the original video by adaptively combining the high-level features. The proposed algorithm was compared with four state-of-the-art video summarization approaches, namely the speech transcript-based (STVS), attention model-based (AMVS), sparse dictionary selection-based (DSVS) and heterogeneity image patch index-based (HIPVS) algorithms, on different video genres, including documentary, movie and TV news. Our results show that the proposed IRDVS algorithm can produce summarized videos of better quality.
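The abstract does not give the IRDVS update rule, so the following is only a generic sketch of what an iterative reweighting loop over per-segment feature scores could look like. It is not the authors' method. The function name `summarize`, the `budget` parameter, the score layout and the reweighting rule (each feature's mean score on the currently selected segments) are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def summarize(features: Sequence[Sequence[float]], budget: int,
              max_iters: int = 20) -> Tuple[List[int], List[float]]:
    """Hypothetical iterative reweighting sketch (not the IRDVS rule).

    features[f][s] is the score of feature f (e.g. visual or textual)
    on video segment s. budget is the number of segments to keep and
    is assumed to satisfy 0 < budget <= number of segments.
    """
    n_feats = len(features)
    n_segs = len(features[0])
    weights = [1.0 / n_feats] * n_feats  # start from uniform weights
    selected: List[int] = []
    for _ in range(max_iters):
        # Combine the feature scores with the current weights.
        combined = [sum(weights[f] * features[f][s] for f in range(n_feats))
                    for s in range(n_segs)]
        # Keep the top-scoring segments (ties broken by segment index).
        ranked = sorted(range(n_segs), key=lambda s: (-combined[s], s))
        new_selected = sorted(ranked[:budget])
        if new_selected == selected:
            break  # the summary no longer changes
        selected = new_selected
        # Assumed reweighting rule: a feature gains weight in proportion
        # to its mean score on the segments currently in the summary.
        support = [sum(features[f][s] for s in selected) / budget + 1e-9
                   for f in range(n_feats)]
        total = sum(support)
        weights = [v / total for v in support]
    return selected, weights
```

Calling `summarize([[0.9, 0.1, 0.8, 0.2], [0.2, 0.3, 0.9, 0.1]], budget=2)` returns the indices of the two kept segments together with the final feature weights, which sum to one. The loop stops as soon as the selection is stable, which is one simple convergence criterion among many.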

Acknowledgments

This work was supported in part by the Australian Research Council grants, in part by the China Scholarship Council under Grant 2011623084, in part by the National Natural Science Foundation of China (No. 61372149, No. 61370189, No. 61100212), in part by the Program for New Century Excellent Talents in University (No. NCET-11-0892), in part by the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20121103110017), in part by the Natural Science Foundation of Beijing (No. 4142009), in part by the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions (No. CIT&TCD201304036, No. CIT&TCD201404043), and in part by the Science and Technology Development Program of Beijing Education Committee (No. KM201410005002). We thank the anonymous reviewers for their constructive comments. The copyrights of the images, videos and subtitles used in this work are the property of their respective owners.

Author information

Corresponding authors

Correspondence to Pei Dong or Yong Xia.

About this article

Cite this article

Dong, P., Xia, Y., Wang, S. et al. An iteratively reweighting algorithm for dynamic video summarization. Multimed Tools Appl 74, 9449–9473 (2015). https://doi.org/10.1007/s11042-014-2126-8

  • DOI: https://doi.org/10.1007/s11042-014-2126-8
