
Exploring video content structure for hierarchical summarization

Multimedia Systems

Abstract.

In this paper, we propose a hierarchical video summarization strategy that explores video content structure to provide users with a scalable, multilevel video summary. First, video-shot-segmentation and keyframe-extraction algorithms are applied to parse video sequences into physical shots and discrete keyframes. Next, an affinity (self-correlation) matrix is constructed to merge visually similar shots into clusters (supergroups). Since high visual similarity between shots does not necessarily imply that they belong to the same story unit, temporal information is incorporated by merging temporally adjacent shots (within a specified distance) from each supergroup into video groups. A video-scene-detection algorithm is then proposed to merge temporally or spatially correlated video groups into scenario units. This is followed by a scene-clustering algorithm that eliminates visual redundancy among the units. A hierarchical video content structure with increasing granularity is thus constructed, running from clustered scenes through video scenes and video groups down to keyframes. Finally, we introduce a hierarchical video summarization scheme that executes different approaches at different levels of the video content hierarchy to construct the video summary statically or dynamically. Extensive experiments on real-world videos have been performed to validate the effectiveness of the proposed approach.
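
The supergroup and video-group steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the similarity measure, the clustering rule, or the temporal distance. So the histogram-intersection similarity, the threshold-based connected-components clustering, and the parameters `sim_threshold` and `max_gap` are all assumptions chosen only to make the idea concrete.

```python
def histogram_intersection(a, b):
    """Similarity between two normalized histograms (assumed shot feature)."""
    return sum(min(x, y) for x, y in zip(a, b))


def affinity_matrix(features):
    """Pairwise shot-to-shot similarity matrix."""
    n = len(features)
    return [[histogram_intersection(features[i], features[j]) for j in range(n)]
            for i in range(n)]


def supergroups(affinity, sim_threshold):
    """Cluster shots whose affinity reaches the threshold.

    Assumption: clusters are the connected components of the thresholded
    affinity graph; the paper may use a different clustering rule.
    """
    n = len(affinity)
    label = [-1] * n
    clusters = []
    for seed in range(n):
        if label[seed] != -1:
            continue
        label[seed] = len(clusters)
        stack, members = [seed], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if label[j] == -1 and affinity[i][j] >= sim_threshold:
                    label[j] = label[seed]
                    stack.append(j)
        clusters.append(sorted(members))
    return clusters


def split_by_time(cluster, max_gap):
    """Split one supergroup into groups of temporally adjacent shots."""
    groups = [[cluster[0]]]
    for shot in cluster[1:]:
        if shot - groups[-1][-1] <= max_gap:
            groups[-1].append(shot)
        else:
            groups.append([shot])
    return groups


def video_groups(features, sim_threshold=0.8, max_gap=2):
    """Visually similar AND temporally close shots form one video group."""
    groups = []
    for cluster in supergroups(affinity_matrix(features), sim_threshold):
        groups.extend(split_by_time(cluster, max_gap))
    return sorted(groups)


# Hypothetical shot features: three visual "looks" A, B, C in shot order.
A, B, C = [0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]
shots = [A, B, A, B, C, C, A]
print(video_groups(shots))
```

In this toy input, shot 6 looks like shots 0 and 2 but lies more than `max_gap` shots away from them, so it lands in the same supergroup yet forms its own video group. This is the distinction the abstract draws between visual similarity and story-unit membership.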




Corresponding author

Correspondence to Xingquan Zhu.

Additional information

Published online: 15 September 2004


This research has been supported by the NSF under grants 9972883-EIA, 9974255-IIS, 9983248-EIA, and 0209120-IIS, a grant from the state of Indiana 21st Century Fund, and by the U.S. Army Research Laboratory and the U.S. Army Research Office under grant DAAD19-02-1-0178.


About this article

Cite this article

Zhu, X., Wu, X., Fan, J. et al. Exploring video content structure for hierarchical summarization. Multimedia Systems 10, 98–115 (2004). https://doi.org/10.1007/s00530-004-0142-7


  • DOI: https://doi.org/10.1007/s00530-004-0142-7
