Abstract
This paper proposes a key-pose based video summarization system for a video shot, facilitated by a video object segmentation method. First, we detect the camera motion and extract video objects with a 3D graph-based algorithm. Once the objects are obtained, each of them is represented by a shape descriptor. Second, to find representative frames that preserve the scene content as accurately as possible, the proposed method calculates the difference between pairs of frames based on the shape descriptors of the objects in the video shot. Finally, key-poses (representative frames) are extracted in a global manner by clustering these shapes. Experimental results on motion video shots show that the proposed method outputs satisfactory summarizations.
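The last two stages of this pipeline (shape description, then global clustering of shapes) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names neither the descriptor nor the clustering algorithm, so the sketch assumes the first two Hu moment invariants as the shape descriptor, Euclidean distance as the frame difference, and a simple k-medoids loop as the clustering step. The function names `shape_descriptor`, `distance_matrix`, and `select_key_poses` are hypothetical, and the object masks are assumed to be already segmented.

```python
def shape_descriptor(mask):
    """First two Hu moment invariants of a binary mask (list of rows of 0/1).

    Assumption: the paper's actual shape descriptor may differ.
    """
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    m00 = len(pts)
    if m00 == 0:
        return (0.0, 0.0)
    cx = sum(x for x, _ in pts) / m00
    cy = sum(y for _, y in pts) / m00
    mu20 = sum((x - cx) ** 2 for x, _ in pts)
    mu02 = sum((y - cy) ** 2 for _, y in pts)
    mu11 = sum((x - cx) * (y - cy) for x, y in pts)
    # Second-order normalized central moments: eta_pq = mu_pq / m00^2.
    n20, n02, n11 = mu20 / m00 ** 2, mu02 / m00 ** 2, mu11 / m00 ** 2
    return (n20 + n02, (n20 - n02) ** 2 + 4 * n11 ** 2)


def distance_matrix(descs):
    """Pairwise Euclidean distances between per-frame descriptors."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5 for q in descs]
            for p in descs]


def select_key_poses(dist, k, max_iters=20):
    """Pick k representative frame indices with a basic k-medoids loop.

    Assumes 1 <= k <= number of frames. Medoids start evenly spaced in time.
    """
    n = len(dist)
    medoids = sorted({round(i * (n - 1) / max(k - 1, 1)) for i in range(k)})
    for _ in range(max_iters):
        # Assign every frame to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
        # Re-pick each medoid as the member closest to the rest of its cluster.
        updated = sorted(
            min(c, key=lambda j: sum(dist[i][j] for i in c))
            for c in clusters.values() if c
        )
        if updated == medoids:
            break
        medoids = updated
    return medoids


def rect(h, w):
    """Hypothetical helper: a filled h-by-w rectangle as a binary mask."""
    return [[1] * w for _ in range(h)]


# Toy shot: three square-like poses followed by three bar-like poses.
masks = [rect(2, 2), rect(3, 3), rect(4, 4), rect(1, 6), rect(1, 8), rect(2, 8)]
key_poses = select_key_poses(distance_matrix([shape_descriptor(m) for m in masks]), k=2)
print(key_poses)
```

Because the medoids are chosen against all frames of the shot at once, the selection is global in the sense used above, rather than a frame-by-frame threshold on local change.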


















Notes
In this paper, the terms “pose”, “shape”, and “object” are used in a broad sense and have the same meaning.
The motion in a video consists of global motion, caused by a moving camera, and local motion, caused by moving objects.
See the “bwlabel” function in the MATLAB Image Processing Toolbox.
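For readers without MATLAB, the operation that “bwlabel” performs, connected-component labeling of a binary image, can be sketched in Python as follows. This is an illustrative stand-in and not the toolbox code: `label_components` is a hypothetical name, 8-connectivity is assumed as the default, and the label numbering follows a row-by-row scan, so it is not guaranteed to match MATLAB's numbering.

```python
from collections import deque


def label_components(mask, connectivity=8):
    """Label connected foreground regions of a binary mask (list of rows of 0/1).

    Returns (labels, count), where labels has the same shape as mask,
    0 marks background, and regions are numbered 1..count.
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    if connectivity == 8:
        steps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if (dy, dx) != (0, 0)]
    else:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                # New region found: flood-fill it breadth-first.
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in steps:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


demo = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 0],
]
labels8, count8 = label_components(demo, connectivity=8)
labels4, count4 = label_components(demo, connectivity=4)
print(count8, count4)
```

The two diagonal pixels at the top of `demo` form one region under 8-connectivity and two regions under 4-connectivity, which is why the choice of connectivity matters when separating segmented objects.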
Acknowledgements
This work was supported in part by the National Basic Research Program of China (973 Program) under Grant No. 2010CB327902, and the NSFC Nos. 90920301, 61273252, and 61175010.
Cite this article
Tian, Z., Xue, J., Lan, X. et al. Object segmentation and key-pose based summarization for motion video. Multimed Tools Appl 72, 1773–1802 (2014). https://doi.org/10.1007/s11042-013-1488-7
DOI: https://doi.org/10.1007/s11042-013-1488-7