Object segmentation and key-pose based summarization for motion video

Tian, Zhiqiang; Xue, Jianru; Lan, Xuguang; Li, Ce; Zheng, Nanning

doi:10.1007/s11042-013-1488-7

Published: 01 May 2013

Volume 72, pages 1773–1802, (2014)

Multimedia Tools and Applications

Abstract

This paper proposes a key-pose based video summarization system for a single video shot, built on a video object segmentation method. First, we detect the camera motion and extract video objects with a 3D graph-based algorithm; each extracted object is then represented by a shape descriptor. Second, to find representative frames that preserve the scene content as accurately as possible, the method computes the difference between pairs of frames from the shape descriptors of the objects in the shot. Finally, key-poses (representative frames) are extracted in a global manner by clustering these shapes. Experimental results on motion video shots show that the proposed method outputs satisfactory summarizations.
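
The last two stages of the pipeline (shape descriptors, pairwise frame differences, clustering into key-poses) can be illustrated with a minimal sketch. The abstract does not say which descriptor or clustering algorithm the paper uses, so everything below is an assumption made for illustration: the descriptor is a set of area-normalised second-order central moments, the frame difference is a Euclidean distance, and the clustering is a small k-medoids loop. The function names `shape_descriptor`, `pairwise_distances`, `k_medoids`, and `select_key_poses` are hypothetical, and each frame is assumed to be an already segmented binary mask (a list of rows of 0/1 values).

```python
import math


def shape_descriptor(mask):
    """Assumed descriptor: second-order central moments divided by area**2.

    The result is invariant to translation and roughly to uniform scaling.
    """
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    n = len(pts)
    if n == 0:
        return (0.0, 0.0, 0.0)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pts) / n ** 2
    mu02 = sum((y - cy) ** 2 for _, y in pts) / n ** 2
    mu11 = sum((x - cx) * (y - cy) for x, y in pts) / n ** 2
    return (mu20, mu02, mu11)


def pairwise_distances(descriptors):
    """Difference between every pair of frames (Euclidean, an assumption)."""
    return [[math.dist(a, b) for b in descriptors] for a in descriptors]


def k_medoids(dist, k, iters=20):
    """Tiny k-medoids: medoids are real frames, so they can serve as key-poses."""
    n = len(dist)
    # Deterministic farthest-point initialisation, starting from frame 0.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(
            max(range(n), key=lambda i: min(dist[i][m] for m in medoids))
        )
    for _ in range(iters):
        # Assign every frame to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
        # Re-pick each medoid as the member with the smallest total distance.
        new = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if sorted(new) == sorted(medoids):
            break
        medoids = new
    return sorted(set(medoids))


def select_key_poses(masks, k):
    """Return the indices of k representative frames for a list of masks."""
    descriptors = [shape_descriptor(m) for m in masks]
    return k_medoids(pairwise_distances(descriptors), k)


if __name__ == "__main__":
    horizontal = [[0] * 5, [0] * 5, [1] * 5, [0] * 5, [0] * 5]
    vertical = [[0, 0, 1, 0, 0] for _ in range(5)]
    print(select_key_poses([horizontal, horizontal, vertical, vertical], 2))
```

Choosing medoids rather than cluster means keeps every key-pose an actual frame of the shot, which matches the idea of "representative frames"; any clustering method that returns exemplars would fit the same role.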

Notes

In our paper, the terms “pose”, “shape”, and “object” are used in a broad sense and interchangeably.
http://www.maths.lth.se/matematiklth/personal/petter/surfmex.php
Given a video, motion includes global motion caused by a moving camera and local motion caused by moving objects.
See the “bwlabel” function in the MATLAB Image Processing Toolbox.
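
For readers without MATLAB, the connected-component labeling that the last note refers to can be sketched as follows. This is an illustrative stand-in, not the toolbox implementation: the function name `label_components` is hypothetical, 8-connectivity is used because it is the documented default of `bwlabel` for 2-D images, and the label numbering follows a row-by-row scan, so it may differ from MATLAB's ordering.

```python
from collections import deque


def label_components(mask):
    """Label 8-connected foreground regions of a binary mask (list of rows)."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or labels[y][x]:
                continue
            # New, unlabeled foreground pixel: flood-fill its component.
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

Each labeled region can then be treated as one candidate object mask.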

Acknowledgements

This work was supported in part by the National Basic Research Program of China (973 Program) under Grant No. 2010CB327902, and the NSFC Nos. 90920301, 61273252, and 61175010.

Author information

Authors and Affiliations

Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, 710049, China
Zhiqiang Tian, Jianru Xue, Xuguang Lan, Ce Li & Nanning Zheng

Corresponding author

Correspondence to Zhiqiang Tian.

About this article

Cite this article

Tian, Z., Xue, J., Lan, X. et al. Object segmentation and key-pose based summarization for motion video. Multimed Tools Appl 72, 1773–1802 (2014). https://doi.org/10.1007/s11042-013-1488-7

Published: 01 May 2013
Issue Date: September 2014
DOI: https://doi.org/10.1007/s11042-013-1488-7
