Object segmentation and key-pose based summarization for motion video

Multimedia Tools and Applications

Abstract

This paper proposes a key-pose based summarization system for a video shot, built on a video object segmentation method. First, we detect the camera motion and extract video objects with a 3D graph-based algorithm, and represent each extracted object by a shape descriptor. Second, to find representative frames that preserve the scene content as accurately as possible, the method computes the difference between pairs of frames from the shape descriptors of the objects in the shot. Finally, key-poses (representative frames) are extracted globally by clustering these shapes. Experimental results on motion video shots show that the proposed method produces satisfactory summaries.
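
The last two stages described above (shape description, then pairwise comparison and clustering) can be illustrated with a minimal sketch. It assumes the segmentation stage has already produced one binary object mask per frame. The abstract does not name the descriptor or the clustering algorithm, so the scale-normalised second-order moments and the simple k-medoids loop below are stand-ins chosen for brevity, and the names `shape_descriptor`, `pairwise_distances`, and `select_key_poses` are hypothetical.

```python
import math

def shape_descriptor(mask):
    """Return (eta20, eta02, eta11) for a binary mask given as rows of 0/1.

    Assumption: a stand-in descriptor, not the one used in the paper.
    """
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    area = len(pts)
    if area == 0:
        return (0.0, 0.0, 0.0)
    cx = sum(x for x, _ in pts) / area
    cy = sum(y for _, y in pts) / area
    # Central moments are translation invariant.
    mu20 = sum((x - cx) ** 2 for x, _ in pts)
    mu02 = sum((y - cy) ** 2 for _, y in pts)
    mu11 = sum((x - cx) * (y - cy) for x, y in pts)
    norm = area ** 2  # eta_pq = mu_pq / mu_00^(1 + (p + q) / 2) with p + q = 2
    return (mu20 / norm, mu02 / norm, mu11 / norm)

def pairwise_distances(descriptors):
    """Euclidean distance between every pair of frame descriptors."""
    n = len(descriptors)
    return [[math.dist(descriptors[i], descriptors[j]) for j in range(n)]
            for i in range(n)]

def select_key_poses(dist, k, iters=20):
    """Pick up to k representative frame indices with a basic k-medoids loop.

    Assumption: a stand-in for the global clustering step in the paper.
    """
    n = len(dist)
    medoids = [0]
    # Farthest-point initialisation, starting from the first frame.
    while len(medoids) < min(k, n):
        medoids.append(max(range(n),
                           key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(iters):
        clusters = {m: [] for m in medoids}
        for i in range(n):
            clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
        # The new medoid of a cluster minimises the total distance to its members.
        updated = [min(members, key=lambda c: sum(dist[c][j] for j in members))
                   for members in clusters.values() if members]
        if sorted(updated) == sorted(medoids):
            break
        medoids = updated
    return sorted(set(medoids))
```

Clustering over the whole shot, rather than comparing only neighbouring frames, is what makes the selection global: a pose that recurs late in the shot falls into the same cluster as its earlier occurrence and is represented once.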

Notes

  1. In this paper, the terms “pose”, “shape”, and “object” are used in a broad sense and interchangeably.

  2. http://www.maths.lth.se/matematiklth/personal/petter/surfmex.php

  3. In a video, motion comprises global motion caused by a moving camera and local motion caused by moving objects.

  4. See the “bwlabel” function in the MATLAB Image Processing Toolbox.

Acknowledgements

This work was supported in part by the National Basic Research Program of China (973 Program) under Grant No. 2010CB327902, and by the NSFC under Grant Nos. 90920301, 61273252, and 61175010.

Author information

Correspondence to Zhiqiang Tian.

About this article

Cite this article

Tian, Z., Xue, J., Lan, X. et al. Object segmentation and key-pose based summarization for motion video. Multimed Tools Appl 72, 1773–1802 (2014). https://doi.org/10.1007/s11042-013-1488-7

  • DOI: https://doi.org/10.1007/s11042-013-1488-7
