A structure-transfer-driven temporal subspace clustering for video summarization

Multimedia Tools and Applications

Abstract

With the explosive growth of mobile phones and other camera-equipped devices, more and more video data is being captured and stored. This creates an urgent need for fast browsing and understanding of video content. Automatic video summarization is one effective technique for tackling this problem: it extracts succinct summaries that represent the original long videos. It involves two subproblems: video segmentation and summary generation. Most previous work focused only on the second subproblem and relied on a simple strategy such as boundary detection to segment videos. However, this type of approach leads to suboptimal results because it not only lacks a learning mechanism in the video segmentation stage but also separates the whole task into two independent stages. In this paper, we propose a novel structure-transfer-driven temporal subspace clustering segmentation (STSC) method for video summarization. We first learn structure information from source videos and then transfer it to target videos. Using the Determinantal Point Process (DPP) algorithm, we then select an informative subset of shots to create the final video summary. Experimental results on the SumMe and TVSum datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art methods.
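
To make the shot-selection step more concrete, the following is a minimal sketch of greedy subset selection under a DPP. It is not the authors' implementation: the abstract does not specify the kernel or the inference procedure, so the quality-times-cosine-similarity kernel, the greedy determinant maximization, and the names `det`, `build_kernel` and `greedy_dpp` are all assumptions made for illustration. The idea it shows is that a DPP scores a subset by the determinant of its kernel submatrix, which rewards high-quality shots and penalizes near-duplicates.

```python
import math


def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def build_kernel(features, quality):
    """Hypothetical kernel: L[i][j] = q_i * cos(f_i, f_j) * q_j."""
    def cos(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        return dot / (math.sqrt(sum(x * x for x in u)) *
                      math.sqrt(sum(y * y for y in v)))
    n = len(features)
    return [[quality[i] * cos(features[i], features[j]) * quality[j]
             for j in range(n)] for i in range(n)]


def greedy_dpp(L, k):
    """Greedily add the shot that maximizes det(L_S) until k shots are chosen."""
    selected = []
    while len(selected) < k:
        best, best_det = None, 0.0
        for i in range(len(L)):
            if i in selected:
                continue
            idx = selected + [i]
            d = det([[L[r][c] for c in idx] for r in idx])
            if d > best_det:
                best, best_det = i, d
        if best is None:  # no remaining shot gives a positive determinant
            break
        selected.append(best)
    return sorted(selected)


# Toy data (assumed): shots 0 and 1 are near-duplicates, shot 2 is different.
features = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
quality = [1.0, 0.9, 0.8]
summary = greedy_dpp(build_kernel(features, quality), k=2)
print(summary)  # [0, 2]
```

In this toy run, shot 1 has higher quality than shot 2 but is skipped because it is almost identical to shot 0, which is the diversity behaviour that motivates using a DPP for summary generation. Greedy selection is only an approximation to exact DPP inference.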

Notes

  1. www.youtube.com

  2. https://webscope.sandbox.yahoo.com/

Author information

Corresponding author

Correspondence to Peiguang Jing.

About this article

Cite this article

Zhang, J., Shi, Y., Jing, P. et al. A structure-transfer-driven temporal subspace clustering for video summarization. Multimed Tools Appl 78, 24123–24145 (2019). https://doi.org/10.1007/s11042-018-6841-4

  • DOI: https://doi.org/10.1007/s11042-018-6841-4
