A structure-transfer-driven temporal subspace clustering for video summarization

Zhang, Jing; Shi, Yue; Jing, Peiguang; Liu, Jing; Su, Yuting

doi:10.1007/s11042-018-6841-4

Published: 13 November 2018

Volume 78, pages 24123–24145 (2019)

Multimedia Tools and Applications

Jing Zhang¹,
Yue Shi¹,
Peiguang Jing ORCID: orcid.org/0000-0003-2648-7358¹,
Jing Liu¹ &
…
Yuting Su¹

Abstract

With the explosive growth of mobile phones and other camera-equipped devices, more and more video data is being captured and stored. This creates an urgent need for fast browsing and understanding of video content. Automatic video summarization, which extracts succinct summaries to represent the original long videos, is one of the effective techniques for addressing this need. It involves two problems: video segmentation and summary generation. Most previous work has focused on the second problem and segmented videos with a simple strategy such as boundary detection. However, this type of approach leads to suboptimal results because it not only lacks a learning mechanism in the video segmentation stage but also separates the whole task into two independent stages. In this paper, we propose a novel structure-transfer-driven temporal subspace clustering segmentation (STSC) method for video summarization. We first learn the structure information from source videos and then transfer it to target videos. Using the Determinantal Point Process (DPP) algorithm, we then select an informative subset of shots to create the final video summary. Experimental results on the SumMe and TVSum datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art methods.
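
The shot-selection step mentioned in the abstract can be illustrated with a generic DPP sketch. The preview does not give the paper's actual DPP formulation, so everything below is an assumption made for illustration: the function name `greedy_dpp_select`, the quality-times-similarity kernel, and the greedy search for a high-determinant subset are a common textbook heuristic, not the authors' method.

```python
import numpy as np


def greedy_dpp_select(features, quality, k):
    """Greedily pick up to k shots with a large DPP kernel determinant.

    features: (n, d) array of per-shot descriptors (hypothetical input).
    quality:  (n,) array of positive per-shot importance scores (hypothetical).
    Returns the selected shot indices in temporal (sorted) order.
    """
    # Cosine similarity between shots measures redundancy.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = feats @ feats.T
    # Standard quality/diversity decomposition: L_ij = q_i * S_ij * q_j.
    kernel = quality[:, None] * sim * quality[None, :]

    selected = []
    for _ in range(k):
        best_item, best_logdet = None, -np.inf
        for i in range(len(quality)):
            if i in selected:
                continue
            idx = selected + [i]
            # log det of the kernel restricted to the candidate subset.
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best_item, best_logdet = i, logdet
        if best_item is None:  # every remaining shot is fully redundant
            break
        selected.append(best_item)
    return sorted(selected)
```

Because the determinant shrinks when two selected shots are similar, a near-duplicate of an already chosen shot is passed over in favor of a less similar shot, even one with a lower quality score.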

Author information

Authors and Affiliations

School of Electronic and Information Engineering, Tianjin University, Tianjin, China
Jing Zhang, Yue Shi, Peiguang Jing, Jing Liu & Yuting Su

Corresponding author

Correspondence to Peiguang Jing.

About this article

Cite this article

Zhang, J., Shi, Y., Jing, P. et al. A structure-transfer-driven temporal subspace clustering for video summarization. Multimed Tools Appl 78, 24123–24145 (2019). https://doi.org/10.1007/s11042-018-6841-4

Received: 13 March 2018
Revised: 17 October 2018
Accepted: 05 November 2018
Published: 13 November 2018
Issue Date: 15 September 2019
DOI: https://doi.org/10.1007/s11042-018-6841-4
