Greedy Salient Dictionary Learning for Activity Video Summarization

Mademlis, Ioannis; Tefas, Anastasios; Pitas, Ioannis

doi:10.1007/978-3-030-05710-7_48

Conference paper
First Online: 08 December 2018

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11295)

Abstract

Automated video summarization is well suited to analysing human activity videos (e.g., from surveillance feeds), mainly as a pre-processing step, given the large volume of such data and the small fraction of video frames that are actually important. Although key-frame extraction remains the most popular way to summarize such footage, its successful application to activity videos is hindered by the lack of editing cuts and by heavy inter-frame visual redundancy. Salient dictionary learning, recently proposed for activity video key-frame extraction, models the problem as identifying a small number of video frames that simultaneously best reconstruct the entire video stream and are salient compared to the rest. In previous work, the reconstruction term was modelled as a Column Subset Selection Problem (CSSP) and solved with an adapted numerical, SVD-based algorithm, while video frame saliency, in the fastest algorithm proposed so far, was also estimated using SVD. In this paper, the numerical CSSP method is replaced by a greedy, iterative one, properly adapted for salient dictionary learning, while the SVD-based saliency term is retained. As shown by extensive empirical evaluation, the resulting approach is significantly faster than all competing key-frame extraction methods, without sacrificing summarization accuracy. Additionally, a computational complexity analysis of all salient dictionary learning and related methods is presented.
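To make the idea of a greedy, iterative CSSP solver concrete, here is a minimal, illustrative sketch of generic greedy column subset selection. This is not the authors' algorithm. Each video frame is treated as one column of a descriptor matrix. At every step, the sketch picks the frame whose current residual most reduces the Frobenius-norm reconstruction error of all frames, then projects that direction out of every residual. The function name `greedy_salient_selection`, the additive weight `alpha`, and the per-frame `saliency` scores (assumed precomputed) are hypothetical illustrations. The paper's SVD-based saliency estimation is not reproduced here.

```python
from typing import List, Optional

# Each inner list is one column of the descriptor matrix, i.e. one video frame.
Frames = List[List[float]]


def _dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _deflate(r: List[float], q: List[float], qq: float) -> List[float]:
    """Remove from r its component along q (one Gram-Schmidt step)."""
    c = _dot(q, r) / qq
    return [x - c * y for x, y in zip(r, q)]


def greedy_salient_selection(frames: Frames, k: int,
                             saliency: Optional[List[float]] = None,
                             alpha: float = 0.0,
                             tol: float = 1e-12) -> List[int]:
    """Greedily pick up to k frame indices (hypothetical helper, not the paper's code).

    Score of candidate frame i = reduction of ||R||_F^2 obtained by also
    projecting every residual onto r_i, plus alpha * saliency[i] if given.
    """
    # Residuals of all frames after projecting out the span of the selected frames.
    residuals = [list(col) for col in frames]
    selected: List[int] = []
    for _ in range(min(k, len(frames))):
        best, best_score = -1, float("-inf")
        for i, r_i in enumerate(residuals):
            norm2 = _dot(r_i, r_i)
            if i in selected or norm2 <= tol:
                continue  # already chosen, or already spanned by the chosen frames
            gain = sum(_dot(r_i, r_j) ** 2 for r_j in residuals) / norm2
            score = gain + (alpha * saliency[i] if saliency is not None else 0.0)
            if score > best_score:
                best, best_score = i, score
        if best < 0:
            break  # the selected frames (numerically) reconstruct every frame
        selected.append(best)
        q = residuals[best]
        qq = _dot(q, q)
        residuals = [_deflate(r, q, qq) for r in residuals]
    return selected


if __name__ == "__main__":
    video = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # three toy 2-D frame descriptors
    print(greedy_salient_selection(video, 1))  # → [1]
```

This naive version recomputes all residual inner products at every step, costing roughly O(k·N²·d) for N frames of dimension d. A faster implementation would cache and update these inner products incrementally. The additive mix of reconstruction gain and saliency, and its weighting, are assumptions made only for illustration.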

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement numbers 287674 (3DTVS) and 316564 (IMPART).



Author information

Authors and Affiliations

Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
Ioannis Mademlis, Anastasios Tefas & Ioannis Pitas

Corresponding author

Correspondence to Ioannis Mademlis.

Editor information

Editors and Affiliations

Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Ioannis Kompatsiaris
EURECOM, Sophia Antipolis, France
Benoit Huet
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Vasileios Mezaris
Dublin City University, Dublin, Ireland
Cathal Gurrin
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Stefanos Vrochidis

About this paper

Cite this paper

Mademlis, I., Tefas, A., Pitas, I. (2019). Greedy Salient Dictionary Learning for Activity Video Summarization. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol 11295. Springer, Cham. https://doi.org/10.1007/978-3-030-05710-7_48

DOI: https://doi.org/10.1007/978-3-030-05710-7_48
Published: 08 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05709-1
Online ISBN: 978-3-030-05710-7
eBook Packages: Computer Science, Computer Science (R0)