Greedy Salient Dictionary Learning for Activity Video Summarization

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11295)

Abstract

Automated video summarization is well suited to analysing human activity videos (e.g., from surveillance feeds), mainly as a pre-processing step, because such data are voluminous and only a small percentage of video frames are actually important. Although key-frame extraction remains the most popular way to summarize such footage, its successful application to activity videos is hindered by the lack of editing cuts and by heavy inter-frame visual redundancy. Salient dictionary learning, recently proposed for activity video key-frame extraction, models the problem as identifying a small number of video frames that simultaneously best reconstruct the entire video stream and are salient compared to the rest. In previous work, the reconstruction term was modelled as a Column Subset Selection Problem (CSSP) and a numerical, SVD-based algorithm was adapted to solve it; video frame saliency, in the fastest algorithm proposed so far, was also estimated using SVD. In this paper, the numerical CSSP method is replaced by a greedy, iterative one, properly adapted for salient dictionary learning, while the SVD-based saliency term is retained. As shown by extensive empirical evaluation, the resulting approach significantly outperforms all competing key-frame extraction methods with regard to speed, without sacrificing summarization accuracy. Additionally, a computational complexity analysis of all salient dictionary learning and related methods is presented.
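The idea of greedily choosing frames that both reconstruct the video and are salient can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: the abstract does not specify the objective, so the saliency definition (per-frame residual after a truncated SVD), the convex weighting `alpha`, and the function names `svd_saliency` and `greedy_salient_selection` are all assumptions made for illustration. The reconstruction gain is the standard greedy CSSP criterion, i.e. the drop in squared Frobenius error obtained by projecting the residual onto a candidate column.

```python
import numpy as np


def svd_saliency(X, rank):
    """Hypothetical saliency score (an assumption, not the paper's definition):
    the norm of each column's residual after a rank-`rank` truncated-SVD
    approximation, scaled so that the largest score is 1."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    resid = np.linalg.norm(X - X_low, axis=0)
    peak = resid.max()
    return resid / peak if peak > 1e-12 else np.zeros_like(resid)


def greedy_salient_selection(X, k, alpha=0.5, rank=2):
    """Greedily pick k column indices of X (d x n, one column per frame).

    Each step scores every unselected column by a weighted sum of
    (a) its normalized reconstruction gain and (b) its saliency,
    then removes the chosen column's direction from the residual.
    """
    saliency = svd_saliency(X, rank)
    R = np.array(X, dtype=float)  # residual matrix, updated in place
    selected = []
    for _ in range(k):
        norms_sq = np.sum(R * R, axis=0)
        G = R.T @ R  # Gram matrix of the residual columns
        num = np.sum(G * G, axis=0)  # ||R^T r_j||^2 for each column j
        # Error reduction if column j is chosen: ||R^T r_j||^2 / ||r_j||^2
        gain = np.divide(num, norms_sq, out=np.zeros_like(num),
                         where=norms_sq > 1e-12)
        gmax = gain.max()
        gain_n = gain / gmax if gmax > 0 else np.zeros_like(gain)
        score = (1.0 - alpha) * gain_n + alpha * saliency
        score[selected] = -np.inf  # never pick the same frame twice
        j = int(np.argmax(score))
        selected.append(j)
        nrm = np.sqrt(norms_sq[j])
        if nrm > 1e-6:
            q = R[:, j] / nrm
            R -= np.outer(q, q @ R)  # project out the chosen direction
    return selected
```

Here `X` would hold one feature vector per frame as its columns, and the returned indices would be the key-frames. Setting `alpha=0` reduces the sketch to plain greedy column subset selection, while larger values favour frames that a low-rank model explains poorly.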

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement numbers 287674 (3DTVS) and 316564 (IMPART).

Corresponding author

Correspondence to Ioannis Mademlis.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Mademlis, I., Tefas, A., Pitas, I. (2019). Greedy Salient Dictionary Learning for Activity Video Summarization. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, W.-H., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol 11295. Springer, Cham. https://doi.org/10.1007/978-3-030-05710-7_48

  • DOI: https://doi.org/10.1007/978-3-030-05710-7_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05709-1

  • Online ISBN: 978-3-030-05710-7

  • eBook Packages: Computer Science, Computer Science (R0)
