Abstract
This paper presents a novel method of key-frame selection for video summarization based on multidimensional time series analysis. In the proposed scheme, the given video is first segmented into a set of sequential clips, each containing a number of similar frames. Key frames are then selected by a clustering procedure as the frames closest to the cluster centres in each resulting clip. The proposed algorithm is evaluated on a wide range of test data and compared with state-of-the-art approaches in the literature; the results show that it outperforms existing methods on frame selection in terms of a fidelity-based metric and subjective perception.










Notes
Without going into details, applying clustering to select key frames within a resulting clip is straightforward.
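As a rough illustration of this note, the sketch below runs a plain k-means (Lloyd's algorithm) on the feature vectors of one clip and returns the frames closest to the cluster centres. The function name `select_key_frames`, the per-frame feature representation, the number of clusters, and the random initialisation are all assumptions made for illustration; the paper's actual clustering procedure may differ.

```python
import numpy as np

def select_key_frames(features, n_clusters, n_iter=50, seed=0):
    """Return sorted indices of the frames closest to each k-means centre.

    `features` is an (n_frames, n_dims) array of per-frame descriptors
    for a single clip (hypothetical input format).
    """
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centres from randomly chosen distinct frames (assumption).
    centres = features[rng.choice(len(features), n_clusters, replace=False)]

    for _ in range(n_iter):
        # Euclidean distance of every frame to every centre.
        dists = np.linalg.norm(features[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centres = centres.copy()
        for k in range(n_clusters):
            members = features[labels == k]
            if len(members):  # keep the old centre if a cluster is empty
                new_centres[k] = members.mean(axis=0)
        if np.allclose(new_centres, centres):
            break
        centres = new_centres

    # Final assignment, then pick the member frame nearest to each centre.
    dists = np.linalg.norm(features[:, None, :] - centres[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    key_frames = []
    for k in range(n_clusters):
        idx = np.flatnonzero(labels == k)
        if len(idx):
            key_frames.append(int(idx[dists[idx, k].argmin()]))
    return sorted(key_frames)

# Toy clip with two visually distinct groups of three frames each.
clip = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
                 [10.0, 10.0], [10.1, 10.0], [10.2, 10.0]])
print(select_key_frames(clip, n_clusters=2))
```

Returning frame indices rather than the centres themselves keeps the summary made of actual frames from the video, which matches the "frames closest to the cluster centres" wording of the abstract.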
We computed the Euclidean distance of each video frame to the grouping centre and then collected all the resulting distances to compute their standard deviation as a measure of video complexity. Without going into details, the standard-deviation-based measure is straightforward: a larger standard deviation implies that the video contains more complicated content.
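The complexity measure described in this note can be sketched as follows. This is only an illustration under stated assumptions: the name `video_complexity` is hypothetical, the grouping centre defaults to the mean feature vector when none is supplied, and the population standard deviation (NumPy's default) is used, since the note does not specify either choice.

```python
import numpy as np

def video_complexity(features, centre=None):
    """Standard deviation of frame-to-centre Euclidean distances.

    `features` is an (n_frames, n_dims) array of per-frame descriptors
    (hypothetical input format).
    """
    features = np.asarray(features, dtype=float)
    if centre is None:
        # Assumption: use the mean feature vector as the grouping centre.
        centre = features.mean(axis=0)
    # Euclidean distance of each frame to the centre.
    distances = np.linalg.norm(features - centre, axis=1)
    # Population standard deviation (ddof=0); an assumption, not from the paper.
    return float(distances.std())

static_video = np.ones((5, 3))                       # identical frames
varied_video = np.array([[0.0, 0.0], [0.0, 0.0],
                         [0.0, 0.0], [4.0, 0.0]])    # one outlying frame
print(video_complexity(static_video), video_complexity(varied_video))
```

A video of identical frames scores zero, while a video whose frames lie at uneven distances from the centre scores higher, in line with the interpretation given in the note.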
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China (61403232, 61327003), the Natural Science Foundation of Shandong Province, China (ZR2014FQ025), and the Young Scholars Program of Shandong University (YSPSDU, 2015WLJH30).
Cite this article
Gao, Z., Lu, G. & Yan, P. Key-frame selection for video summarization: an approach of multidimensional time series analysis. Multidim Syst Sign Process 29, 1485–1505 (2018). https://doi.org/10.1007/s11045-017-0513-9
DOI: https://doi.org/10.1007/s11045-017-0513-9