Abstract
Video summarization techniques aim to generate a concise but complete synopsis of a video by selecting its most informative frames without loss of interpretability. Given the abundance and complexity of video content, there is strong demand for effective summarization techniques that can analyze dynamic posture-centric videos. Yoga session video summarization is one such application area, and it has recently drawn the attention of computer vision researchers. Most general-purpose video summarization methods fail to detect key yoga poses in a yoga session video effectively because they do not consider posture-centric information when extracting key frames. In this paper, we propose a machine learning based video summarization framework that extracts a series of key postures from a yoga session video by tracking a few key-posture points corresponding to vital parts of the human body. Compared with the widely used FFmpeg tool, the proposed method appears to yield a higher proportion of matched key frames and lower proportions of missing key frames and redundant non-key frames with respect to the ground truth set, demonstrating its potential as an effective yoga posture video summarizer.
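The comparison against a ground truth set described above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `compare_keyframes`, the frame-index representation, the `tolerance` window, and the greedy nearest-frame matching rule are all assumptions made for illustration. Here the matched and missing proportions are taken relative to the ground truth set, and the redundant proportion relative to the extracted set.

```python
from typing import Dict, List


def compare_keyframes(predicted: List[int],
                      ground_truth: List[int],
                      tolerance: int = 0) -> Dict[str, float]:
    """Score extracted key-frame indices against ground-truth indices.

    Assumption (hypothetical rule): a ground-truth frame is "matched" when
    a not-yet-used extracted frame lies within `tolerance` frames of it.
    """
    unused = sorted(set(predicted))   # extracted frames not yet matched
    truth = sorted(set(ground_truth))

    matched = 0
    for gt in truth:
        # Greedily pick the closest extracted frame that is still unused.
        best = min(unused, key=lambda p: abs(p - gt), default=None)
        if best is not None and abs(best - gt) <= tolerance:
            matched += 1
            unused.remove(best)

    missing = len(truth) - matched    # ground-truth frames never matched
    redundant = len(unused)           # extracted frames matching nothing
    n_pred = matched + redundant      # number of distinct extracted frames

    return {
        "matched": matched / len(truth) if truth else 0.0,
        "missing": missing / len(truth) if truth else 0.0,
        "redundant": redundant / n_pred if n_pred else 0.0,
    }


if __name__ == "__main__":
    # Made-up frame indices, used only to exercise the function.
    scores = compare_keyframes([10, 52, 90, 130], [12, 50, 100], tolerance=5)
    print(scores)
```

Under these assumed definitions, a better summarizer is one that raises the matched proportion while lowering both the missing and redundant proportions, which is the pattern the abstract reports for the proposed method relative to FFmpeg.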
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Afreen, S., Ghosh, T., Bhattacharyya, S., Bhar, A., Saha, S. (2024). A Machine Learning Based Video Summarization Framework for Yoga-Posture Video. In: Dasgupta, K., Mukhopadhyay, S., Mandal, J.K., Dutta, P. (eds) Computational Intelligence in Communications and Business Analytics. CICBA 2023. Communications in Computer and Information Science, vol 1956. Springer, Cham. https://doi.org/10.1007/978-3-031-48879-5_2
DOI: https://doi.org/10.1007/978-3-031-48879-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48878-8
Online ISBN: 978-3-031-48879-5
eBook Packages: Computer Science, Computer Science (R0)