Abstract
Existing Labanotation generation methods are inefficient, cannot process existing videos, and are sensitive to the quality of the capture hardware. To address these issues, we propose a new Labanotation generation method for folk dance videos based on pose estimation. Specifically, our method first extracts key frames from the folk dance video using temporal differences. Next, it detects the dancer's 2D joint points in the key frames using the multi-scale fusion of the high-resolution network (HRNet), and then maps the 2D joint point sequence to 3D with a pose projection generative adversarial network (pose projection GAN) to predict the 3D joint coordinates. Finally, the corresponding Labanotation is generated by analyzing the estimated posture. Experimental results show that the method can convert dance movements in folk dance videos into digital Labanotation, and that automatic generation is much more efficient than manual recording. The method can quickly record endangered folk dances and contribute to the preservation and transmission of movement-based intangible cultural heritage.
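The key frame step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact differencing rule, so everything below is an assumption made for illustration: frames are flattened grayscale pixel lists, the difference measure is the mean absolute pixel difference, a frame is kept when it differs from the last kept frame by more than a fixed threshold, and the names `frame_difference` and `select_key_frames` are hypothetical.

```python
from typing import List, Sequence

# Assumption: a frame is a flattened grayscale image, one intensity (0-255) per pixel.
Frame = Sequence[int]


def frame_difference(prev: Frame, curr: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same number of pixels")
    if not curr:
        return 0.0
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)


def select_key_frames(frames: Sequence[Frame], threshold: float) -> List[int]:
    """Return indices of frames that differ enough from the last key frame.

    The first frame is always kept. Each later frame is kept when its
    difference from the most recently kept frame exceeds `threshold`
    (an illustrative rule, not necessarily the one used in the paper).
    """
    if not frames:
        return []
    keys = [0]
    for i in range(1, len(frames)):
        if frame_difference(frames[keys[-1]], frames[i]) > threshold:
            keys.append(i)
    return keys


# Toy 2x2 "video": frames 0 and 1 are nearly identical, frame 2 changes,
# and frame 3 is nearly identical to frame 2.
video = [
    [10, 10, 10, 10],
    [11, 10, 10, 10],
    [60, 60, 10, 10],
    [60, 61, 10, 10],
]
key_indices = select_key_frames(video, threshold=5.0)
print(key_indices)  # [0, 2]
```

Comparing each frame against the last kept frame, rather than against its immediate predecessor, lets slow cumulative movement eventually trigger a new key frame. A real pipeline would tune the threshold per video and would likely operate on image arrays rather than Python lists.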
Data availability
The datasets generated and analyzed during the current study will be made available by the corresponding author upon reasonable academic request after acceptance, within the limitations of informed consent.
Acknowledgements
This work was supported by the Funding Project of Humanities and Social Sciences of the Ministry of Education in China (22YJAZH002) and the Funding Project of the Beijing Social Science Foundation (Nos. 19YTC043, 20YTB011). We would like to thank those who care about this paper and our projects. We would also like to thank everyone who spent time reading early versions of this paper, including the anonymous reviewers.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this article.
About this article
Cite this article
Cai, X., Wang, T., Lu, R. et al. Automatic generation of Labanotation based on human pose estimation in folk dance videos. Neural Comput & Applic 35, 24755–24771 (2023). https://doi.org/10.1007/s00521-023-08206-8
DOI: https://doi.org/10.1007/s00521-023-08206-8