Automatic generation of Labanotation based on human pose estimation in folk dance videos

  • S.I.: Applications and Techniques in Cyber Intelligence (ATCI2022)
Neural Computing and Applications

Abstract

Existing Labanotation generation methods are inefficient, cannot process existing videos, and depend on the quality of the capture hardware. To address these issues, we propose a new Labanotation generation method for folk dance videos based on pose estimation. Specifically, our method first extracts key frame images from the folk dance video using temporal differences. The 2D joint points of the dancer are then detected in the key frame images using the multi-scale fusion of a high-resolution network (HRNet), and a pose projection generative adversarial network (pose projection GAN) maps the dancer's 2D joint point sequence to 3D, predicting the coordinates of the 3D joint point positions. Finally, the corresponding Labanotation is generated by analyzing the estimated posture. Experimental results show that the method can convert dance movements in folk dance videos into digital Labanotation, and that automatic generation is much more efficient than manual recording. The method can quickly record endangered folk dances and contribute to the preservation and transmission of movement-based intangible cultural heritage.
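The first stage named in the abstract, key frame extraction using temporal differences, can be illustrated with a minimal sketch. The abstract does not give the authors' exact procedure, so everything here is an assumption made for illustration: frames are represented as flat sequences of grayscale intensities, the difference measure is the mean absolute pixel difference against the most recently selected key frame, and the function names and the threshold value are hypothetical.

```python
from typing import List, Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_key_frames(frames: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Return indices of frames that differ enough from the last key frame.

    Assumption: the first frame is always kept, and a frame becomes a new
    key frame when its mean absolute difference from the previous key frame
    exceeds `threshold` (a hypothetical tuning parameter).
    """
    if not frames:
        return []
    key_indices = [0]
    reference = frames[0]
    for index in range(1, len(frames)):
        if mean_abs_diff(frames[index], reference) > threshold:
            key_indices.append(index)
            reference = frames[index]
    return key_indices


# Synthetic example: six tiny 4-pixel "frames" with an abrupt change at frame 3.
frames = [[0.0] * 4 for _ in range(3)] + [[100.0] * 4 for _ in range(3)]
key_frames = select_key_frames(frames, threshold=10.0)
print(key_frames)
```

In a pipeline like the one described, the selected frames would then be passed to the 2D pose detector; comparing against the last key frame rather than the immediately preceding frame is one design choice among several and keeps slow, gradual movements from being missed.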

Data availability

The datasets generated and analyzed during the current study will be made available upon acceptance by the corresponding author, on reasonable academic request and within the limitations of informed consent.

Acknowledgements

This work was supported by the Funding Project of Humanities and Social Sciences of the Ministry of Education in China (22YJAZH002) and the Funding Project of Beijing Social Science Foundation (Nos. 19YTC043, 20YTB011). We would like to thank those who care about this paper and our projects. We would also like to thank everyone who spent time reading early versions of this paper, including the anonymous reviewers.

Author information

Corresponding author

Correspondence to Xingquan Cai.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cai, X., Wang, T., Lu, R. et al. Automatic generation of Labanotation based on human pose estimation in folk dance videos. Neural Comput & Applic 35, 24755–24771 (2023). https://doi.org/10.1007/s00521-023-08206-8


  • DOI: https://doi.org/10.1007/s00521-023-08206-8
