Abstract
Yoga action recognition is crucial for precise motion analysis and effective training guidance, which in turn support physical health and skill development. However, current methods struggle to maintain both high accuracy and real-time performance when faced with complex poses and occlusions, and they neglect the dynamic characteristics and temporal-sequence information inherent in yoga actions. This paper therefore proposes a two-stage action recognition method tailored to yoga scenarios. The method first employs knowledge-distillation-based pose estimation to improve the accuracy and efficiency of lightweight models on complex and occluded poses. A lightweight 3D convolutional neural network (3D-CNN) then performs action recognition; keypoint heatmaps bridge the two stages seamlessly, enabling the model to capture spatiotemporal features in video sequences and improving recognition accuracy. Experimental results show that on the COCO dataset, the DistillPose-m model achieves a 2.5% improvement in Average Precision (AP) over RTMPose-m. On the yoga action recognition task, our model exhibits roughly a 2% improvement over traditional Graph Convolutional Network (GCN) methods on both the Deepyoga and 3Dyoga90 datasets. This study improves the performance and accuracy of pose estimation in yoga scenarios, addressing the challenges of bodily occlusion and complex posture, and, by fully exploiting the spatiotemporal information in yoga movements, raises the accuracy of yoga action recognition. It also offers insights and support for motion training and analysis systems in other dynamic activities, such as martial arts and dance.
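To make the two-stage design concrete, below is a minimal PyTorch sketch of the pipeline the abstract describes: a heatmap distillation loss that lets a lightweight student mimic a stronger teacher, and a small 3D-CNN that classifies a clip of stacked keypoint heatmaps. All names (HeatmapDistillLoss, Heatmap3DCNN), shapes, and hyperparameters are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HeatmapDistillLoss(nn.Module):
    # Stage 1 (assumed form): the lightweight student regresses keypoint
    # heatmaps and is supervised both by the ground truth and by a frozen
    # teacher's heatmaps (standard response-based knowledge distillation).
    def __init__(self, alpha: float = 0.5):
        super().__init__()
        self.alpha = alpha  # trade-off between GT and teacher terms

    def forward(self, student_hm, teacher_hm, gt_hm):
        loss_gt = F.mse_loss(student_hm, gt_hm)       # fit the ground truth
        loss_kd = F.mse_loss(student_hm, teacher_hm)  # mimic the teacher
        return (1 - self.alpha) * loss_gt + self.alpha * loss_kd

class Heatmap3DCNN(nn.Module):
    # Stage 2 (assumed form): a lightweight 3D-CNN classifying a clip of
    # stacked keypoint heatmaps, i.e. an (N, K, T, H, W) volume where K is
    # the number of joints and T the number of frames.
    def __init__(self, num_joints: int = 17, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(num_joints, 32, kernel_size=3, padding=1),
            nn.BatchNorm3d(32), nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),      # pool space, keep time
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm3d(64), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),                  # global spatiotemporal pool
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):                             # x: (N, K, T, H, W)
        return self.classifier(self.features(x).flatten(1))

# Usage: heatmaps from the pose stage feed the recognition stage directly.
clip = torch.randn(2, 17, 16, 56, 56)                 # 2 clips of 16 frames
print(Heatmap3DCNN()(clip).shape)                     # torch.Size([2, 10])

Passing heatmaps rather than raw joint coordinates between the two stages preserves each joint's spatial uncertainty, which is what a heatmap-driven 3D-CNN can exploit when aggregating features over time.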
Data availability
No datasets were generated or analysed during the current study.
Acknowledgements
This work was supported by the Natural Science Foundation for Outstanding Young Scholars of Fujian Province (grant number 2022J06023), the Fujian Province Science and Technology Empowering Police Research Initiative (grant number 2024Y0064), and the High-level Talent Innovation and Entrepreneurship Project of Quanzhou City (grant number 2023C013R).
Author information
Contributions
L.T. Zhou was responsible for the conception of the research, data collection, experimental design and implementation, and manuscript writing. W.W. Zhang contributed to the conception of the research and experimental design, and reviewed and edited the manuscript. B.H. Zhang and X.B. Li were responsible for the analysis of experimental data and the preparation of figures. J.Q. Zhu was responsible for the analysis of experimental data and reviewed the manuscript.
Ethics declarations
Conflict of interest
All authors of this research paper declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhou, L., Zhang, W., Zhang, B. et al. A strong benchmark for yoga action recognition based on lightweight pose estimation model. Multimedia Systems 31, 66 (2025). https://doi.org/10.1007/s00530-024-01646-9