MovePose: A High-Performance Human Pose Estimation Algorithm on Mobile and Edge Devices

Yu, Dongyang; Zhang, Haoyue; Zhao, Ruisheng; Chen, Guoqi; An, Wangpeng; Yang, Yanhong

doi:10.1007/978-3-031-72338-4_11

Dongyang Yu¹¹,
Haoyue Zhang¹²,
Ruisheng Zhao¹³,
Guoqi Chen¹⁴,
Wangpeng An¹⁵ &
…
Yanhong Yang¹⁶

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 15018))

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

We present MovePose, an optimized lightweight convolutional neural network designed specifically for real-time body pose estimation on CPU-based mobile devices. The current solutions do not provide satisfactory accuracy and speed for human posture estimation, and MovePose addresses this gap. It aims to maintain real-time performance while improving the accuracy of human posture estimation for mobile devices. Our MovePose algorithm has attained an Mean Average Precision (mAP) score of 67.7 on the COCO [10] validation dataset. The MovePose algorithm displayed efficiency with a performance of 69+ frames per second (fps) when run on an Intel i9-10920x CPU. Additionally, it showcased an increased performance of 452+ fps on an NVIDIA RTX3090 GPU. On an Android phone equipped with a Snapdragon 8 + 4G processor, the fps reached above 11. To enhance accuracy, we incorporated three techniques: deconvolution, large kernel convolution, and coordinate classification methods. Compared to basic upsampling, deconvolution is trainable, improves model capacity, and enhances the receptive field. Large kernel convolution strengthens these properties at a decreased computational cost. In summary, MovePose provides high accuracy and real-time performance, marking it a potential tool for a variety of applications, including those focused on mobile-side human posture estimation.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 64.99; Price excludes VAT (USA)

Softcover Book: USD 79.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.: 2d human pose estimation: New benchmark and state of the art analysis. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3686–3693, June 2014. https://doi.org/10.1109/CVPR.2014.471
Bazarevsky, V., Grishchenko, I., Raveendran, K., Zhu, T., Zhang, F., Grundmann, M.: Blazepose: On-device real-time body pose tracking. arXiv preprint arXiv:2006.10204 (2020)
Cao, Z., Simon, T., Wei, S., Sheikh, Y.: Realtime multi-person 2d pose estimation using part affinity fields. arXiv preprint arXiv:1611.08050 (2016)
Chen, Y., Wang, Z., Peng, Y., Zhang, Z., Yu, G., Sun, J.: Cascaded pyramid network for multi-person pose estimation. Comput. Vision Pattern Recogn. (2017)
Google Scholar
Cheng, B., Xiao, B., Wang, J., Shi, H., Huang, T.S., Zhang, L.: Higherhrnet: Scale-aware representation learning for bottom-up human pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5386–5395 (2020)
Google Scholar
Fang, H.S., Xie, S., Tai, Y.W., Lu, C.: Rmpe: Regional multi-person pose estimation. In: Proceedings of the IEEE international conference on computer vision. pp. 2334–2343 (2017)
Google Scholar
Geng, Z., Sun, K., Xiao, B., Zhang, Z., Wang, J.: Bottom-up human pose estimation via disentangled keypoint regression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14676–14686 (2021)
Google Scholar
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. arXiv preprint arXiv:1703.06870 (2017)
Li, Y., Yang, S., Liu, P., Zhang, S., Wang, Y., Wang, Z., Yang, W., Xia, S.T.: Simcc: A simple coordinate classification perspective for human pose estimation. In: European Conference on Computer Vision, pp. 89–106. Springer (2022)
Google Scholar
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Chapter Google Scholar
Newell, A., Huang, Z., Deng, J.: Associative embedding: End-to-end learning for joint detection and grouping. Advances in neural information processing systems 30 (2017)
Google Scholar
Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 483–499. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_29
Chapter Google Scholar
Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. Advances in neural information processing systems 28 (2015)
Google Scholar
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Chapter Google Scholar
Toshev, A., Szegedy, C.: Deeppose: human pose estimation via deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1653–1660 (2014)
Google Scholar
Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., Liu, D., Mu, Y., Tan, M., Wang, X., et al.: Deep high-resolution representation learning for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 43(10), 3349–3364 (2020)
Article Google Scholar
Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. computer vision and pattern recognition (2016)
Google Scholar
Xiao, B., Wu, H., Wei, Y.: Simple baselines for human pose estimation and tracking. European Conference on Computer Vision (2018)
Google Scholar
Xu, Y., Zhang, J., Zhang, Q., Tao, D.: Vitpose: Simple vision transformer baselines for human pose estimation. arXiv preprint arXiv:2204.12484 (2022)
Yu, C., Xiao, B., Gao, C., Yuan, L., Zhang, L., Sang, N., Wang, J.: Lite-hrnet: a lightweight high-resolution network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10440–10450 (2021)
Google Scholar
Yu, D., Xie, Y., An, W., Li, Z., Yao, Y.: Joint coordinate regression and association for multi-person pose estimation, a pure neural network approach. In: Proceedings of the 5th ACM International Conference on Multimedia in Asia, pp. 1–8 (2023)
Google Scholar
Zhang, F., Zhu, X., Dai, H., Ye, M., Zhu, C.: Distribution-aware coordinate representation for human pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7093–7102 (2020)
Google Scholar
Zhou, X., Wang, D., Krähenbühl, P.: Objects as points. arXiv preprint arXiv:1904.07850 (2019)

Download references

Author information

Authors and Affiliations

Beijing Rigour Technology Co., Ltd., Beijing, China
Dongyang Yu
Cornell University, Ithaca, USA
Haoyue Zhang
AiNaDoctor Inc, San Jose, USA
Ruisheng Zhao
Beijing Ronghui Jinxin Information Technology Co., Ltd., Beijing, China
Guoqi Chen
Tiktok Inc., Culver, USA
Wangpeng An
Capital University of Economics and Business, Beijing, China
Yanhong Yang

Authors

Dongyang Yu
View author publications
You can also search for this author in PubMed Google Scholar
Haoyue Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Ruisheng Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Guoqi Chen
View author publications
You can also search for this author in PubMed Google Scholar
Wangpeng An
View author publications
You can also search for this author in PubMed Google Scholar
Yanhong Yang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Wangpeng An or Yanhong Yang .

Editor information

Editors and Affiliations

IDSIA USI-SUPSI, Lugano, Switzerland
Michael Wand
Comenius University, Bratislava, Slovakia
Kristína Malinovská
KAUST Center of Generative AI, Thuwal, Saudi Arabia
Jürgen Schmidhuber
Helmholtz Zentrum München, Neuherberg, Germany
Igor V. Tetko

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Yu, D., Zhang, H., Zhao, R., Chen, G., An, W., Yang, Y. (2024). MovePose: A High-Performance Human Pose Estimation Algorithm on Mobile and Edge Devices. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) Artificial Neural Networks and Machine Learning – ICANN 2024. ICANN 2024. Lecture Notes in Computer Science, vol 15018. Springer, Cham. https://doi.org/10.1007/978-3-031-72338-4_11

Download citation

DOI: https://doi.org/10.1007/978-3-031-72338-4_11
Published: 17 September 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72337-7
Online ISBN: 978-3-031-72338-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

MovePose: A High-Performance Human Pose Estimation Algorithm on Mobile and Edge Devices