
Joint-wise 2D to 3D lifting for hand pose estimation from a single RGB image

Applied Intelligence

Abstract

In monocular RGB-based 3D hand pose estimation, the z coordinates are harder to estimate than the 2D hand joint coordinates because of the intrinsic depth ambiguity. Some works therefore first estimate the 2D hand joint coordinates and then apply a 2D to 3D lifting module to estimate the z coordinates. In this paper, we propose a new 2D to 3D lifting module. Unlike existing methods, which estimate the z coordinates of all hand joints simultaneously, we estimate the z coordinate of each hand joint individually, taking its 2D joint features and the global image features as input. This divides the complex task into simple sub-tasks, which makes it easier to lift the 2D coordinates to 3D. In addition, our 2D to 3D lifting module uses only convolutional operations with a shared convolutional kernel, and therefore has fewer network parameters than existing methods, which usually rely on fully connected layers. Furthermore, we introduce a new inter-joint attention module to learn the correlation between every pair of hand joints. We conduct experiments on two popular hand pose datasets. The experimental results show that our model achieves state-of-the-art performance compared with existing methods. An ablation study also verifies the validity of each component of our model.
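The idea of joint-wise lifting with a shared kernel can be illustrated with a minimal NumPy sketch. The abstract does not give the network details, so everything here is an assumption made for illustration: the function names, the feature sizes (21 joints, 64 joint channels, 256 global channels), the single linear layer standing in for the shared convolution, and the plain scaled dot-product attention standing in for the inter-joint attention module. It is not the authors' implementation.

```python
import numpy as np


def joint_wise_lift(joint_feats, global_feat, w, b):
    """Estimate one z value per joint using weights shared across joints.

    joint_feats: (J, C_j) per-joint 2D features (hypothetical layout)
    global_feat: (C_g,)   global image features
    w: (C_j + C_g,), b: float  -- one shared kernel, as in a 1x1 convolution
    """
    num_joints = joint_feats.shape[0]
    # Every joint sees its own features plus the same global features.
    tiled = np.broadcast_to(global_feat, (num_joints, global_feat.shape[0]))
    x = np.concatenate([joint_feats, tiled], axis=1)  # (J, C_j + C_g)
    return x @ w + b  # (J,), one z estimate per joint


def inter_joint_attention(joint_feats):
    """Generic scaled dot-product attention across joints (an assumed stand-in)."""
    dim = joint_feats.shape[1]
    scores = joint_feats @ joint_feats.T / np.sqrt(dim)  # (J, J) pairwise scores
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ joint_feats, weights


def shared_params(c_j, c_g):
    """Parameters of the single shared per-joint layer (weights + bias)."""
    return c_j + c_g + 1


def fc_params(num_joints, c_j, c_g):
    """Parameters of one fully connected layer mapping all features to all z."""
    return (num_joints * c_j + c_g) * num_joints + num_joints


# Assumed sizes, chosen only for illustration.
J, C_J, C_G = 21, 64, 256
rng = np.random.default_rng(0)
joint_feats = rng.standard_normal((J, C_J))
global_feat = rng.standard_normal(C_G)
w = rng.standard_normal(C_J + C_G)
b = 0.1

attended, attn = inter_joint_attention(joint_feats)
z = joint_wise_lift(attended, global_feat, w, b)

n_shared = shared_params(C_J, C_G)   # 321
n_fc = fc_params(J, C_J, C_G)        # 33621
```

Because the same kernel is applied to every joint, the parameter count in this sketch does not grow with the number of joints, whereas the fully connected alternative grows roughly quadratically with it. Under the assumed sizes, that is 321 parameters against 33621.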

Data Availability

Data openly available in a public repository.

Author information

Corresponding author

Correspondence to Yi Sun.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, Z., Sun, Y. Joint-wise 2D to 3D lifting for hand pose estimation from a single RGB image. Appl Intell 53, 6421–6431 (2023). https://doi.org/10.1007/s10489-022-03764-1

  • DOI: https://doi.org/10.1007/s10489-022-03764-1
