Joint-wise 2D to 3D lifting for hand pose estimation from a single RGB image

Chen, Zheng; Sun, Yi

doi:10.1007/s10489-022-03764-1

Published: 08 July 2022

Volume 53, pages 6421–6431, (2023)
Applied Intelligence



Abstract

In monocular RGB-based 3D hand pose estimation, z coordinates are more difficult to estimate than 2D hand joint coordinates because of intrinsic depth ambiguity. Some works therefore first estimate the 2D hand joint coordinates and then apply a 2D-to-3D lifting module to estimate the z coordinates. In this paper, we propose a new 2D-to-3D lifting module. Unlike existing methods, which estimate the z coordinates of all hand joints simultaneously, we estimate the z coordinate of each hand joint individually, taking its 2D joint features and the global image features as input. This divides the complex task into simpler sub-tasks, which makes it easier to lift the 2D coordinates to 3D. In addition, our 2D-to-3D lifting module uses only convolutional operations with a shared convolutional kernel, so it has fewer network parameters than existing methods, which usually rely on fully connected layers. Furthermore, we introduce a new inter-joint attention module that learns the correlation between every pair of hand joints. We conduct experiments on two popular hand pose datasets. The experimental results show that our model achieves state-of-the-art performance compared with existing methods, and an ablation study verifies the validity of each component proposed in our model.
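The joint-wise lifting idea, the parameter-count argument and the inter-joint attention can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the feature sizes, the single linear kernel standing in for the convolutional lifting head, and the plain scaled dot-product form of the attention (with no learned projections) are all assumptions, and every function name below is hypothetical. Applying one shared kernel to each joint's concatenated [joint feature, global feature] vector is what a 1x1 convolution over the joint axis computes.

```python
import math


def lift_joint_wise(joint_feats, global_feat, weight, bias):
    """Predict one z coordinate per joint with a single shared linear kernel.

    joint_feats: J lists, each of length C_j (per-joint 2D features; assumed input)
    global_feat: list of length C_g (pooled global image features; assumed input)
    weight:      list of length C_j + C_g, reused for every joint
    bias:        scalar, reused for every joint
    """
    zs = []
    for f_j in joint_feats:
        x_j = f_j + global_feat  # concatenate joint features with global features
        if len(x_j) != len(weight):
            raise ValueError("weight length must equal C_j + C_g")
        # The same weights act on every joint, like a 1x1 convolution over the joint axis.
        zs.append(sum(w * x for w, x in zip(weight, x_j)) + bias)
    return zs


def param_counts(num_joints, c_joint, c_global):
    """Compare a shared per-joint kernel with one dense layer mapping all joints to J outputs."""
    shared = (c_joint + c_global) + 1  # one weight vector plus one bias
    fc_in = num_joints * c_joint + c_global  # flattened joint features + global features
    fully_connected = fc_in * num_joints + num_joints  # weight matrix plus J biases
    return shared, fully_connected


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def inter_joint_attention(feats):
    """Scaled dot-product attention across joints (assumed form, no learned projections).

    Returns the refined per-joint features and the J x J matrix of pairwise joint weights.
    """
    d = len(feats[0])
    scores = [[sum(q_c * k_c for q_c, k_c in zip(q, k)) / math.sqrt(d) for k in feats]
              for q in feats]
    weights = [softmax(row) for row in scores]  # row i: how much joint i attends to each joint
    refined = [[sum(w * v[c] for w, v in zip(w_row, feats)) for c in range(d)]
               for w_row in weights]
    return refined, weights
```

With 21 joints (a common hand keypoint layout), hypothetical 64-channel joint features and a 256-channel global feature, `param_counts(21, 64, 256)` returns `(321, 33621)`. The gap grows with the number of joints because the shared kernel's size does not depend on J. In a fuller model the attention output would feed the lifting head, and a practical attention module would likely add learned query, key and value projections.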




Data Availability

The data are openly available in a public repository.



Author information

Authors and Affiliations

School of Information and Communication Engineering, Dalian University of Technology, Dalian, China
Zheng Chen & Yi Sun


Corresponding author

Correspondence to Yi Sun.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Chen, Z., Sun, Y. Joint-wise 2D to 3D lifting for hand pose estimation from a single RGB image. Appl Intell 53, 6421–6431 (2023). https://doi.org/10.1007/s10489-022-03764-1


Accepted: 10 May 2022
Published: 08 July 2022
Issue Date: March 2023
DOI: https://doi.org/10.1007/s10489-022-03764-1
