3D hand pose estimation using RGBD images and hybrid deep learning networks

Mofarreh-Bonab, Mohammad; Seyedarabi, Hadi; Mozaffari Tazehkand, Behzad; Kasaei, Shohreh

doi:10.1007/s00371-021-02263-7

Original article
Published: 24 July 2021

Volume 38, pages 2023–2032, (2022)

The Visual Computer

Mohammad Mofarreh-Bonab¹,
Hadi Seyedarabi¹,
Behzad Mozaffari Tazehkand¹ &
…
Shohreh Kasaei²

Abstract

Hand pose estimation is an active research area in image processing. Among human body parts, hands are particularly important for human–machine interaction. The advent of commercial depth cameras, together with the rapid growth of deep learning, has driven substantial progress across image processing, especially in hand pose estimation. In this study, we use depth data and introduce two hybrid deep neural networks that estimate 3D hand poses with fewer computations and higher accuracy than their counterparts. Because the dimensionality of the data shrinks as it passes through successive network layers, information is lost; we use the residual-network concept to compensate for this loss. Incorporating data from several views makes the estimated poses more robust to occlusion. Evaluation results show that the proposed networks outperform their counterparts in accuracy and implementation complexity.
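
The residual idea mentioned in the abstract can be illustrated with a minimal sketch. This is a generic skip connection written in plain Python, not the networks proposed in the article; the function names, the single dense layer, and the toy values are all assumptions chosen for illustration. It shows that when a layer discards its input, the identity path still carries that input forward.

```python
# Illustrative sketch only: a generic residual (skip) connection,
# not the architecture proposed in the article.

def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]

def linear(v, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * a for w, a in zip(row, v)) + b
            for row, b in zip(weights, bias)]

def residual_block(x, weights, bias):
    """Return relu(F(x) + x), where F is a single dense layer.

    The identity term `+ x` carries the input past the layer, so
    information that F discards is still available downstream.
    """
    fx = linear(x, weights, bias)
    if len(fx) != len(x):
        raise ValueError("skip connection needs matching dimensions")
    return relu([f + a for f, a in zip(fx, x)])

# Hypothetical toy values: a layer that zeroes out its input entirely.
x = [1.0, 2.0, 3.0]
zero_w = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3

plain = relu(linear(x, zero_w, zero_b))       # input is lost
skipped = residual_block(x, zero_w, zero_b)   # input survives via the skip
```

In a convolutional network the same pattern is usually applied to feature maps, with a projection on the skip path when the shapes differ; how the article's networks handle this is not stated in the abstract.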

Data Availability Statement

The dataset is available on the web.

Acknowledgements

We thank Mrs. Fahimeh Fooladgar for sharing her valuable experience with us during this research.

Funding

Not applicable.

Author information

Authors and Affiliations

Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
Mohammad Mofarreh-Bonab, Hadi Seyedarabi & Behzad Mozaffari Tazehkand
Computer Engineering, Sharif University of Technology, Tehran, Iran
Shohreh Kasaei

Corresponding author

Correspondence to Hadi Seyedarabi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mofarreh-Bonab, M., Seyedarabi, H., Mozaffari Tazehkand, B. et al. 3D hand pose estimation using RGBD images and hybrid deep learning networks. Vis Comput 38, 2023–2032 (2022). https://doi.org/10.1007/s00371-021-02263-7

Accepted: 14 July 2021
Published: 24 July 2021
Issue Date: June 2022
DOI: https://doi.org/10.1007/s00371-021-02263-7
