3D hand pose estimation using RGBD images and hybrid deep learning networks

  • Original article
The Visual Computer

Abstract

Hand pose estimation is one of the most attractive research areas in image processing. Among the parts of the human body, the hands are particularly important for human–machine interaction. The advent of commercial depth cameras, along with the rapid growth of deep learning, has driven great progress in all image processing fields, especially hand pose estimation. In this study, we introduce two hybrid deep neural networks that use depth data to estimate 3D hand poses with fewer computations and higher accuracy than their counterparts. Because the dimensions of the data shrink as it passes through successive network layers, information is lost; we use the concept of the residual network to compensate for this loss. Incorporating data from several views makes the estimated poses more robust to occlusions. Evaluation results show that the proposed networks outperform their counterparts in both accuracy and implementation complexity.
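The residual idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the architecture proposed in the article: the function names, the tiny fully connected layers, and the zero weights are all assumptions chosen only to show how an identity skip connection lets input information reach the output even when the intermediate layers discard it.

```python
def relu(v):
    """Element-wise rectified linear unit on a list of floats."""
    return [max(0.0, a) for a in v]


def linear(v, w):
    """Multiply a row vector v (length n) by a matrix w (n rows, m columns)."""
    return [sum(v[i] * w[i][j] for i in range(len(v))) for j in range(len(w[0]))]


def plain_block(x, w1, w2):
    """Two stacked layers with no skip connection (hypothetical baseline)."""
    return relu(linear(relu(linear(x, w1)), w2))


def residual_block(x, w1, w2):
    """Same two layers, but the input x is added back before the final ReLU."""
    h = relu(linear(x, w1))
    f = linear(h, w2)
    return relu([fi + xi for fi, xi in zip(f, x)])


# Illustrative input and degenerate (all-zero) weights, both assumptions.
x = [0.5, -1.0, 2.0]
zeros = [[0.0] * 3 for _ in range(3)]

plain_out = plain_block(x, zeros, zeros)        # the layers erase the input
residual_out = residual_block(x, zeros, zeros)  # the skip path preserves it
```

With weights that wipe out the signal, the plain block returns all zeros, while the residual block still passes the non-negative part of the input through the skip path.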

Data Availability Statement

The dataset is available on the web.

Acknowledgements

We would like to thank Mrs. Fahimeh Fooladgar for sharing her valuable experience with us during this research.

Funding

Not applicable.

Author information

Corresponding author

Correspondence to Hadi Seyedarabi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mofarreh-Bonab, M., Seyedarabi, H., Mozaffari Tazehkand, B. et al. 3D hand pose estimation using RGBD images and hybrid deep learning networks. Vis Comput 38, 2023–2032 (2022). https://doi.org/10.1007/s00371-021-02263-7

  • DOI: https://doi.org/10.1007/s00371-021-02263-7
