Abstract
Hand pose estimation is an active research area in image processing. Among human body parts, the hands are particularly important for human–machine interaction. The advent of commercial depth cameras, together with the rapid growth of deep learning, has driven considerable progress across image processing, and in hand pose estimation in particular. In this study, we introduce two hybrid deep neural networks that estimate 3D hand poses from depth data with fewer computations and higher accuracy than their counterparts. Because the dimensionality of the data is reduced as it passes through successive network layers, some information is lost; we adopt the residual-network concept to compensate for this loss. By incorporating data from several views, the estimated poses become more robust to occlusions. Evaluation results show that the proposed networks are superior in both accuracy and implementation complexity.
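The residual idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the layer sizes, weight names, and the use of plain dense layers in NumPy are assumptions chosen only to show how a skip connection carries the input past a transformation so that its information is not lost.

```python
import numpy as np

def relu(x):
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Return relu(F(x) + x), where F is two small dense layers.

    x  : (batch, d) input features
    w1 : (d, hidden) weights of the first layer (hypothetical)
    w2 : (hidden, d) weights of the second layer (hypothetical)
    """
    h = relu(x @ w1)     # first transformation
    f = h @ w2           # second transformation, back to width d
    return relu(f + x)   # skip connection adds the input back

# Hypothetical sizes: 4 samples with 8 features each.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w1 = rng.standard_normal((8, 16)) * 0.1
w2 = rng.standard_normal((16, 8)) * 0.1
y = residual_block(x, w1, w2)

# If the learned branch contributes nothing, the input still passes through.
passthrough = residual_block(x, np.zeros((8, 16)), np.zeros((16, 8)))
```

In this sketch, `passthrough` equals `relu(x)`: even when the learned branch outputs zeros, the block still forwards the input, which is the property that lets residual connections offset information lost in successive layers.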
Data Availability Statement
The dataset is available on the web.
Acknowledgements
We would like to express our gratitude to Mrs. Fahimeh Fooladgar for sharing her valuable experience with us during this research.
Funding
Not applicable.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Mofarreh-Bonab, M., Seyedarabi, H., Mozaffari Tazehkand, B. et al. 3D hand pose estimation using RGBD images and hybrid deep learning networks. Vis Comput 38, 2023–2032 (2022). https://doi.org/10.1007/s00371-021-02263-7
DOI: https://doi.org/10.1007/s00371-021-02263-7