
Hand pose estimation with multi-scale network


Abstract

Hand pose estimation plays an important role in human-computer interaction. Because it is a high-dimensional nonlinear regression problem, the accuracy achieved by existing hand pose estimation methods is still unsatisfactory. With the development of deep neural networks, more and more work has adopted deep-learning-based approaches. We propose a multi-scale convolutional neural network that operates on a single depth image of the hand. The network is end-to-end and directly regresses the three-dimensional coordinates of the hand joints, and the multi-scale structure improves both the convergence speed and the accuracy of the output. In addition, an output function for the output layer, called Stair Rectified Linear Units, is used to limit the output values. Our experiments show that optimization with momentum is not well suited to hand pose estimation, because the regression task is unstable. Finally, the proposed method achieves state-of-the-art performance on the NYU Hand Pose Dataset.
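The abstract only states that the output layer uses a function called Stair Rectified Linear Units to restrict the range of the predicted joint coordinates; its exact form is defined in the full paper. As a rough illustration of an output activation that clamps predictions to a fixed interval, here is a minimal sketch, assuming a simple clamp to [-1, 1]; the bounds, the name `stair_relu`, and the use of NumPy are illustrative assumptions, not the authors' definition:

```python
import numpy as np

def stair_relu(u, lower=-1.0, upper=1.0):
    """Illustrative output activation: passes values through inside
    [lower, upper] and clamps them at the boundaries.
    The paper's Stair ReLU may use a different (stair-like) shape."""
    return np.clip(u, lower, upper)

# Example: raw regression outputs for 14 joints * 3 coordinates
raw = np.random.randn(14 * 3) * 2.0
bounded = stair_relu(raw)          # every entry now lies in [-1, 1]
assert bounded.min() >= -1.0 and bounded.max() <= 1.0
```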



Acknowledgements

This work was supported by the National Key Technology R&D Program of China (No. 2015BAF01B00) and the National Key R&D Program of China (No. 2017YFD0400405).

Author information

Corresponding author

Correspondence to Bo Wu.

Appendix

For a fully connected layer $l$ with pre-activation $u^{l}$, define the sensitivity $\delta^{l}$ as the partial derivative of the cost function $L$ with respect to $u^{l}$:

$$ \delta^{l}=\frac{\partial L}{\partial u^{l}},\qquad u^{l}=W^{l}x^{l-1}+b^{l} $$
(A.1)

For the bias $b^{l}$, since $\partial u^{l}/\partial b^{l}=1$, the chain rule gives:

$$ \frac{\partial L}{\partial b^{l}}=\frac{\partial L}{\partial u^{l}}\frac{\partial u^{l}}{\partial b^{l}}=\delta^{l} $$
(A.2)

The partial derivative of the cost function $L$ with respect to the weight $W^{l}$ is:

$$ \frac{\partial L}{\partial W^{l}}=\frac{\partial L}{\partial u^{l}}\frac{\partial u^{l}}{\partial W^{l}}=\delta^{l}(x^{l-1})^{T} $$
(A.3)

The sensitivity differs from layer to layer; it can be computed recursively by back-propagating from layer $l+1$ to layer $l$:

$$\begin{array}{@{}rcl@{}} \delta^{l}&=&\frac{\partial L}{\partial u^{l}}=\frac{\partial L}{\partial u^{l + 1}}\frac{\partial u^{l + 1}}{\partial u^{l}}\\ &=&\delta^{l + 1}\frac{\partial(W^{l + 1}x^{l}+b^{l+1})}{\partial u^{l}}\\ &=&\delta^{l + 1}\frac{\partial(W^{l + 1}f(u^{l})+b^{l+1})}{\partial u^{l}}\\ &=&(W^{l + 1})^{T}\delta^{l + 1}\cdot f^{\prime}(u^{l}) \end{array} $$
(A.4)
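To make the derivation concrete, the following minimal NumPy sketch back-propagates the sensitivities through a small two-layer network using (A.2)-(A.4) and verifies the bias gradient against a finite-difference estimate. The layer sizes, the tanh activation, and the squared-error cost are assumptions chosen only for this check; they are not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.tanh                                   # assumed activation f(u)
df = lambda u: 1.0 - np.tanh(u) ** 2          # its derivative f'(u)

# Two fully connected layers: x0 -> u1 -> x1 -> u2 (linear output)
x0 = rng.standard_normal((4, 1))
W1, b1 = rng.standard_normal((5, 4)), rng.standard_normal((5, 1))
W2, b2 = rng.standard_normal((3, 5)), rng.standard_normal((3, 1))
y = rng.standard_normal((3, 1))               # regression target

def forward(b1_):
    u1 = W1 @ x0 + b1_                        # u^l = W^l x^{l-1} + b^l  (A.1)
    x1 = f(u1)
    u2 = W2 @ x1 + b2
    return u1, x1, u2

u1, x1, u2 = forward(b1)
L = 0.5 * np.sum((u2 - y) ** 2)               # assumed squared-error cost

# Sensitivities: delta^2 at the output, delta^1 via (A.4)
delta2 = u2 - y                               # dL/du^2 for the squared error
delta1 = (W2.T @ delta2) * df(u1)             # (A.4)

grad_b1 = delta1                              # (A.2): dL/db^1 = delta^1
grad_W1 = delta1 @ x0.T                       # (A.3): dL/dW^1 = delta^1 (x^0)^T

# Finite-difference check of one bias component
eps, i = 1e-6, 2
b1p = b1.copy(); b1p[i] += eps
Lp = 0.5 * np.sum((forward(b1p)[2] - y) ** 2)
print(grad_b1[i, 0], (Lp - L) / eps)          # the two numbers should agree closely
```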

About this article


Cite this article

Hu, Z., Hu, Y., Wu, B. et al. Hand pose estimation with multi-scale network. Appl Intell 48, 2501–2515 (2018). https://doi.org/10.1007/s10489-017-1092-z

