Hand Pose Estimation with Attention-and-Sequence Network

Hu, Tianping; Wang, Wenhai; Lu, Tong

doi:10.1007/978-3-030-00776-8_51

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11164)

Included in the following conference series:

Pacific Rim Conference on Multimedia

Abstract

Hand pose estimation from depth images is an essential topic in computer vision. Despite recent advances driven by convolutional neural networks (CNNs), accurate hand pose estimation remains challenging. In this paper, we analyse the spatial relationships among hand joints and observe that (1) joints from different fingers are independent of one another, and (2) adjacent joints in the same finger are strongly correlated. Based on these observations, we present a novel Attention-and-Sequence Network (ASNet) with embedded finger attention and joint sequence mechanisms. The finger attention mechanism preserves the independence of joints from different fingers, while the joint sequence mechanism exploits the strong correlation among adjacent joints in the same finger. The proposed ASNet achieves an average 3D error of 5.6 mm on ICVL, 10.3 mm on NYU, and 7.3 mm on MSRA; these competitive results confirm the effectiveness of ASNet.
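The abstract reports results as an "average 3D error" in millimetres without defining it here. In the hand pose literature this is commonly taken to be the mean Euclidean distance between predicted and ground-truth joint positions, averaged over all joints and frames. The following is a minimal sketch under that assumption; the function name `mean_joint_error` and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math


def mean_joint_error(pred, gt):
    """Mean Euclidean distance (in mm) between predicted and true 3D joints.

    pred, gt: lists of frames; each frame is a list of (x, y, z) tuples in mm.
    Assumption: the error is averaged over every joint of every frame.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of frames")

    total, count = 0.0, 0
    for frame_pred, frame_gt in zip(pred, gt):
        if len(frame_pred) != len(frame_gt):
            raise ValueError("each frame must contain the same number of joints")
        for p, g in zip(frame_pred, frame_gt):
            total += math.dist(p, g)  # Euclidean distance for one joint
            count += 1

    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count


# Toy example (made-up data): one frame, two joints.
pred = [[(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]]
gt = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
error_mm = mean_joint_error(pred, gt)  # joint errors are 0 mm and 5 mm
```

Under this definition, a reported value such as 5.6 mm would mean that predicted joints lie on average 5.6 mm from their annotated positions.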

T. Hu and W. Wang contributed equally.

Acknowledgements

The authors would like to thank the editor and the anonymous reviewers for their critical and constructive comments and suggestions. The work was supported by the Natural Science Foundation of China under Grants No. 61672273, No. 61272218 and No. 61321491, the Science Foundation for Distinguished Young Scholars of Jiangsu under Grant No. BK20160021, and the Scientific Foundation of State Grid Corporation of China (Research on Ice-wind Disaster Feature Recognition and Prediction by Few-shot Machine Learning in Transmission Lines).

Author information

Authors and Affiliations

National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China
Tianping Hu, Wenhai Wang & Tong Lu

Corresponding author

Correspondence to Tong Lu.

Editor information

Editors and Affiliations

Hefei University of Technology, Hefei, China
Richang Hong
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
University of Tokyo, Tokyo, Japan
Toshihiko Yamasaki
Hefei University of Technology, Hefei, China
Meng Wang
City University of Hong Kong, Hong Kong, Hong Kong
Chong-Wah Ngo

Cite this paper

Hu, T., Wang, W., Lu, T. (2018). Hand Pose Estimation with Attention-and-Sequence Network. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11164. Springer, Cham. https://doi.org/10.1007/978-3-030-00776-8_51

DOI: https://doi.org/10.1007/978-3-030-00776-8_51
Published: 19 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00775-1
Online ISBN: 978-3-030-00776-8
eBook Packages: Computer Science, Computer Science (R0)
