
Hand Pose Estimation with Attention-and-Sequence Network

  • Conference paper
  • First Online:
Advances in Multimedia Information Processing – PCM 2018 (PCM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11164)

Abstract

Hand pose estimation from depth images is an essential topic in computer vision. Despite recent advances in this area driven by Convolutional Neural Networks (CNNs), accurate hand pose estimation remains a challenging problem. In this paper, we analyse the spatial relationships among hand joints and find that (1) joints on different fingers are independent of one another, and (2) adjacent joints on the same finger are strongly correlated. Based on these observations, we present a novel Attention-and-Sequence Network (ASNet) with embedded finger attention and joint sequence mechanisms. The finger attention mechanism ensures the independence of joints on different fingers, while the joint sequence mechanism exploits the strong correlation among adjacent joints on the same finger. ASNet achieves an average 3D error of 5.6 mm on ICVL, 10.3 mm on NYU, and 7.3 mm on MSRA, and these competitive results confirm its effectiveness.
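On depth-based hand pose benchmarks such as ICVL, NYU and MSRA, the "average 3D error" is commonly the Euclidean distance between each predicted joint and its ground-truth position, averaged over all joints and all test frames. The sketch below computes that metric in plain Python, assuming each frame is a list of (x, y, z) joint tuples in millimetres. The function name `mean_3d_error` is hypothetical, and the exact evaluation protocol used in this paper (which joints, which frames) is not specified on this page.

```python
import math
from typing import Sequence, Tuple

# Hypothetical representation: one joint is an (x, y, z) position, e.g. in mm.
Point3 = Tuple[float, float, float]


def mean_3d_error(pred: Sequence[Sequence[Point3]],
                  gt: Sequence[Sequence[Point3]]) -> float:
    """Mean per-joint Euclidean distance over all frames and joints.

    The result is in the same unit as the inputs (mm if the joints are in mm).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of frames")
    total, count = 0.0, 0
    for frame_pred, frame_gt in zip(pred, gt):
        if len(frame_pred) != len(frame_gt):
            raise ValueError("each frame must contain the same number of joints")
        for p, g in zip(frame_pred, frame_gt):
            # Euclidean distance between predicted and ground-truth joint.
            total += math.dist(p, g)
            count += 1
    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count
```

For example, a single frame whose first joint is off by (3, 4, 0) mm and whose second joint is exact gives distances of 5 mm and 0 mm, so `mean_3d_error([[(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]], [[(3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]])` returns 2.5.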

T. Hu and W. Wang contributed equally.

Acknowledgements

The authors would like to thank the editor and the anonymous reviewers for their critical and constructive comments and suggestions. The work was supported by the Natural Science Foundation of China under Grant Nos. 61672273, 61272218 and 61321491, the Science Foundation for Distinguished Young Scholars of Jiangsu under Grant No. BK20160021, and the Scientific Foundation of State Grid Corporation of China (Research on Ice-wind Disaster Feature Recognition and Prediction by Few-shot Machine Learning in Transmission Lines).

Author information

Correspondence to Tong Lu.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Hu, T., Wang, W., Lu, T. (2018). Hand Pose Estimation with Attention-and-Sequence Network. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11164. Springer, Cham. https://doi.org/10.1007/978-3-030-00776-8_51

  • DOI: https://doi.org/10.1007/978-3-030-00776-8_51

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00775-1

  • Online ISBN: 978-3-030-00776-8

  • eBook Packages: Computer Science, Computer Science (R0)
