An enhanced self-attention and A2J approach for 3D hand pose estimation

Ng, Mei-Ying; Chng, Chin-Boon; Koh, Wai-Kin; Chui, Chee-Kong; Chua, Matthew Chin-Heng

doi:10.1007/s11042-021-11020-w

1181: Multimedia-based Healthcare Systems using Computational Intelligence
Published: 15 June 2021

Multimedia Tools and Applications, volume 81, pages 41661–41676 (2022)

Mei-Ying Ng¹,
Chin-Boon Chng ORCID: orcid.org/0000-0002-0566-6069²,
Wai-Kin Koh¹,
Chee-Kong Chui² &
…
Matthew Chin-Heng Chua¹

Abstract

Three-dimensional (3D) hand pose estimation is the task of estimating the 3D locations of hand keypoints. In recent years, this task has received considerable research attention because of its diverse applications in human-computer interaction and virtual reality. To the best of our knowledge, few studies have modeled self-attention in 3D hand pose estimation, despite its use in various computer vision tasks. Hence, we propose augmenting convolution with self-attention to capture long-range dependencies in a depth image. In addition, motivated by recent work that sets anchor points on a depth image, we extend the anchor points to the depth dimension to regress 3D hand joint locations. Validation experiments with the proposed approaches are performed on various hand pose datasets, and the resulting performance is comparable to that of other state-of-the-art methods. The results demonstrate the potential of these approaches in a hand-based recognition system.
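To make the anchor idea concrete, the following is a minimal sketch of anchor-based joint regression with anchors placed along the depth axis as well as in the image plane. It assumes an A2J-style aggregation, in which every anchor casts a vote (its own position plus a predicted offset) and the votes are averaged with softmax-normalized response weights. The abstract does not state the exact formulation, so the function names (`make_anchors_3d`, `aggregate_joint`), the grid spacing, and the toy offsets and responses are all hypothetical; in a real model the offsets and responses would come from network branches.

```python
import math
from itertools import product


def make_anchors_3d(xs, ys, zs):
    """Return every (x, y, z) combination as an anchor point (assumed grid layout)."""
    return [(x, y, z) for x, y, z in product(xs, ys, zs)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_joint(anchors, offsets, responses):
    """Estimate one joint as the response-weighted average of anchor votes.

    anchors:   list of (x, y, z) anchor positions
    offsets:   list of (dx, dy, dz), one per anchor (hypothetical network output)
    responses: list of raw scores, one per anchor (hypothetical network output)
    """
    weights = softmax(responses)
    joint = [0.0, 0.0, 0.0]
    for w, anchor, offset in zip(weights, anchors, offsets):
        for axis in range(3):
            # Each anchor votes for the joint at (anchor position + offset).
            joint[axis] += w * (anchor[axis] + offset[axis])
    return tuple(joint)


# Toy usage: a 2 x 2 x 2 anchor grid whose offsets all point at the same target.
anchors = make_anchors_3d([0.0, 4.0], [0.0, 4.0], [100.0, 200.0])
target = (1.0, 3.0, 150.0)
offsets = [tuple(t - a for t, a in zip(target, anchor)) for anchor in anchors]
responses = [0.5 * i for i in range(len(anchors))]
estimate = aggregate_joint(anchors, offsets, responses)
```

Because the weights sum to one, anchors that agree on a location reproduce it regardless of their individual responses; when the votes disagree, the responses decide which anchors dominate the estimate.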

Acknowledgements

This study was funded by the Tote Board Enabling Lives Initiative Grant (Grant Number: GC62018NUSISS) and supported by SG Enable.

Author information

Authors and Affiliations

Institute of Systems Science, National University of Singapore, Singapore, Singapore
Mei-Ying Ng, Wai-Kin Koh & Matthew Chin-Heng Chua
Department of Mechanical Engineering, Faculty of Engineering, National University of Singapore, Singapore, Singapore
Chin-Boon Chng & Chee-Kong Chui

Corresponding author

Correspondence to Chin-Boon Chng.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ng, MY., Chng, CB., Koh, WK. et al. An enhanced self-attention and A2J approach for 3D hand pose estimation. Multimed Tools Appl 81, 41661–41676 (2022). https://doi.org/10.1007/s11042-021-11020-w

Received: 05 October 2020
Revised: 17 February 2021
Accepted: 05 May 2021
Published: 15 June 2021
Issue Date: December 2022
DOI: https://doi.org/10.1007/s11042-021-11020-w
