Abstract
With the emergence of consumer RGB-D sensors, discriminative modeling has been shown to perform well in estimating human body pose. However, articulated hand pose estimation remains a challenging problem, mostly because of the hand's high flexibility, occlusions, noisy data, and the small area of the fingertips. In this paper, we present an efficient discriminative scheme to improve the performance of hand pose estimation from a single depth image. The proposed scheme is inspired by the decision forest-based framework, but with several well-motivated modifications. Specifically, we propose a method to estimate the 2D in-plane orientation of the hand, which is then used to strengthen the depth comparison features and make them invariant to in-plane rotation. Subsequently, we investigate the use of random decision forests (RDF) and the mean shift algorithm to obtain a preliminary prediction of hand parts and joint locations. Based on this preliminary prediction, an adaptive spatial clustering method is applied to correct the misclassified regions and to deliver the final estimate of the hand pose. Along with the proposed scheme, we further develop a new set of highly distinctive features for static finger sign recognition by using the estimated hand pose configurations and RGB information. The proposed features are straightforward and can effectively capture different aspects of hand pose, such as the links from each joint to its closest joints and the orientation of each hand part. Extensive experiments on several challenging datasets demonstrate that our approach, compared to decision forest-based methods, provides more precise estimation of hand poses (with up to 21% improvement in joint localization accuracy) and can efficiently recognize more complex static finger signs (93.85% mean recognition accuracy on a challenging dataset of 34 finger signs). Our approach is also robust to inter-hand occlusion and to variations in illumination, scale, and rotation.
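As an illustration of the rotation-invariance idea mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes the widely used depth comparison feature form (the difference between two depth probes at depth-normalized offsets from a pixel) and assumes that the estimated in-plane hand orientation `theta` is simply used to rotate the probe offsets. The function names, the `BACKGROUND_DEPTH` constant, and the nearest-pixel probing are all hypothetical choices made for this example.

```python
import numpy as np

# Assumption: a large constant stands in for background or off-image probes.
BACKGROUND_DEPTH = 1e6


def rotate_offset(offset, theta):
    """Rotate a 2D pixel offset (dx, dy) by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    dx, dy = offset
    return np.array([c * dx - s * dy, s * dx + c * dy])


def probe_depth(depth, x, y):
    """Read depth at the pixel nearest to (x, y).

    Out-of-bounds or zero-depth pixels are treated as background.
    """
    h, w = depth.shape
    xi, yi = int(round(x)), int(round(y))
    if 0 <= xi < w and 0 <= yi < h and depth[yi, xi] > 0:
        return float(depth[yi, xi])
    return BACKGROUND_DEPTH


def depth_comparison_feature(depth, pixel, u, v, theta=0.0):
    """Depth difference between two probes around `pixel`.

    The offsets u and v are first rotated by the (assumed known) in-plane
    hand orientation theta, then divided by the depth at `pixel` so that
    the probe pattern scales with distance to the camera.
    """
    x, y = pixel
    d = probe_depth(depth, x, y)
    u_r = rotate_offset(u, theta) / d
    v_r = rotate_offset(v, theta) / d
    return (probe_depth(depth, x + u_r[0], y + u_r[1])
            - probe_depth(depth, x + v_r[0], y + v_r[1]))
```

Under these assumptions, if the image is rotated about the reference pixel and `theta` is set to the same rotation, the probes land on the same surface points and the feature value is unchanged. In a full pipeline, such features would feed the split nodes of the random decision forest; that part is not sketched here.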
Acknowledgements
The authors would like to thank the associate editors and the anonymous reviewers for their valuable and insightful comments and suggestions, which have greatly improved the content and presentation of this article. This work was supported by the “Centre National pour la Recherche Scientifique et Technique (CNRST)”, funded by the Moroccan government under Grant no. 14UIZ2015.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Elboushaki, A., Hannane, R., Afdel, K. et al. Improving articulated hand pose detection for static finger sign recognition in RGB-D images. Multimed Tools Appl 79, 28925–28969 (2020). https://doi.org/10.1007/s11042-020-09370-y
DOI: https://doi.org/10.1007/s11042-020-09370-y