Abstract
In this work, we present a novel framework for single-shot hand pose estimation from depth data. The method follows a coarse-to-fine strategy and employs several radial basis function networks (RBFNs) trained on a dataset consisting solely of synthetically generated depth maps. Thus, unlike most contemporary deep learning approaches, it does not require the laborious annotation of large real-world datasets. At run time, an initialization RBFN provides a rough estimate of the hand’s 3D pose. Several specialized RBFNs then improve this initial estimate in an iterative refinement scheme. To train the RBFNs, we select from a real-world sequence a set of hand poses that are as diverse as possible. We use this representative set, together with a dense sampling of all possible rotations, as a seed to generate a large synthetic training set. The method is parallelizable, taking advantage of the inherent data parallelism of RBFNs. Furthermore, it requires little real-world data and virtually no manual annotation. We evaluate our method quantitatively on a test sequence of our own. We also present quantitative and qualitative results on a public dataset commonly used to evaluate hand pose estimation and tracking methods. In all cases, our approach achieves promising results. Moreover, running on a single CPU core, i.e., without GPU processing, it achieves computational performance comparable to, or even faster than, current deep learning approaches.
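The abstract describes the pipeline only at a high level. The Python/NumPy sketch below illustrates one possible reading of an "initialization RBFN followed by specialized RBFNs" scheme. All design choices in it are our assumptions, not the paper's stated method: Gaussian kernels, centers drawn at random from the training inputs, output weights fitted by ridge-regularized least squares, "specialists" defined by k-means clustering of the training poses, and refinement that repeatedly routes each sample to the specialist whose pose cluster is nearest to its current estimate. The names `RBFN`, `CoarseToFine`, `kmeans` and `nearest` are hypothetical, and the inputs merely stand in for depth-map features.

```python
# Hypothetical sketch: Gaussian kernels, random centers, ridge-fitted weights
# and k-means "specialists" are illustrative assumptions, not the paper's design.
import numpy as np


def nearest(P, C):
    """Index of the closest row of C for every row of P (squared Euclidean)."""
    return ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


def kmeans(Y, k, iters=20, seed=0):
    """Plain Lloyd k-means on pose vectors; used here only to define specialists."""
    rng = np.random.default_rng(seed)
    C = Y[rng.choice(len(Y), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = nearest(Y, C)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                C[j] = Y[labels == j].mean(axis=0)
    return C, nearest(Y, C)


class RBFN:
    """Gaussian RBF network: y = W^T [exp(-gamma * ||x - c_i||^2), 1]."""

    def __init__(self, n_centers=40, gamma=2.0, reg=1e-6, seed=0):
        self.n_centers, self.gamma, self.reg = n_centers, gamma, reg
        self.rng = np.random.default_rng(seed)

    def _features(self, X):
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        return np.hstack([np.exp(-self.gamma * d2), np.ones((len(X), 1))])

    def fit(self, X, Y):
        # Centers: a random subset of the training inputs.
        m = min(self.n_centers, len(X))
        self.centers = X[self.rng.choice(len(X), size=m, replace=False)]
        # Output weights: ridge-regularized linear least squares.
        Phi = self._features(X)
        A = Phi.T @ Phi + self.reg * np.eye(Phi.shape[1])
        self.W = np.linalg.solve(A, Phi.T @ Y)
        return self

    def predict(self, X):
        # Rows are independent, so a batch can be split across workers.
        return self._features(X) @ self.W


class CoarseToFine:
    """One global RBFN for the coarse estimate, one specialist per pose cluster."""

    def __init__(self, k=4, iters=3, **rbf_kw):
        self.k, self.iters, self.rbf_kw = k, iters, rbf_kw

    def fit(self, X, Y):
        self.init_net = RBFN(**self.rbf_kw).fit(X, Y)
        self.C, labels = kmeans(Y, self.k)
        self.specialists = []
        for j in range(self.k):
            mask = labels == j
            if mask.sum() < 2:  # too few samples: fall back to the global net
                self.specialists.append(self.init_net)
            else:
                self.specialists.append(RBFN(**self.rbf_kw).fit(X[mask], Y[mask]))
        return self

    def predict(self, X):
        Y_hat = self.init_net.predict(X)  # coarse initial estimate
        for _ in range(self.iters):
            # Route each sample to the specialist whose pose cluster is nearest
            # to its current estimate, then replace the estimate with its output.
            route = nearest(Y_hat, self.C)
            Y_new = np.empty_like(Y_hat)
            for j in range(self.k):
                m = route == j
                if m.any():
                    Y_new[m] = self.specialists[j].predict(X[m])
            Y_hat = Y_new
        return Y_hat


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, size=(400, 2))  # stand-in for depth-map features
    Y = np.column_stack([np.sin(2 * X[:, 0]), X[:, 0] * X[:, 1]])  # stand-in poses
    model = CoarseToFine(k=4, iters=3, n_centers=40, gamma=2.0).fit(X, Y)
    print(model.predict(X[:5]).shape)  # (5, 2)
```

Because inference reduces to kernel evaluations followed by a matrix product, each sample (or batch) is processed independently, which is one plausible way to exploit the data parallelism of RBFNs on CPU cores.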

Acknowledgements
This work was partially supported by the EU project Co4Robots (H2020-ICT-2016-1-731869). It was also co-financed by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH – CREATE – INNOVATE (project code: T1EDK-01299, HealthSign). The contribution of Paschalis Panteleris, member of CVRL/FORTH, is gratefully acknowledged.
About this article
Cite this article
Nicodemou, V.C., Oikonomidis, I. & Argyros, A. Single-shot 3D hand pose estimation using radial basis function networks trained on synthetic data. Pattern Anal Applic 23, 415–428 (2020). https://doi.org/10.1007/s10044-019-00801-7
DOI: https://doi.org/10.1007/s10044-019-00801-7