Single-shot 3D hand pose estimation using radial basis function networks trained on synthetic data

  • Theoretical advances
Pattern Analysis and Applications

Abstract

In this work, we present a novel framework to perform single-shot hand pose estimation using depth data as input. The method follows a coarse-to-fine strategy and employs several radial basis function networks (RBFNs) that are trained on a dataset containing only synthetically generated depth maps. Thus, compared to most contemporary deep learning approaches, it does not require the laborious annotation of large, real-world datasets. At run time, an initialization RBFN is used to provide a rough estimate of the hand’s 3D pose. Subsequently, several specialized RBFNs are employed to improve that initial estimate in an iterative refinement scheme. To train the RBFNs, we select a set of hand poses from a real-world sequence that are as diverse as possible. We use this representative set, along with a dense sampling of all possible rotations, as a seed to generate a large synthetic training set. The method is parallelizable, taking advantage of the inherent data parallelism of RBFNs. Furthermore, the method requires little real-world data and virtually no manual annotation. We perform a quantitative evaluation of our method on a testing sequence of our own. We also present quantitative and qualitative results on a public dataset that is commonly used to evaluate hand pose estimation and tracking methods. We show that in all cases, our approach achieves promising results. Moreover, it runs at speeds comparable to, or even faster than, current deep learning approaches while using a single CPU core, i.e., without requiring GPU processing.
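The coarse-to-fine scheme described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the pose is a single angle, `render` is a hypothetical stand-in for a synthetic depth-map renderer, each network is a normalized Gaussian RBF layer whose output weights are simply the training poses (the abstract does not describe the actual training procedure), and the rule that picks a specialized network from the current estimate is invented here.

```python
import math


def render(theta):
    """Hypothetical renderer: maps a 1D pose (angle) to a 2D 'depth feature'."""
    return (math.cos(theta), math.sin(theta))


def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


class RBFN:
    """Normalized Gaussian RBF network built from synthetic (feature, pose) pairs.

    Simplifying assumption: one center per synthetic sample, with the sample's
    pose used directly as that center's output weight.
    """

    def __init__(self, poses, sigma):
        self.poses = list(poses)
        self.centers = [render(p) for p in self.poses]  # synthetic training set
        self.sigma = sigma

    def predict(self, x):
        d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in self.centers]
        nearest = min(d2)  # shift so the closest center has activation 1
        acts = [math.exp(-(d - nearest) / (2 * self.sigma ** 2)) for d in d2]
        return sum(a * p for a, p in zip(acts, self.poses)) / sum(acts)


POSE_MAX = math.pi
N_EXPERTS = 4

# Initialization network: few centers, wide kernels, covers the whole pose range.
coarse = RBFN(linspace(0.0, POSE_MAX, 8), sigma=0.5)

# Specialized networks: dense centers, narrow kernels, overlapping sub-ranges.
experts = []
width = POSE_MAX / N_EXPERTS
margin = 0.25 * width
for k in range(N_EXPERTS):
    lo = max(0.0, k * width - margin)
    hi = min(POSE_MAX, (k + 1) * width + margin)
    experts.append(((k + 0.5) * width, RBFN(linspace(lo, hi, 60), sigma=0.05)))


def estimate_pose(x, iterations=3):
    """Rough estimate first, then repeated refinement by the closest specialist."""
    est = coarse.predict(x)
    for _ in range(iterations):
        _, expert = min(experts, key=lambda e: abs(e[0] - est))
        est = expert.predict(x)
    return est


if __name__ == "__main__":
    true_pose = 0.2
    observation = render(true_pose)
    print("coarse :", coarse.predict(observation))
    print("refined:", estimate_pose(observation))
```

In this toy setup the wide kernels of the initialization network bias its estimate near the ends of the pose range, and the specialized network chosen from that estimate removes most of the bias. Because each activation depends only on one center, the per-center work inside `predict` is independent, which is the kind of data parallelism the abstract refers to.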

Acknowledgements

This work was partially supported by the EU project Co4Robots (H2020-ICT-2016-1-731869). It was also co-financed by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH – CREATE – INNOVATE (project code: T1EDK-01299, HealthSign). The contribution of Paschalis Panteleris, member of CVRL/FORTH, is gratefully acknowledged.

Author information

Corresponding author

Correspondence to Vassilis C. Nicodemou.

About this article

Cite this article

Nicodemou, V.C., Oikonomidis, I. & Argyros, A. Single-shot 3D hand pose estimation using radial basis function networks trained on synthetic data. Pattern Anal Applic 23, 415–428 (2020). https://doi.org/10.1007/s10044-019-00801-7

  • DOI: https://doi.org/10.1007/s10044-019-00801-7
