Ar3dHands: A Dataset and Baseline for Real-Time 3D Hand Pose Estimation from Binocular Distorted Images

  • Conference paper
Image and Graphics (ICIG 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14355)

Abstract

Hand pose estimation is an important technology for real-time human-computer interaction. Most existing methods neglect the distorted images captured by wide-angle cameras and tend to have high inference latency, particularly without acceleration from Graphics Processing Units (GPUs). In this paper, we propose the first large multi-view distorted hand dataset, Ar3dHands, and develop a simple but effective real-time 3D hand pose estimation algorithm for binocular distorted images, which makes our method compatible with the wide-angle camera systems built into miniature visual devices such as AR/VR glasses. Evaluation shows that our method achieves state-of-the-art results on several datasets, with lower mean 2D end-point error, and runs in real time on embedded devices without GPUs.
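
The mean 2D end-point error mentioned in the evaluation is commonly computed as the average Euclidean distance, in pixels, between predicted and ground-truth keypoints. The following is a minimal sketch under that assumption; the function name `mean_epe` and the sample coordinates are hypothetical and are not taken from the paper.

```python
import math


def mean_epe(pred, gt):
    """Average Euclidean distance between matched keypoints.

    pred, gt: equal-length sequences of (x, y) coordinates.
    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and the same length")
    # math.dist returns the Euclidean distance between two points.
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Made-up keypoints: per-joint errors are 0, 4 and 5 pixels.
pred = [(10.0, 10.0), (20.0, 24.0), (33.0, 44.0)]
gt = [(10.0, 10.0), (20.0, 20.0), (30.0, 40.0)]
print(mean_epe(pred, gt))  # 3.0
```

The same function works for 3D keypoints if each point is given as an (x, y, z) tuple, since `math.dist` accepts points of any matching dimension.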

Author information

Corresponding author

Correspondence to Wenxiong Kang.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 2618 KB)

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gan, M., Lin, Y., Liu, X., Song, W., Zeng, J., Kang, W. (2023). Ar3dHands: A Dataset and Baseline for Real-Time 3D Hand Pose Estimation from Binocular Distorted Images. In: Lu, H., et al. Image and Graphics. ICIG 2023. Lecture Notes in Computer Science, vol 14355. Springer, Cham. https://doi.org/10.1007/978-3-031-46305-1_14

  • DOI: https://doi.org/10.1007/978-3-031-46305-1_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-46304-4

  • Online ISBN: 978-3-031-46305-1

  • eBook Packages: Computer Science, Computer Science (R0)
