Abstract
Hand pose estimation is an important technology for real-time human-computer interaction. Most existing methods neglect the distortion in images captured by wide-angle cameras and tend to have high inference latency, particularly without the acceleration of graphics processing units (GPUs). In this paper, we propose Ar3dHands, the first large multi-view distorted hand dataset, and develop a simple but effective algorithm for real-time 3D hand pose estimation from binocular distorted images, which makes our method compatible with the wide-angle camera systems found in miniature visual devices such as AR/VR glasses. Evaluation shows that our method achieves state-of-the-art results on several datasets, with a lower mean 2D end-point error, and runs in real time on embedded devices without GPUs.
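The abstract reports accuracy as mean 2D end-point error (EPE). The sketch below shows the common form of that metric: the mean Euclidean pixel distance between predicted and ground-truth 2D keypoints, averaged over all evaluated joints. The function name `mean_epe_2d`, the nested-list input layout, and the optional visibility mask are assumptions made for illustration; this is not the paper's evaluation code, and the authors may filter or normalize joints differently.

```python
import math
from typing import List, Optional, Sequence

# One hand = a sequence of J keypoints, each a (u, v) pixel coordinate.
Hand = Sequence[Sequence[float]]


def mean_epe_2d(
    pred: Sequence[Hand],
    gt: Sequence[Hand],
    visible: Optional[Sequence[Sequence[bool]]] = None,
) -> float:
    """Mean 2D end-point error: average Euclidean pixel distance between
    predicted and ground-truth keypoints over all evaluated joints."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of hands")
    errors: List[float] = []
    for i, (p_hand, g_hand) in enumerate(zip(pred, gt)):
        if len(p_hand) != len(g_hand):
            raise ValueError(f"hand {i}: joint count mismatch")
        for j, (p, g) in enumerate(zip(p_hand, g_hand)):
            # Hypothetical visibility mask: skip joints marked as not annotated.
            if visible is not None and not visible[i][j]:
                continue
            errors.append(math.dist(p, g))  # per-joint Euclidean distance
    if not errors:
        raise ValueError("no joints to evaluate")
    return sum(errors) / len(errors)


if __name__ == "__main__":
    pred = [[(0.0, 0.0), (3.0, 4.0)]]
    gt = [[(0.0, 0.0), (0.0, 0.0)]]
    print(mean_epe_2d(pred, gt))                           # 2.5
    print(mean_epe_2d(pred, gt, visible=[[False, True]]))  # 5.0
```

Averaging per joint rather than per hand means every annotated keypoint contributes equally, which is the usual convention when some joints are unlabeled.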
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gan, M., Lin, Y., Liu, X., Song, W., Zeng, J., Kang, W. (2023). Ar3dHands: A Dataset and Baseline for Real-Time 3D Hand Pose Estimation from Binocular Distorted Images. In: Lu, H., et al. Image and Graphics. ICIG 2023. Lecture Notes in Computer Science, vol 14355. Springer, Cham. https://doi.org/10.1007/978-3-031-46305-1_14
DOI: https://doi.org/10.1007/978-3-031-46305-1_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46304-4
Online ISBN: 978-3-031-46305-1
eBook Packages: Computer Science, Computer Science (R0)