Abstract
In virtual reality (VR) applications, haptic gloves provide feedback and more direct control than bare hands do. Most VR gloves contain flex sensors and inertial measurement units for tracking the finger joints of a single hand; however, they lack a mechanism for tracking two-hand interactions. In this paper, a vision-based method is proposed for improved two-handed glove tracking. The proposed method requires only one camera attached to a VR headset. A photorealistic glove data generation framework was established to synthesize large quantities of training data for identifying the left glove, right glove, or both gloves in images with complex backgrounds. We also incorporated the “glove pose hypothesis” in the training stage, in which spatial cues regarding relative joint positions were exploited to accurately predict glove positions under severe self-occlusion or motion blur. In our experiments, a system based on the proposed method achieved an accuracy of 94.06% on a validation set and high-speed tracking at 65 fps on a consumer graphics processing unit.
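To make the “glove pose hypothesis” concrete, it can be pictured as an auxiliary training signal on relative joint offsets: in addition to penalizing absolute joint-position error, the loss penalizes errors in the offsets between joints, so the network learns spatial relationships that remain informative when individual joints are occluded or blurred. The following is a minimal sketch under that assumption; the function name `glove_pose_loss`, the weighting factor `alpha`, and the 21-joint layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def glove_pose_loss(pred, gt, alpha=0.5):
    """Hypothetical loss sketch: pred, gt are (J, 2) arrays of 2D joint
    positions for one glove."""
    # Absolute term: mean squared error on the joint positions themselves.
    abs_err = np.mean((pred - gt) ** 2)
    # Relative term: error on all pairwise joint-to-joint offsets,
    # encoding the spatial cues between joints described in the abstract.
    pred_rel = pred[:, None, :] - pred[None, :, :]  # (J, J, 2) offsets
    gt_rel = gt[:, None, :] - gt[None, :, :]
    rel_err = np.mean((pred_rel - gt_rel) ** 2)
    return abs_err + alpha * rel_err

# Example: 21 joints with small prediction noise.
rng = np.random.default_rng(0)
gt = rng.uniform(0, 1, size=(21, 2))
pred = gt + rng.normal(scale=0.01, size=(21, 2))
print(glove_pose_loss(pred, gt))
```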
Data availability
The authors confirm that the data supporting the findings of this study are available within the article.
Acknowledgements
This study was supported by the Industrial Technology Research Institute, the National Science and Technology Council, Taiwan (Grant Numbers: NSTC 111-2222-E-A49-008 and NSTC 112-2221-E-A49-129).
Ethics declarations
Conflict of interest
The authors have no relevant financial or nonfinancial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file 1 (MP4 83,881 KB)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hsu, FS., Wang, TM. & Chen, LH. Robust vision-based glove pose estimation for both hands in virtual reality. Virtual Reality 27, 3133–3148 (2023). https://doi.org/10.1007/s10055-023-00860-6