ABSTRACT
A virtual, animatable hand avatar that captures a user's hand shape and appearance, and tracks its articulated motion, is essential for an immersive experience in AR/VR. Recent approaches use implicit representations combined with neural rendering to capture geometry and appearance. However, they fail to generalize to unseen shapes, do not model scene lighting, which leads to baked-in illumination and self-shadows, and cannot capture complex poses. In this thesis, we 1) introduce a novel hand shape model that augments a data-driven shape model by adapting its local scale to represent unseen hand shapes, 2) propose a method to reconstruct a detailed hand avatar from monocular RGB video captured under real-world environment lighting, jointly optimizing shape, appearance, and lighting parameters with a realistic shading model in a differentiable rendering framework that incorporates Monte Carlo path tracing, and 3) present a robust hand tracking framework that accurately registers our hand model to monocular depth data using a modified skinning function with blend shapes. Our evaluation demonstrates that our approach outperforms existing hand shape and appearance reconstruction methods on all commonly used metrics. Further, our tracking framework improves over existing generative and discriminative hand pose estimation methods.
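To make the third contribution concrete, the sketch below shows one plausible way to fold a per-bone local scale into standard linear blend skinning, which is the kind of modified skinning function the abstract refers to. All names, array shapes, and the uniform per-bone scale are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def lbs_with_local_scale(verts, weights, bone_transforms, bone_scales):
    """Linear blend skinning with a per-bone local scale.

    verts           -- (V, 3) rest-pose vertices
    weights         -- (V, B) skinning weights, rows sum to 1
    bone_transforms -- (B, 4, 4) rigid bone transforms
    bone_scales     -- (B,) uniform local scale per bone
    Returns (V, 3) posed vertices.
    """
    V = verts.shape[0]
    B = weights.shape[1]
    # Homogeneous rest-pose coordinates.
    homo = np.concatenate([verts, np.ones((V, 1))], axis=1)  # (V, 4)
    posed = np.zeros((V, 3))
    for b in range(B):
        # Apply the local scale before the rigid bone transform.
        S = np.diag([bone_scales[b]] * 3 + [1.0])
        T = bone_transforms[b] @ S
        # Weighted contribution of this bone to every vertex.
        posed += weights[:, b:b + 1] * (homo @ T.T)[:, :3]
    return posed
```

With identity bone transforms and all scales set to 1 this reduces to the rest pose, which is a useful sanity check; scaling a bone then stretches exactly the vertices it influences, which is how local scale adaptation can represent unseen hand shapes without changing the pose parameterization.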