ABSTRACT
A virtual, animatable hand avatar that captures a user's hand shape and appearance, and tracks its articulated motion, is essential for an immersive experience in AR/VR. Recent approaches use implicit representations combined with neural rendering to capture geometry and appearance. However, they fail to generalize to unseen shapes, do not model scene lighting, which leads to baked-in illumination and self-shadows, and cannot capture complex poses. In this thesis, we 1) introduce a novel hand shape model that augments a data-driven shape model by adapting its local scale to represent unseen hand shapes, 2) propose a method to reconstruct a detailed hand avatar from monocular RGB video captured under real-world environment lighting, jointly optimizing shape, appearance, and lighting parameters with a realistic shading model in a differentiable rendering framework that incorporates Monte Carlo path tracing, and 3) present a robust hand tracking framework that accurately registers our hand model to monocular depth data using a modified skinning function with blend shapes. Our evaluation demonstrates that our approach outperforms existing hand shape and appearance reconstruction methods on all commonly used metrics. Further, our tracking framework improves over existing generative and discriminative hand pose estimation methods.
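To make the third contribution concrete, the sketch below shows one plausible way to fold a per-bone local scale into standard linear blend skinning, which is the kind of modified skinning function the abstract refers to. All names, array shapes, and the uniform per-bone scale are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def lbs_with_local_scale(verts, weights, bone_transforms, bone_scales):
    """Linear blend skinning with a per-bone local scale.

    verts           -- (V, 3) rest-pose vertices
    weights         -- (V, B) skinning weights, rows sum to 1
    bone_transforms -- (B, 4, 4) rigid bone transforms
    bone_scales     -- (B,) uniform local scale per bone
    Returns (V, 3) posed vertices.
    """
    V = verts.shape[0]
    B = weights.shape[1]
    # Homogeneous rest-pose coordinates.
    homo = np.concatenate([verts, np.ones((V, 1))], axis=1)  # (V, 4)
    posed = np.zeros((V, 3))
    for b in range(B):
        # Apply the local scale before the rigid bone transform.
        S = np.diag([bone_scales[b]] * 3 + [1.0])
        T = bone_transforms[b] @ S
        # Weighted contribution of this bone to every vertex.
        posed += weights[:, b:b + 1] * (homo @ T.T)[:, :3]
    return posed
```

With identity bone transforms and all scales set to 1 this reduces to the rest pose, which is a useful sanity check; scaling a bone then stretches exactly the vertices it influences, which is how local scale adaptation can represent unseen hand shapes without changing the pose parameterization.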