DOI: 10.1145/3623053.3623371

Research Article

Reconstructing Hand Shape and Appearance for Accurate Tracking from Monocular Video

Published: 28 November 2023

ABSTRACT

A virtual animatable hand avatar that represents a user's hand shape and appearance and tracks its articulated motion is essential for an immersive experience in AR/VR. Recent approaches use implicit representations to capture geometry and appearance, combined with neural rendering. However, they fail to generalize to unseen shapes, do not model lighting (leading to baked-in illumination and self-shadows), and cannot capture complex poses. In this thesis, we 1) introduce a novel hand shape model that augments a data-driven shape model by adapting its local scale to represent unseen hand shapes, 2) propose a method to reconstruct a detailed hand avatar from monocular RGB video captured under real-world environment lighting, jointly optimizing shape, appearance, and lighting parameters with a realistic shading model in a differentiable rendering framework that incorporates Monte Carlo path tracing, and 3) present a robust hand tracking framework that accurately registers our hand model to monocular depth data using a modified skinning function with blend shapes. Our evaluation demonstrates that our approach outperforms existing hand shape and appearance reconstruction methods on all commonly used metrics, and that our tracking framework improves over existing generative and discriminative hand pose estimation methods.
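The skinning formulation mentioned in point 3 — a template mesh deformed by additive blend-shape correctives and then posed by a per-vertex weighted blend of bone transforms — can be sketched as generic linear blend skinning. This is an illustrative sketch only, not the thesis's actual modified skinning function; the function name, argument shapes, and all identifiers are hypothetical:

```python
import numpy as np

def skin_vertices(v_template, shape_dirs, betas, weights, bone_transforms):
    """Generic linear blend skinning with additive blend-shape correctives.

    v_template:      (V, 3) rest-pose vertices
    shape_dirs:      (V, 3, B) per-vertex blend-shape displacement basis
    betas:           (B,) blend-shape coefficients
    weights:         (V, J) skinning weights (each row sums to 1)
    bone_transforms: (J, 4, 4) rigid transform per joint
    """
    # Apply blend-shape correctives to the rest pose.
    v = v_template + shape_dirs @ betas                      # (V, 3)
    # Move to homogeneous coordinates for the rigid transforms.
    v_h = np.concatenate([v, np.ones((len(v), 1))], axis=1)  # (V, 4)
    # Blend bone transforms per vertex, then transform each vertex.
    T = np.einsum("vj,jab->vab", weights, bone_transforms)   # (V, 4, 4)
    return np.einsum("vab,vb->va", T, v_h)[:, :3]
```

With identity transforms and zero blend-shape coefficients the function returns the template unchanged; a tracking framework in this spirit would optimize the pose parameters (and hence the bone transforms) and the coefficients to fit observed depth data.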

Supplemental Material

video.mp4 (28 MB)


Published in

SA '23: SIGGRAPH Asia 2023 Doctoral Consortium
November 2023, 50 pages
ISBN: 9798400703928
DOI: 10.1145/3623053

        Copyright © 2023 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

• Research article
• Refereed limited

Acceptance Rates

Overall acceptance rate: 178 of 869 submissions, 20%
