DOI: 10.1145/3359997.3365707
research-article

3D Human Avatar Digitization from a Single Image

Published: 14 November 2019

Abstract

With the development of AR/VR technologies, a reliable and straightforward way to digitize the three-dimensional human body is in high demand. Most existing methods rely on complex equipment and sophisticated algorithms, which is impractical for everyday users. In this paper, we propose a pipeline that reconstructs a 3D human avatar at a glance. Our approach simultaneously reconstructs the three-dimensional human geometry and a whole-body texture map with only a single RGB image as input. We first segment the human body from the image and then obtain an initial body geometry by fitting the segment to a parametric model. Next, we warp the initial geometry to the final shape by applying a silhouette-based dense correspondence. Finally, to infer the invisible backside texture from a frontal image, we propose a network we call InferGAN. Comprehensive experiments demonstrate that our solution is robust and effective on both public data and our own captured data. Our human avatars can be easily rigged and animated using MoCap data. We developed a mobile application that demonstrates this capability in AR/VR settings.
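
The abstract does not say how the silhouette-based dense correspondence is computed. As a rough illustration only, and not necessarily the authors' method, the Python sketch below pairs two silhouettes by resampling each closed contour to the same number of arc-length-spaced points, then moves an interior point with mean value coordinates. The function names (`resample_closed_contour`, `mean_value_coords`, `warp_point`) are hypothetical. The sketch assumes that both contours start at matching points, share the same orientation, and that the warped point lies strictly inside the source contour.

```python
import math


def resample_closed_contour(points, n):
    """Resample a closed 2D polygon to n points evenly spaced by arc length."""
    m = len(points)
    # Length of every edge, including the closing edge back to points[0].
    seg = [math.hypot(points[(i + 1) % m][0] - points[i][0],
                      points[(i + 1) % m][1] - points[i][1]) for i in range(m)]
    total = sum(seg)
    out = []
    i, acc = 0, 0.0  # current edge index and arc length at its start
    for k in range(n):
        target = total * k / n
        while i < m - 1 and acc + seg[i] < target:
            acc += seg[i]
            i += 1
        t = 0.0 if seg[i] == 0 else (target - acc) / seg[i]
        (x0, y0), (x1, y1) = points[i], points[(i + 1) % m]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def mean_value_coords(p, poly):
    """Mean value coordinates of p with respect to the vertices of poly.

    Assumption: p is strictly inside the polygon (not on a vertex or edge).
    """
    n = len(poly)
    d = [(x - p[0], y - p[1]) for x, y in poly]
    r = [math.hypot(dx, dy) for dx, dy in d]
    tan_half = []
    for i in range(n):
        a, b = d[i], d[(i + 1) % n]
        cross = a[0] * b[1] - a[1] * b[0]
        dot = a[0] * b[0] + a[1] * b[1]
        # tan(alpha / 2) = sin(alpha) / (1 + cos(alpha)), using signed angles.
        tan_half.append(cross / (r[i] * r[(i + 1) % n] + dot))
    # tan_half[i - 1] wraps around to the last entry when i == 0.
    w = [(tan_half[i - 1] + tan_half[i]) / r[i] for i in range(n)]
    s = sum(w)
    return [wi / s for wi in w]


def warp_point(p, src_contour, dst_contour):
    """Move p by re-applying its source-contour coordinates to dst_contour."""
    lam = mean_value_coords(p, src_contour)
    x = sum(l * v[0] for l, v in zip(lam, dst_contour))
    y = sum(l * v[1] for l, v in zip(lam, dst_contour))
    return (x, y)
```

In this toy setting, `src_contour` would stand in for the projected outline of the fitted parametric model and `dst_contour` for the segmented silhouette, both resampled to the same point count so that the i-th points correspond.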

Supplemental Material

MP4 File - a12-li-supplement
video

Published In

VRCAI '19: Proceedings of the 17th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry
November 2019
354 pages
ISBN:9781450370028
DOI:10.1145/3359997
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. 3D reconstruction
  2. Augmented Reality
  3. Deep Learning
  4. Human body modeling

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

VRCAI '19

Acceptance Rates

Overall Acceptance Rate 51 of 107 submissions, 48%

Article Metrics

  • Downloads (Last 12 months): 52
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 03 Mar 2025

Cited By

  • (2024) reconFIGURE: Confronting Audiences with Digital Doppelgängers. Proceedings of the ACM on Computer Graphics and Interactive Techniques 7:4 (1-10). DOI: 10.1145/3664208. Online publication date: 19-Jul-2024
  • (2024) Subjective and Objective Quality Assessment of Rendered Human Avatar Videos in Virtual Reality. IEEE Transactions on Image Processing 33 (5740-5754). DOI: 10.1109/TIP.2024.3468881. Online publication date: 2-Oct-2024
  • (2024) Single-Image to 3D Human: A Comprehensive Reconstruction Framework. 2024 International Conference on Control, Automation and Diagnosis (ICCAD) (1-6). DOI: 10.1109/ICCAD60883.2024.10553767. Online publication date: 15-May-2024
  • (2024) A systematic literature review of generative adversarial networks (GANs) in 3D avatar reconstruction from 2D images. Multimedia Tools and Applications 83:26 (68813-68853). DOI: 10.1007/s11042-024-18665-3. Online publication date: 1-Mar-2024
  • (2024) 3D Avatar Reconstruction Using Multi-level Pixel-Aligned Implicit Function. Proceedings of 4th International Conference on Recent Trends in Machine Learning, IoT, Smart Cities and Applications (221-231). DOI: 10.1007/978-981-99-9442-7_20. Online publication date: 23-May-2024
  • (2023) When XR and AI Meet - A Scoping Review on Extended Reality and Artificial Intelligence. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (1-45). DOI: 10.1145/3544548.3581072. Online publication date: 19-Apr-2023
  • (2023) Generating Texture for 3D Human Avatar from a Single Image using Sampling and Refinement Networks. Computer Graphics Forum 42:2 (385-396). DOI: 10.1111/cgf.14769. Online publication date: 23-May-2023
  • (2023) MetaFi++: WiFi-Enabled Transformer-Based Human Pose Estimation for Metaverse Avatar Simulation. IEEE Internet of Things Journal 10:16 (14128-14136). DOI: 10.1109/JIOT.2023.3262940. Online publication date: 15-Aug-2023
  • (2023) One-shot Implicit Animatable Avatars with Model-based Priors. 2023 IEEE/CVF International Conference on Computer Vision (ICCV) (8940-8951). DOI: 10.1109/ICCV51070.2023.00824. Online publication date: 1-Oct-2023
  • (2023) NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-Shot Real Image Animation. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (8539-8548). DOI: 10.1109/CVPR52729.2023.00825. Online publication date: Jun-2023
