
3D Human Avatar Digitization from a Single Image

Authors:

Yi Xu

VRCAI '19: Proceedings of the 17th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry

Article No.: 12, Pages 1 - 8

https://doi.org/10.1145/3359997.3365707

Published: 14 November 2019

Abstract

With the development of AR/VR technologies, a reliable and straightforward way to digitize the three-dimensional human body is in high demand. Most existing methods rely on complex equipment and sophisticated algorithms, which makes them impractical for everyday users. In this paper, we propose a pipeline that reconstructs a 3D human avatar at a glance. Our approach simultaneously reconstructs the three-dimensional human geometry and a whole-body texture map from a single RGB image. We first segment the human body region from the image and then obtain an initial body geometry by fitting a parametric model to the segment. Next, we warp the initial geometry to the final shape by applying a silhouette-based dense correspondence. Finally, to infer the invisible backside texture from a frontal image, we propose a network we call InferGAN. Comprehensive experiments demonstrate that our solution is robust and effective on both public data and our own captured data. Our human avatars can be easily rigged and animated using MoCap data. We developed a mobile application that demonstrates this capability in AR/VR settings.
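
The abstract does not say how the silhouette used in the warping step is obtained from the segmentation. As a minimal sketch only, assuming a binary foreground mask is already available, one simple option is to take the silhouette as the foreground pixels that touch the background in a 4-neighborhood. The function name `silhouette_from_mask` and the toy mask are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def silhouette_from_mask(mask: np.ndarray) -> np.ndarray:
    """Mark foreground pixels that have at least one background 4-neighbor.

    Hypothetical helper for illustration; the paper's own silhouette
    extraction is not described in the abstract.
    """
    fg = mask.astype(bool)
    # Pad with background so pixels on the image border count as boundary.
    padded = np.pad(fg, 1, mode="constant", constant_values=False)
    # A pixel is interior when all four direct neighbors are foreground.
    interior = (
        padded[:-2, 1:-1]      # neighbor above
        & padded[2:, 1:-1]     # neighbor below
        & padded[1:-1, :-2]    # neighbor to the left
        & padded[1:-1, 2:]     # neighbor to the right
    )
    return fg & ~interior


# Toy mask: a filled 3x3 square in a 5x5 image stands in for a segmented person.
toy_mask = np.zeros((5, 5), dtype=np.uint8)
toy_mask[1:4, 1:4] = 1
toy_silhouette = silhouette_from_mask(toy_mask)
```

In this toy case only the center pixel of the square is interior, so the silhouette is the ring of pixels around it. A real pipeline would run this on the mask produced by the segmentation step and then match silhouette points between the fitted model and the image.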

Supplemental Material

MP4 File - a12-li-supplement (video, 59.06 MB)




Published In

VRCAI '19: Proceedings of the 17th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry

November 2019

354 pages

ISBN:9781450370028

DOI:10.1145/3359997

Conference Chairs: Joaquim Jorge (INESC/ID & Univ Lisboa, Portugal), June Kim (University of New South Wales, Australia), Herman Van Eyken (Griffith University, Queensland, Australia)

Editor: Stephen N. Spencer (University of Washington)

Program Chairs: Mashhuda Glencross (University of Queensland, Australia), Kenny Mitchell (Disney Research, USA and Edinburgh Napier University, UK), Takuji Narumi (University of Tokyo, Japan)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article
Research
Refereed limited

Conference

VRCAI '19

Sponsor:

SIGGRAPH

VRCAI '19: The 17th International Conference on Virtual-Reality Continuum and its Applications in Industry

November 14 - 16, 2019

QLD, Brisbane, Australia

Acceptance Rates

Overall Acceptance Rate 51 of 107 submissions, 48%


Cited By

Bruggisser F, Leisi C, Lund-Jensen P, Fröhlich M, Salter C (2024). reconFIGURE: Confronting Audiences with Digital Doppelgängers. Proceedings of the ACM on Computer Graphics and Interactive Techniques 7:4, 1-10. Online publication date: 19-Jul-2024.
https://dl.acm.org/doi/10.1145/3664208

Chen Y, Saha A, Chapiro A, Häne C, Bazin J, Qiu B, Zanetti S, Katsavounidis I, Bovik A (2024). Subjective and Objective Quality Assessment of Rendered Human Avatar Videos in Virtual Reality. IEEE Transactions on Image Processing 33, 5740-5754. Online publication date: 2-Oct-2024.
https://dl.acm.org/doi/10.1109/TIP.2024.3468881

Druart A (2024). Single-Image to 3D Human: A Comprehensive Reconstruction Framework. 2024 International Conference on Control, Automation and Diagnosis (ICCAD), 1-6. Online publication date: 15-May-2024.
https://doi.org/10.1109/ICCAD60883.2024.10553767

Koh A, Tan S, Nasrudin M (2024). A systematic literature review of generative adversarial networks (GANs) in 3D avatar reconstruction from 2D images. Multimedia Tools and Applications 83:26, 68813-68853. Online publication date: 1-Mar-2024.
https://doi.org/10.1007/s11042-024-18665-3

Muttagi S, Patil V, Babar P, Chunamari R, Kulkarni U, Chikkamath S, Meena S (2024). 3D Avatar Reconstruction Using Multi-level Pixel-Aligned Implicit Function. Proceedings of 4th International Conference on Recent Trends in Machine Learning, IoT, Smart Cities and Applications, 221-231. Online publication date: 23-May-2024.
https://doi.org/10.1007/978-981-99-9442-7_20

Hirzle T, Müller F, Draxler F, Schmitz M, Knierim P, Hornbæk K (2023). When XR and AI Meet - A Scoping Review on Extended Reality and Artificial Intelligence. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-45. Online publication date: 19-Apr-2023.
https://dl.acm.org/doi/10.1145/3544548.3581072

Cha S, Seo K, Ashtari A, Noh J (2023). Generating Texture for 3D Human Avatar from a Single Image using Sampling and Refinement Networks. Computer Graphics Forum 42:2, 385-396. Online publication date: 23-May-2023.
https://doi.org/10.1111/cgf.14769

Zhou Y, Huang H, Yuan S, Zou H, Xie L, Yang J (2023). MetaFi++: WiFi-Enabled Transformer-Based Human Pose Estimation for Metaverse Avatar Simulation. IEEE Internet of Things Journal 10:16, 14128-14136. Online publication date: 15-Aug-2023.
https://doi.org/10.1109/JIOT.2023.3262940

Huang Y, Yi H, Liu W, Wang H, Wu B, Wang W, Lin B, Zhang D, Cai D (2023). One-shot Implicit Animatable Avatars with Model-based Priors. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 8940-8951. Online publication date: 1-Oct-2023.
https://doi.org/10.1109/ICCV51070.2023.00824

Yin Y, Ghasedi K, Wu H, Yang J, Tong X, Fu Y (2023). NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-Shot Real Image Animation. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 8539-8548. Online publication date: Jun-2023.
https://doi.org/10.1109/CVPR52729.2023.00825
