ABSTRACT
Human avatar construction is an active research topic, as the technology can improve online interaction in a number of domains, such as the metaverse. After training, our Vertex2Image model takes a single video as its source and renders the target person from an arbitrary camera angle. The model uses SMPL [7] vertices to collect color information and distills that information through a modified version of UNet++ [19] to construct the representation. Although many deep learning architectures have been proposed in the literature, most of them suffer from long training times and do not support transfer learning to a new target. Our contribution is to train a generalized model that learns how textures are formed from sparse color information, and then to apply transfer learning to a specific target. As a result, the training time for a new target person is drastically reduced to only 2 hours, rather than the couple of days that is typical for many existing models.
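To make "sparse color information" concrete, the sketch below projects colored 3D vertices into an image plane, leaving every pixel without a vertex empty. It is only an illustration under stated assumptions: the abstract does not describe how colors are gathered or rasterized, so the pinhole camera, the nearest-vertex rule, and the hypothetical function name `splat_vertex_colors` are not taken from the paper.

```python
import numpy as np


def splat_vertex_colors(vertices, colors, K, height, width):
    """Project colored 3D vertices into a sparse RGB image (illustrative only).

    vertices: (N, 3) array of points in camera coordinates (z > 0 is in front).
    colors:   (N, 3) array of per-vertex RGB values in [0, 1].
    K:        (3, 3) pinhole intrinsics matrix (an assumed camera model).
    Returns (image, mask): image is (height, width, 3), mask is (height, width).
    """
    vertices = np.asarray(vertices, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    image = np.zeros((height, width, 3), dtype=np.float32)
    mask = np.zeros((height, width), dtype=bool)

    # Discard vertices behind (or on) the camera plane.
    front = vertices[:, 2] > 1e-6
    pts, cols = vertices[front], colors[front]
    z = pts[:, 2]

    # Pinhole projection: u = fx * x / z + cx, v = fy * y / z + cy.
    proj = (K @ pts.T).T
    u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
    v = np.round(proj[:, 1] / proj[:, 2]).astype(int)

    # Keep only vertices that land inside the image bounds.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, cols = u[inside], v[inside], z[inside], cols[inside]

    # Draw far-to-near so that the nearest vertex wins each pixel.
    for i in np.argsort(-z):
        image[v[i], u[i]] = cols[i]
        mask[v[i], u[i]] = True
    return image, mask
```

With a body mesh of a few thousand vertices, most pixels of the output stay empty, which is the kind of sparse input an image-to-image network would then have to complete into a full texture.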
- Thiemo Alldieck, Marcus Magnor, Weipeng Xu, Christian Theobalt, and Gerard Pons-Moll. 2018. Video Based Reconstruction of 3D People Models. https://doi.org/10.48550/ARXIV.1803.04758
- Enric Corona, Albert Pumarola, Guillem Alenyà, Gerard Pons-Moll, and Francesc Moreno-Noguer. 2021. SMPLicit: Topology-aware Generative Model for Clothed People. https://doi.org/10.48550/ARXIV.2103.06871
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. https://doi.org/10.48550/ARXIV.1512.03385
- Yang Hong, Juyong Zhang, Boyi Jiang, Yudong Guo, Ligang Liu, and Hujun Bao. 2021. StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision. https://doi.org/10.48550/ARXIV.2104.05289
- Youngjoong Kwon, Dahun Kim, Duygu Ceylan, and Henry Fuchs. 2021. Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering. https://doi.org/10.48550/ARXIV.2109.07448
- Lingjie Liu, Marc Habermann, Viktor Rudnev, Kripasindhu Sarkar, Jiatao Gu, and Christian Theobalt. 2021. Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control. https://doi.org/10.48550/ARXIV.2106.02019
- Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. 2015. SMPL: A Skinned Multi-Person Linear Model. ACM Trans. Graphics (Proc. SIGGRAPH Asia) 34, 6 (Oct. 2015), 248:1–248:16. https://doi.org/10.1145/2816795.2818013
- Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. 2020. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. https://doi.org/10.48550/ARXIV.2003.08934
- Sida Peng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Hujun Bao, and Xiaowei Zhou. 2021. Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans. In CVPR.
- Sergey Prokudin, Michael J. Black, and Javier Romero. 2020. SMPLpix: Neural Avatars from 3D Human Models. https://doi.org/10.48550/ARXIV.2008.06872
- Albert Pumarola, Enric Corona, Gerard Pons-Moll, and Francesc Moreno-Noguer. 2020. D-NeRF: Neural Radiance Fields for Dynamic Scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- Shunsuke Saito, Tomas Simon, Jason Saragih, and Hanbyul Joo. 2020. PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Shunsuke Saito, Jinlong Yang, Qianli Ma, and Michael J. Black. 2021. SCANimate: Weakly Supervised Learning of Skinned Clothed Avatar Networks. https://doi.org/10.48550/ARXIV.2104.03313
- Shih-Yang Su, Frank Yu, Michael Zollhoefer, and Helge Rhodin. 2021. A-NeRF: Articulated Neural Radiance Fields for Learning Human Shape, Appearance, and Pose. https://doi.org/10.48550/ARXIV.2102.06199
- Garvita Tiwari, Nikolaos Sarafianos, Tony Tung, and Gerard Pons-Moll. 2021. Neural-GIF: Neural Generalized Implicit Functions for Animating People in Clothing. https://doi.org/10.48550/ARXIV.2108.08807
- Chung-Yi Weng, Brian Curless, Pratul P. Srinivasan, Jonathan T. Barron, and Ira Kemelmacher-Shlizerman. 2022. HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video. https://doi.org/10.48550/ARXIV.2201.04127
- Yuliang Xiu, Jinlong Yang, Dimitrios Tzionas, and Michael J. Black. 2021. ICON: Implicit Clothed humans Obtained from Normals. https://doi.org/10.48550/ARXIV.2112.09127
- Fuqiang Zhao, Wei Yang, Jiakai Zhang, Pei Lin, Yingliang Zhang, Jingyi Yu, and Lan Xu. 2021. HumanNeRF: Efficiently Generated Human Radiance Field from Sparse Inputs. https://doi.org/10.48550/ARXIV.2112.02789
- Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, and Jianming Liang. 2018. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. https://doi.org/10.48550/ARXIV.1807.10165
Index Terms
- Vertex2Image: Construct Human Figure Based On A Monocular Video