DOI: 10.1145/3581754.3584145
poster

Vertex2Image: Construct Human Figure Based On A Monocular Video

Published: 27 March 2023

ABSTRACT

Human avatar construction is an active research topic, since the technology can improve online interaction in many domains, such as the metaverse. Our Vertex2Image technique takes a single monocular video as input and, after training, renders the target person from an arbitrary camera angle. The model collects color information at SMPL [7] vertices and distills it through a modified version of UNet++ [19] to construct the representation. Although many deep learning architectures have been proposed in the literature, most of them suffer from long training times and do not support transfer learning to a new target. Our contribution is to train a generalized model that learns how textures are formed from sparse color information, and then to apply transfer learning to a specific target. As a result, the training time for a new target person is drastically reduced to only 2 hours, instead of the couple of days that is typical for many existing models.
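The abstract does not give implementation details. As a rough illustration of what sparse color information gathered at mesh vertices might look like, the sketch below projects colored 3D vertices into an image with a pinhole camera and a per-pixel depth test. The function name `splat_vertices`, the camera model, and all parameters are assumptions made for illustration; this is not the authors' pipeline.

```python
import math
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Color = Tuple[int, int, int]


def splat_vertices(
    vertices: List[Vec3],
    colors: List[Color],
    focal: float,
    width: int,
    height: int,
) -> List[List[Optional[Color]]]:
    """Project colored vertices (camera coordinates, +z forward) onto a
    width x height image using an assumed pinhole camera.

    The nearest vertex wins at each pixel. Pixels that no vertex hits stay
    None, so the result is a sparse color image.
    """
    image: List[List[Optional[Color]]] = [[None] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    cx, cy = width / 2.0, height / 2.0  # principal point at the image center

    for (x, y, z), color in zip(vertices, colors):
        if z <= 0:  # vertex is behind the camera
            continue
        u = math.floor(focal * x / z + cx)
        v = math.floor(focal * y / z + cy)
        # Keep the vertex only if it lands inside the image and is closer
        # than whatever was previously written to this pixel.
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z
            image[v][u] = color
    return image
```

In a setup like the one the abstract describes, an image of this kind would plausibly serve as the network input, with the network filling in the pixels that remain empty; that reading is an inference from the abstract rather than a confirmed detail.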

References

  1. Thiemo Alldieck, Marcus Magnor, Weipeng Xu, Christian Theobalt, and Gerard Pons-Moll. 2018. Video Based Reconstruction of 3D People Models. https://doi.org/10.48550/ARXIV.1803.04758
  2. Enric Corona, Albert Pumarola, Guillem Alenyà, Gerard Pons-Moll, and Francesc Moreno-Noguer. 2021. SMPLicit: Topology-aware Generative Model for Clothed People. https://doi.org/10.48550/ARXIV.2103.06871
  3. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. https://doi.org/10.48550/ARXIV.1512.03385
  4. Yang Hong, Juyong Zhang, Boyi Jiang, Yudong Guo, Ligang Liu, and Hujun Bao. 2021. StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision. https://doi.org/10.48550/ARXIV.2104.05289
  5. Youngjoong Kwon, Dahun Kim, Duygu Ceylan, and Henry Fuchs. 2021. Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering. https://doi.org/10.48550/ARXIV.2109.07448
  6. Lingjie Liu, Marc Habermann, Viktor Rudnev, Kripasindhu Sarkar, Jiatao Gu, and Christian Theobalt. 2021. Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control. https://doi.org/10.48550/ARXIV.2106.02019
  7. Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. 2015. SMPL: A Skinned Multi-Person Linear Model. ACM Trans. Graphics (Proc. SIGGRAPH Asia) 34, 6 (Oct. 2015), 248:1–248:16. https://doi.org/10.1145/2816795.2818013
  8. Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. 2020. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. https://doi.org/10.48550/ARXIV.2003.08934
  9. Sida Peng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Hujun Bao, and Xiaowei Zhou. 2021. Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans. In CVPR.
  10. Sergey Prokudin, Michael J. Black, and Javier Romero. 2020. SMPLpix: Neural Avatars from 3D Human Models. https://doi.org/10.48550/ARXIV.2008.06872
  11. Albert Pumarola, Enric Corona, Gerard Pons-Moll, and Francesc Moreno-Noguer. 2020. D-NeRF: Neural Radiance Fields for Dynamic Scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  12. Shunsuke Saito, Tomas Simon, Jason Saragih, and Hanbyul Joo. 2020. PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  13. Shunsuke Saito, Jinlong Yang, Qianli Ma, and Michael J. Black. 2021. SCANimate: Weakly Supervised Learning of Skinned Clothed Avatar Networks. https://doi.org/10.48550/ARXIV.2104.03313
  14. Shih-Yang Su, Frank Yu, Michael Zollhoefer, and Helge Rhodin. 2021. A-NeRF: Articulated Neural Radiance Fields for Learning Human Shape, Appearance, and Pose. https://doi.org/10.48550/ARXIV.2102.06199
  15. Garvita Tiwari, Nikolaos Sarafianos, Tony Tung, and Gerard Pons-Moll. 2021. Neural-GIF: Neural Generalized Implicit Functions for Animating People in Clothing. https://doi.org/10.48550/ARXIV.2108.08807
  16. Chung-Yi Weng, Brian Curless, Pratul P. Srinivasan, Jonathan T. Barron, and Ira Kemelmacher-Shlizerman. 2022. HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video. https://doi.org/10.48550/ARXIV.2201.04127
  17. Yuliang Xiu, Jinlong Yang, Dimitrios Tzionas, and Michael J. Black. 2021. ICON: Implicit Clothed humans Obtained from Normals. https://doi.org/10.48550/ARXIV.2112.09127
  18. Fuqiang Zhao, Wei Yang, Jiakai Zhang, Pei Lin, Yingliang Zhang, Jingyi Yu, and Lan Xu. 2021. HumanNeRF: Efficiently Generated Human Radiance Field from Sparse Inputs. https://doi.org/10.48550/ARXIV.2112.02789
  19. Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, and Jianming Liang. 2018. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. https://doi.org/10.48550/ARXIV.1807.10165

Published in

IUI '23 Companion: Companion Proceedings of the 28th International Conference on Intelligent User Interfaces
March 2023
266 pages
ISBN: 9798400701078
DOI: 10.1145/3581754

Copyright © 2023 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

  • poster
  • Research
  • Refereed limited

Acceptance Rates

Overall Acceptance Rate: 746 of 2,811 submissions, 27%