poster

Vertex2Image: Construct Human Figure Based On A Monocular Video

Authors:
Zihao Wang, Computer Science, San Francisco State University, United States. ORCID: 0000-0002-5375-3044
Shah Rukh Humayoun, Computer Science, San Francisco State University, United States. ORCID: 0000-0002-4645-8223

IUI '23 Companion: Companion Proceedings of the 28th International Conference on Intelligent User Interfaces, March 2023, Pages 108–111. https://doi.org/10.1145/3581754.3584145

Published: 27 March 2023

ABSTRACT

Human avatar construction is an active research topic because the technology can improve online interaction in a number of domains, such as the metaverse. Our Vertex2Image model takes a single video as its source and, after training, renders the target person from an arbitrary camera angle. The model uses SMPL [7] vertices to collect color information and distills that information through a modified version of UNet++ [19] to construct the representation. Although many deep learning architectures have been proposed in the literature, most of them suffer from long training times and do not support transfer learning to a new target. Our contribution is to train a generalized model that learns how textures are formed from sparse color information and then to apply transfer learning to a specific target. As a result, training for a new target person takes only 2 hours, instead of the couple of days that is typical for many existing models.
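The abstract does not say how the per-vertex colors are arranged before they reach the network, so the following is only a minimal sketch of one plausible reading of "sparse color information": colored mesh vertices are projected into the image plane, and the nearest vertex wins each pixel. The pinhole camera, the centered principal point, and the function name `splat_vertices` are assumptions made for illustration, not details taken from the paper. An SMPL mesh has 6,890 vertices, so most pixels in a frame of typical resolution would stay empty.

```python
import math
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Color = Tuple[int, int, int]


def splat_vertices(
    vertices: List[Vec3],
    colors: List[Color],
    focal: float,
    width: int,
    height: int,
) -> List[List[Optional[Color]]]:
    """Project camera-space vertices to pixels; keep the nearest vertex per pixel.

    Hypothetical helper: pixels that no vertex lands on stay None,
    which is what makes the resulting image sparse.
    """
    image: List[List[Optional[Color]]] = [[None] * width for _ in range(height)]
    depth = [[math.inf] * width for _ in range(height)]
    cx, cy = width / 2.0, height / 2.0  # assumed principal point: image center

    for (x, y, z), color in zip(vertices, colors):
        if z <= 0:  # vertex is behind the camera
            continue
        # Pinhole projection from camera space to pixel coordinates.
        u = math.floor(focal * x / z + cx)
        v = math.floor(focal * y / z + cy)
        # Simple z-buffer: only overwrite a pixel with a closer vertex.
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z
            image[v][u] = color
    return image
```

An image-to-image network such as a UNet++ variant could then be trained to fill in the empty pixels; how the paper actually does this is not stated in the abstract.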

References

[1] Thiemo Alldieck, Marcus Magnor, Weipeng Xu, Christian Theobalt, and Gerard Pons-Moll. 2018. Video Based Reconstruction of 3D People Models. https://doi.org/10.48550/ARXIV.1803.04758
[2] Enric Corona, Albert Pumarola, Guillem Alenyà, Gerard Pons-Moll, and Francesc Moreno-Noguer. 2021. SMPLicit: Topology-aware Generative Model for Clothed People. https://doi.org/10.48550/ARXIV.2103.06871
[3] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. https://doi.org/10.48550/ARXIV.1512.03385
[4] Yang Hong, Juyong Zhang, Boyi Jiang, Yudong Guo, Ligang Liu, and Hujun Bao. 2021. StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision. https://doi.org/10.48550/ARXIV.2104.05289
[5] Youngjoong Kwon, Dahun Kim, Duygu Ceylan, and Henry Fuchs. 2021. Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering. https://doi.org/10.48550/ARXIV.2109.07448
[6] Lingjie Liu, Marc Habermann, Viktor Rudnev, Kripasindhu Sarkar, Jiatao Gu, and Christian Theobalt. 2021. Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control. https://doi.org/10.48550/ARXIV.2106.02019
[7] Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. 2015. SMPL: A Skinned Multi-Person Linear Model. ACM Trans. Graphics (Proc. SIGGRAPH Asia) 34, 6 (Oct. 2015), 248:1–248:16. https://doi.org/10.1145/2816795.2818013
[8] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. 2020. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. https://doi.org/10.48550/ARXIV.2003.08934
[9] Sida Peng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Hujun Bao, and Xiaowei Zhou. 2021. Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans. In CVPR.
[10] Sergey Prokudin, Michael J. Black, and Javier Romero. 2020. SMPLpix: Neural Avatars from 3D Human Models. https://doi.org/10.48550/ARXIV.2008.06872
[11] Albert Pumarola, Enric Corona, Gerard Pons-Moll, and Francesc Moreno-Noguer. 2020. D-NeRF: Neural Radiance Fields for Dynamic Scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[12] Shunsuke Saito, Tomas Simon, Jason Saragih, and Hanbyul Joo. 2020. PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[13] Shunsuke Saito, Jinlong Yang, Qianli Ma, and Michael J. Black. 2021. SCANimate: Weakly Supervised Learning of Skinned Clothed Avatar Networks. https://doi.org/10.48550/ARXIV.2104.03313
[14] Shih-Yang Su, Frank Yu, Michael Zollhoefer, and Helge Rhodin. 2021. A-NeRF: Articulated Neural Radiance Fields for Learning Human Shape, Appearance, and Pose. https://doi.org/10.48550/ARXIV.2102.06199
[15] Garvita Tiwari, Nikolaos Sarafianos, Tony Tung, and Gerard Pons-Moll. 2021. Neural-GIF: Neural Generalized Implicit Functions for Animating People in Clothing. https://doi.org/10.48550/ARXIV.2108.08807
[16] Chung-Yi Weng, Brian Curless, Pratul P. Srinivasan, Jonathan T. Barron, and Ira Kemelmacher-Shlizerman. 2022. HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video. https://doi.org/10.48550/ARXIV.2201.04127
[17] Yuliang Xiu, Jinlong Yang, Dimitrios Tzionas, and Michael J. Black. 2021. ICON: Implicit Clothed humans Obtained from Normals. https://doi.org/10.48550/ARXIV.2112.09127
[18] Fuqiang Zhao, Wei Yang, Jiakai Zhang, Pei Lin, Yingliang Zhang, Jingyi Yu, and Lan Xu. 2021. HumanNeRF: Efficiently Generated Human Radiance Field from Sparse Inputs. https://doi.org/10.48550/ARXIV.2112.02789
[19] Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, and Jianming Liang. 2018. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. https://doi.org/10.48550/ARXIV.1807.10165

Index Terms

Vertex2Image: Construct Human Figure Based On A Monocular Video
1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
    1. Redundancy
  2. Embedded and cyber-physical systems
    1. Embedded systems
    2. Robotics
2. Networks
  1. Network properties
    1. Network reliability

Published in

IUI '23 Companion: Companion Proceedings of the 28th International Conference on Intelligent User Interfaces
March 2023
266 pages
ISBN:9798400701078
DOI:10.1145/3581754

Copyright © 2023 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 March 2023
Author Tags
Deep Learning
Human Figure Construction
Monocular Video Based
Qualifiers
- poster
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 746 of 2,811 submissions, 27%