DOI: 10.1145/3574131.3574448
research-article

Monocular Human Body Shape Estimation: A Generation-aid Approach

Published: 13 January 2023

ABSTRACT

Observing humans in monocular images is a fundamental task in computer vision. Reconstructing a human body from a monocular image consists mainly of recovering its pose and its body shape. Past studies, however, have concentrated on pose estimation and largely neglected body shape; this paper focuses on estimating the body shape of a 3D model. Learning body parameters via instance segmentation requires a large number of labels, while parameters derived from pose estimation depend entirely on the results of keypoint detection, which performs poorly on images taken from poor angles or at low resolution. To address these problems, we propose a method for automatically generating datasets. The generated dataset provides low-resolution images, with labels, covering various angles and blurred shapes. On this low-resolution, poorly angled dataset, we propose a generation-assisted deep learning network framework. Experiments show that the framework can effectively estimate the body shape parameters of the model from monocular images.
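The abstract does not say how the dataset is generated, so the following is only a rough sketch of what such a generator might look like, under loudly stated assumptions: a shape vector is sampled, a silhouette is "rendered" from a random viewing angle, and the image is then downsampled and blurred, with the sampled parameters kept as the label. The ellipse renderer, the 10-coefficient shape vector, the yaw range, and every function name here are hypothetical illustrations, not the authors' pipeline.

```python
import math
import random

def render_silhouette(width_scale, yaw_deg, size=64):
    """Toy stand-in for a renderer (hypothetical): a filled ellipse whose
    apparent width shrinks as the body turns away from the camera."""
    yaw = math.radians(yaw_deg)
    # Apparent half-width blends frontal and side-on extents by view angle.
    half_w = size * 0.25 * width_scale * (0.5 + 0.5 * abs(math.cos(yaw)))
    half_h = size * 0.45
    c = (size - 1) / 2.0
    return [
        [1.0 if ((x - c) / half_w) ** 2 + ((y - c) / half_h) ** 2 <= 1.0 else 0.0
         for x in range(size)]
        for y in range(size)
    ]

def downsample(img, factor):
    """Average-pool a square image by an integer factor (low resolution)."""
    n = len(img) // factor
    return [
        [sum(img[y * factor + dy][x * factor + dx]
             for dy in range(factor) for dx in range(factor)) / factor ** 2
         for x in range(n)]
        for y in range(n)
    ]

def box_blur(img):
    """3x3 box blur with edge clamping, to soften the outline."""
    n = len(img)

    def px(y, x):
        return img[min(max(y, 0), n - 1)][min(max(x, 0), n - 1)]

    return [
        [sum(px(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
         for x in range(n)]
        for y in range(n)
    ]

def make_sample(rng, num_shape_params=10, factor=4):
    """Return (image, label); the label holds the sampled shape and angle."""
    shape = [rng.gauss(0.0, 1.0) for _ in range(num_shape_params)]
    yaw_deg = rng.uniform(-90.0, 90.0)
    # Toy assumption: the first coefficient controls overall body width.
    width_scale = max(0.4, 1.0 + 0.2 * shape[0])
    image = box_blur(downsample(render_silhouette(width_scale, yaw_deg), factor))
    return image, {"shape": shape, "yaw_deg": yaw_deg}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
dataset = [make_sample(rng) for _ in range(8)]
image, label = dataset[0]
print(len(image), len(image[0]), len(label["shape"]))
```

Because every label is produced alongside its image, a generator of this kind needs no manual annotation, which is the property the abstract attributes to automatic dataset generation. A real pipeline would presumably replace the ellipse with a rendered parametric body model.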


Published in

VRCAI '22: Proceedings of the 18th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry
December 2022
284 pages
ISBN: 9798400700316
DOI: 10.1145/3574131
Editors: Enhua Wu, Lionel Ming-Shuan Ni, Zhigeng Pan, Daniel Thalmann, Ping Li, Charlie C.L. Wang, Lei Zhu, Minghao Yang

Copyright © 2022 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article, Research, Refereed limited

Overall Acceptance Rate: 51 of 107 submissions, 48%
