ABSTRACT
Observing humans in monocular images is a fundamental task in computer vision. Reconstructing a human body from a monocular image mainly involves recovering pose and body shape. Previous studies, however, have concentrated on pose estimation and largely neglected body shape; this paper focuses on estimating the body shape of a 3D model. Learning body parameters via instance segmentation requires a large number of labels, while methods built on pose estimation depend entirely on keypoint detection results and therefore perform poorly on images with unfavorable viewing angles or low resolution. To address these problems, we propose a method for automatically generating datasets. The generated dataset provides low-resolution images, covering a variety of viewing angles and blurred shapes, together with their labels. On this low-resolution, poorly angled dataset, we propose a generation-aided deep learning framework. Experiments show that the framework can effectively estimate the body shape parameters of the model from monocular images.