Abstract
This paper presents a novel framework to estimate detailed human body shape from color images in an unsupervised manner. It is a challenging task due to factors such as variations in human shapes, occlusion, and cloth details. The existing methods are mainly supervised and require a large number of ground truth real training data which is usually hard to obtain. To solve this problem, we propose an unsupervised detailed human shape estimation method from multi-view color images. Specifically, we first predict the depth map for the source view through robust photometric consistency with different views. Then, we predict the initial SMPL model from the color image and refine it by an iterative error feedback regressor based on point clouds of the predicted depth map. Finally, the refined SMPL model is deformed to fit the details (i.e., clothes and faces) on the point clouds to recover the detailed human shapes which are represented by adding a set of offsets to the SMPL model. The experimental results on different dataset demonstrate that our method outperforms the state-of-the-art methods and achieves higher reconstruction accuracy.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Alldieck, T., Magnor, M., Bhatnagar, B.L., Theobalt, C., Pons-Moll, G.: Learning to reconstruct people in clothing from a single RGB camera. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1175–1186 (2019)
Alldieck, T., Magnor, M., Xu, W., Theobalt, C., Pons-Moll, G.: Video based reconstruction of 3D people models. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8387–8397 (2018)
Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., Black, M.J.: Keep It SMPL: automatic estimation of 3D human pose and shape from a single image. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 561–578. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1_34
Carreira, J., Agrawal, P., Fragkiadaki, K., Malik, J.: Human pose estimation with iterative error feedback. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4733–4742 (2016)
Fish Tung, H.Y., Harley, A.W., Seto, W., Fragkiadaki, K.: Adversarial inverse graphics networks: learning 2D-to-3D lifting and image-to-image translation from unpaired supervision. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4354–4362 (2017)
Groueix, T., Fisher, M., Kim, V.G., Russell, B.C., Aubry, M.: 3D-coded: 3D correspondences by deep deformation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 230–246 (2018)
He, T., Collomosse, J.P., Jin, H., Soatto, S.: Geo-pifu: geometry and pixel aligned implicit functions for single-view human reconstruction. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H. (eds.) Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6–12, 2020, virtual (2020)
Jiang, B., Zhang, J., Hong, Y., Luo, J., Liu, L., Bao, H.: BCNet: learning body and cloth shape from a single image. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12365, pp. 18–35. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58565-5_2
Joo, H., et al.: Panoptic studio: a massively multiview system for social interaction capture. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017)
Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7122–7131 (2018)
Khot, T., Agrawal, S., Tulsiani, S., Mertz, C., Lucey, S., Hebert, M.: Learning unsupervised multi-view stereopsis via robust photometric consistency. arXiv preprint arXiv:1905.02706 (2019)
Kolotouros, N., Pavlakos, G., Black, M.J., Daniilidis, K.: Learning to reconstruct 3D human pose and shape via model-fitting in the loop. In: 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019, pp. 2252–2261 (2019)
Kolotouros, N., Pavlakos, G., Daniilidis, K.: Convolutional mesh regression for single-image human shape reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4501–4510 (2019)
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: Smpl: a skinned multi-person linear model. ACM Trans. Graph. (TOG) 34(6), 1–16 (2015)
Lorensen, W.E., Cline, H.E.: Marching cubes: a high resolution 3d surface construction algorithm. In: Stone, M.C. (ed.) Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1987, Anaheim, California, USA, July 27–31, 1987, pp. 163–169. ACM (1987). https://doi.org/10.1145/37401.37422
Ma, Q., et al.: Learning to dress 3D people in generative clothing. In: Computer Vision and Pattern Recognition (CVPR) (2020)
Mahjourian, R., Wicke, M., Angelova, A.: Unsupervised learning of depth and ego-motion from monocular video using 3D geometric constraints. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5667–5675 (2018)
Qi, C.R., Yi, L., Su, H., Guibas, L.J.: Pointnet++: Deep hierarchical feature learning on point sets in a metric space. arXiv preprint arXiv:1706.02413 (2017)
Saito, S., Huang, Z., Natsume, R., Morishima, S., Kanazawa, A., Li, H.: Pifu: pixel-aligned implicit function for high-resolution clothed human digitization. In: The IEEE International Conference on Computer Vision (ICCV), October 2019
Saito, S., Simon, T., Saragih, J.M., Joo, H.: Pifuhd: multi-level pixel-aligned implicit function for high-resolution 3D human digitization. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13–19, 2020, pp. 81–90. IEEE (2020)
Sorkine, O.: Differential representations for mesh processing. Comput. Graph. Forum 25(4), 789–807 (2006). https://doi.org/10.1111/j.1467-8659.2006.00999.x
Tang, S., Tan, F., Cheng, K., Li, Z., Zhu, S., Tan, P.: A neural network for detailed human depth estimation from a single image. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019
Varol, G., et al.: Learning from synthetic humans. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 109–117 (2017)
Wang, K., Zhang, G., Yang, J., Bao, H.: Dynamic human body reconstruction and motion tracking with low-cost depth cameras. Vis. Comput. 37(3), 603–618 (2021)
Wang, Z., Bovik, A., Sheikh, H., Simoncelli, E.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004). https://doi.org/10.1109/TIP.2003.819861
Yao, Y., Luo, Z., Li, S., Fang, T., Quan, L.: Mvsnet: depth inference for unstructured multi-view stereo. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 767–783 (2018)
Zhang, C., Pujades, S., Black, M.J., Pons-Moll, G.: Detailed, accurate, human shape estimation from clothed 3D scan sequences. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4191–4200 (2017)
Zhu, H., Zuo, X., Wang, S., Cao, X., Yang, R.: Detailed human shape estimation from a single image by hierarchical mesh deformation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Acknowledgements
This work was supported in part by the Fundamental Research Funds for the Central Universities (NJ2020023), in part by the Open Project Program of State Key Laboratory of Virtual Reality Technology and Systems of Beihang University (No.VRLAB2021C03), and in part by the Open Project Program of the State Key Lab of CAD&CG of Zhejiang University (Grant No.A2106).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Zheng, H., Wang, K., Li, W., Yang, J. (2021). Unsupervised Detailed Human Shape Estimation from Multi-view Color Images. In: Peng, Y., Hu, SM., Gabbouj, M., Zhou, K., Elad, M., Xu, K. (eds) Image and Graphics. ICIG 2021. Lecture Notes in Computer Science(), vol 12889. Springer, Cham. https://doi.org/10.1007/978-3-030-87358-5_38
Download citation
DOI: https://doi.org/10.1007/978-3-030-87358-5_38
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87357-8
Online ISBN: 978-3-030-87358-5
eBook Packages: Computer ScienceComputer Science (R0)