Single-image clothed 3D human reconstruction guided by a well-aligned parametric body model

Regular Paper, Multimedia Systems

Abstract

Reconstructing a clothed 3D human model from a single image is challenging because the geometry of the regions not visible in the image must be inferred by the algorithm. To reduce this difficulty, current state-of-the-art methods usually employ a parametric 3D body model to guide the reconstruction. However, the quality of the reconstructed clothed model depends heavily on the accuracy of that parametric body model. To address this problem, we propose to guide single-image clothed 3D human reconstruction with a well-aligned parametric body model. First, the STAR model is adopted as the statistical body model, and a two-stage method that combines a regression-based approach with an optimization-based approach is proposed to estimate the pose and shape parameters iteratively. By combining the advantages of the statistical model and the parameter estimation method, a well-aligned 3D body model can be recovered from a single input image. Then, a deep neural network that fuses the 3D geometry information of the parametric body model with visual features extracted from the input image is proposed for reconstructing clothed 3D human models. The training losses are designed to align the reconstructed model with the ground-truth model both in the 3D model space and in the multi-view 2D re-projection spaces. Quantitative and qualitative experimental results on three public datasets (THuman, BUFF, and LSP) show that our method produces more accurate and robust clothed 3D human reconstructions than state-of-the-art methods.
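The idea of supervising a reconstruction both in 3D and in multi-view 2D re-projections can be illustrated with a minimal sketch. The abstract does not give the loss formulas, so everything below is an assumption made for illustration: the model is represented as an occupancy volume, the 3D term is a mean squared error, the 2D views are orthographic silhouettes taken along the three volume axes, and the names `orthographic_silhouettes`, `reconstruction_loss`, `w_3d`, and `w_2d` are hypothetical.

```python
import numpy as np


def orthographic_silhouettes(volume):
    """Project an occupancy volume (D, H, W) along each axis with a max.

    Assumption: three axis-aligned orthographic views stand in for the
    multi-view 2D re-projections mentioned in the abstract.
    """
    return [volume.max(axis=axis) for axis in range(3)]


def reconstruction_loss(pred, gt, w_3d=1.0, w_2d=1.0):
    """Weighted sum of a 3D occupancy error and a 2D silhouette error.

    Hypothetical formulation: both terms are mean squared errors, and the
    weights w_3d and w_2d are illustrative, not taken from the paper.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.ndim != 3 or pred.shape != gt.shape:
        raise ValueError("pred and gt must be 3D volumes of the same shape")

    # 3D term: voxel-wise error in the model space.
    loss_3d = np.mean((pred - gt) ** 2)

    # 2D term: silhouette error averaged over the projected views.
    view_errors = [
        np.mean((p - g) ** 2)
        for p, g in zip(orthographic_silhouettes(pred), orthographic_silhouettes(gt))
    ]
    loss_2d = float(np.mean(view_errors))

    return w_3d * loss_3d + w_2d * loss_2d


# Toy example: a 2x2x2 solid block inside a 4x4x4 volume.
gt_volume = np.zeros((4, 4, 4))
gt_volume[1:3, 1:3, 1:3] = 1.0
empty_prediction = np.zeros((4, 4, 4))
print(reconstruction_loss(gt_volume, gt_volume))
print(reconstruction_loss(empty_prediction, gt_volume))
```

In this toy setting a perfect prediction gives zero loss, while an empty prediction is penalised by both terms. In a real training setup the projections would need to be differentiable and tied to the actual camera model.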


Data availability

The datasets used in our paper (THuman, BUFF, and LSP) are publicly available.

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant no. 62077026 and the Fundamental Research Funds for the Central Universities under Grant no. CCNU22QN012.

Author information

Contributions

LL and JC proposed the conceptualization and methodology; LL and YG wrote the main manuscript and prepared the figures; YG and Jianchi Sun conducted the experiments; and JC revised the manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Jingying Chen.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, L., Gao, Y., Sun, J. et al. Single-image clothed 3D human reconstruction guided by a well-aligned parametric body model. Multimedia Systems 29, 1579–1592 (2023). https://doi.org/10.1007/s00530-023-01069-y
