Reconstructing 3D Human Avatars from Monocular Images

Real VR – Immersive Digital Reality

Abstract

Creating convincing representations of humans is a fundamental problem in both traditional arts and modern media. In our digital world, virtual avatars allow us to simulate and render the human body for a variety of applications, including movie production, sports, human-computer interaction, and medical sciences. However, capturing digital representations of a person’s shape, appearance, and motion is an expensive and time-consuming process which usually requires a lot of manual adjustments.

With advances in consumer-grade virtual reality devices, personalized virtual avatars have become an essential part of interactive and immersive applications such as telepresence and virtual try-on for online fashion shopping, increasing the need for versatile, easy-to-use self-digitization.

In this chapter, we discuss a selection of recent acquisition methods for personalized human avatar reconstruction. In contrast to conventional setups, these fully automatic approaches use only low-cost monocular video cameras to effectively fuse information from multiple points in time and realistically complete reconstructions from sparse observations. We address both straightforward and sophisticated reconstruction methods focused on accuracy, simplicity, and usability, comparing them and providing insights into their visual fidelity and robustness.



Author information

Corresponding author

Correspondence to Susana Castillo.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this chapter

Alldieck, T., Kappel, M., Castillo, S., Magnor, M. (2020). Reconstructing 3D Human Avatars from Monocular Images. In: Magnor, M., Sorkine-Hornung, A. (eds) Real VR – Immersive Digital Reality. Lecture Notes in Computer Science, vol 11900. Springer, Cham. https://doi.org/10.1007/978-3-030-41816-8_8

  • DOI: https://doi.org/10.1007/978-3-030-41816-8_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-41815-1

  • Online ISBN: 978-3-030-41816-8

  • eBook Packages: Computer Science (R0)
