Single image 3D object reconstruction based on deep learning: A review

Multimedia Tools and Applications

Abstract

Reconstructing a 3D object from a single image is an important task in computer vision, and in recent years deep learning has achieved remarkable results on it. Traditional single-image reconstruction methods depend on prior knowledge and assumptions; they are typically limited to a particular object category, or they struggle to produce a good reconstruction from a real image. Deep learning can address these problems through its strong learning ability, but it also faces many challenges of its own. In this paper, we first discuss the challenges of applying deep learning to single-image 3D object reconstruction. Second, we comprehensively review the encoders, decoders, and training details used in this setting. We then introduce the datasets and evaluation metrics commonly used for single-image 3D object reconstruction in recent years. To analyze the advantages and disadvantages of different 3D reconstruction methods, we compare them in a series of experiments. In addition, we briefly present some application examples that involve single-image 3D reconstruction. Finally, we summarize the paper and discuss future directions.

Acknowledgements

The authors are grateful to the Development Research Center of Guangxi Relatively Sparse-populated Minorities (ID: GXRKJSZ201901) and to the Natural Science Foundation of Guangxi Province (No. 2018GXNSFAA281164). This research was financially supported by the project of outstanding thousand young teachers’ training in higher education institutions of Guangxi and by the Guangxi Colleges and Universities Key Laboratory Breeding Base of System Control and Information Processing.

Author information

Corresponding author

Correspondence to Jiansheng Peng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Kui Fu and Jiansheng Peng contributed equally to this work.

About this article

Cite this article

Fu, K., Peng, J., He, Q. et al. Single image 3D object reconstruction based on deep learning: A review. Multimed Tools Appl 80, 463–498 (2021). https://doi.org/10.1007/s11042-020-09722-8
