SE-MD: a single-encoder multiple-decoder deep network for point cloud reconstruction from 2D images

  • Theoretical Advances
Pattern Analysis and Applications

Abstract

Reconstructing a 3D model from a single 2D RGB image is a challenging and actively researched computer vision task. Several techniques based on conventional network architectures have been proposed for it. However, the body of research is limited, and existing approaches suffer from several issues: inefficient 3D representation formats, weak 3D model reconstruction backbones, an inability to reconstruct dense point clouds, dependence on post-processing to obtain dense point clouds, and dependence on silhouettes in the RGB images. This paper proposes a new technique for converting a 2D RGB image to a point cloud, which improves on the state of the art through an efficient, robust and simple model that exploits parallelization in the network architecture. It not only uses the efficient and rich 3D representation offered by point clouds, but also introduces a new, robust point cloud reconstruction backbone to address the prevalent issues. The backbone is a single-encoder multiple-decoder deep network in which each decoder reconstructs certain fixed viewpoints. The reconstructed viewpoints are then fused into a dense point cloud. Experiments are conducted to evaluate the proposed technique and to compare it with state-of-the-art methods, and impressive performance gains are demonstrated.
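The data flow described in the abstract (one shared encoder, several parallel decoders tied to fixed viewpoints, then fusion into one dense cloud) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the class name `SingleEncoderMultiDecoder`, the linear layers with random weights, the four viewpoints spaced evenly around the y axis, and fusion by rotating each decoder's points into a common frame and concatenating them. The authors' actual layers, viewpoint representation and fusion step may differ.

```python
import numpy as np

rng = np.random.default_rng(0)


def rotation_y(angle_rad):
    """Rotation matrix about the y axis (stands in for one fixed viewpoint)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


class SingleEncoderMultiDecoder:
    """Toy single-encoder multiple-decoder model (hypothetical, untrained)."""

    def __init__(self, image_dim, latent_dim, num_decoders, points_per_view):
        self.points_per_view = points_per_view
        # One shared encoder: image -> latent vector.
        self.w_enc = rng.normal(scale=0.01, size=(image_dim, latent_dim))
        # One decoder per fixed viewpoint: latent -> (points_per_view, 3).
        self.w_dec = [
            rng.normal(scale=0.1, size=(latent_dim, points_per_view * 3))
            for _ in range(num_decoders)
        ]
        # Assumed viewpoints: evenly spaced rotations about the y axis.
        angles = np.linspace(0.0, 2.0 * np.pi, num_decoders, endpoint=False)
        self.view_rotations = [rotation_y(a) for a in angles]

    def encode(self, image):
        """Flatten the image and map it to a latent vector."""
        return np.tanh(image.reshape(-1) @ self.w_enc)

    def decode_views(self, latent):
        """Run every decoder on the same latent; each call is independent."""
        return [
            (latent @ w).reshape(self.points_per_view, 3) for w in self.w_dec
        ]

    def fuse(self, view_points):
        """Rotate each view's points into a common frame and concatenate."""
        in_common_frame = [
            pts @ rot.T for pts, rot in zip(view_points, self.view_rotations)
        ]
        return np.concatenate(in_common_frame, axis=0)

    def forward(self, image):
        return self.fuse(self.decode_views(self.encode(image)))


image = rng.random((64, 64, 3))  # placeholder RGB input
model = SingleEncoderMultiDecoder(
    image_dim=64 * 64 * 3, latent_dim=128, num_decoders=4, points_per_view=1024
)
cloud = model.forward(image)
print(cloud.shape)  # (4096, 3)
```

Because the decoders share one latent vector and do not depend on each other, they can run in parallel, and the size of the fused cloud grows with the number of decoders (here 4 × 1024 points).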

Data availability

The code for the paper is available online at: https://github.com/mueedhafiz1982/Point-cloud-generation-from-2D-image.git

Funding

The work is not funded.

Author information

Corresponding author

Correspondence to Abdul Mueed Hafiz.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hafiz, A.M., Bhat, R.U.A., Parah, S.A. et al. SE-MD: a single-encoder multiple-decoder deep network for point cloud reconstruction from 2D images. Pattern Anal Applic 26, 1291–1302 (2023). https://doi.org/10.1007/s10044-023-01155-x

  • DOI: https://doi.org/10.1007/s10044-023-01155-x
