Abstract
3D model reconstruction from a single 2D RGB image is a challenging and actively researched computer vision task. Several techniques based on conventional network architectures have been proposed for it. However, the body of research is limited, and existing methods suffer from several issues: inefficient 3D representation formats, weak 3D reconstruction backbones, an inability to reconstruct dense point clouds, dependence on post-processing to obtain dense point clouds, and dependence on silhouettes in the RGB images. In this paper, a new technique for converting a 2D RGB image into a point cloud is proposed. It improves on the state of the art with an efficient, robust and simple model that applies the concept of parallelization in the network architecture. It uses the efficient and rich 3D representation of point clouds together with a new, robust point cloud reconstruction backbone that addresses the issues above. This backbone is a single-encoder multiple-decoder deep network in which each decoder reconstructs a fixed subset of viewpoints. The viewpoints from all decoders are then fused to reconstruct a dense point cloud. Various experiments are conducted to evaluate the proposed technique and to compare its performance with that of state-of-the-art methods, and they demonstrate impressive performance gains.
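The data flow of a single-encoder multiple-decoder network with viewpoint fusion can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that each decoder outputs one (N, 3) array of points per fixed viewpoint, expressed in that viewpoint's camera frame, and that the extrinsics (R, t) of every fixed viewpoint are known, with p_cam = R p_obj + t. The names `ToySEMD`, `rotation_from_angles` and `fuse_viewpoints`, the azimuth/elevation convention, the number of decoders and views, and all layer sizes are hypothetical; the random linear layers merely stand in for trained encoder and decoder networks.

```python
import numpy as np


def rotation_from_angles(azimuth_deg: float, elevation_deg: float) -> np.ndarray:
    """Object-to-camera rotation for one fixed viewpoint.

    Hypothetical convention: rotate by azimuth about the y-axis,
    then by elevation about the x-axis.
    """
    az, el = np.deg2rad(azimuth_deg), np.deg2rad(elevation_deg)
    ry = np.array([[np.cos(az), 0.0, np.sin(az)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(az), 0.0, np.cos(az)]])
    rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(el), -np.sin(el)],
                   [0.0, np.sin(el), np.cos(el)]])
    return rx @ ry


class ToySEMD:
    """Toy single-encoder multiple-decoder model (random linear layers, NOT the paper's network)."""

    def __init__(self, image_dim, latent_dim, points_per_view,
                 views_per_decoder, num_decoders, seed=0):
        rng = np.random.default_rng(seed)
        # One encoder shared by all decoders.
        self.encoder = rng.normal(scale=0.01, size=(image_dim, latent_dim))
        # Each decoder has its own weights and predicts its own fixed subset of viewpoints.
        self.decoders = [
            rng.normal(scale=0.01, size=(latent_dim, views_per_decoder * points_per_view * 3))
            for _ in range(num_decoders)
        ]
        self.points_per_view = points_per_view
        self.views_per_decoder = views_per_decoder

    def __call__(self, image: np.ndarray) -> list:
        z = np.tanh(image.reshape(-1) @ self.encoder)  # shared latent code
        views = []
        for weights in self.decoders:  # decoders are independent, so they could run in parallel
            out = (z @ weights).reshape(self.views_per_decoder, self.points_per_view, 3)
            views.extend(out)  # one (points_per_view, 3) camera-frame array per viewpoint
        return views


def fuse_viewpoints(view_points, rotations, translations) -> np.ndarray:
    """Bring every camera-frame point set into the object frame and concatenate.

    With p_cam = R @ p_obj + t, the inverse is p_obj = R.T @ (p_cam - t),
    which for (N, 3) row-vector arrays is (P - t) @ R.
    """
    fused = [(pts - t) @ R for pts, R, t in zip(view_points, rotations, translations)]
    return np.concatenate(fused, axis=0)


# Usage: 4 decoders x 2 views each = 8 fixed viewpoints, 45 degrees apart in azimuth.
num_decoders, views_per_decoder = 4, 2
azimuths = np.arange(num_decoders * views_per_decoder) * 45.0
rotations = [rotation_from_angles(a, 20.0) for a in azimuths]
translations = [np.array([0.0, 0.0, 2.0])] * len(rotations)

model = ToySEMD(image_dim=32 * 32 * 3, latent_dim=64, points_per_view=256,
                views_per_decoder=views_per_decoder, num_decoders=num_decoders)
image = np.random.default_rng(1).random((32, 32, 3))
dense = fuse_viewpoints(model(image), rotations, translations)
print(dense.shape)  # (2048, 3)
```

In this sketch every decoder reads the same latent code but has its own weights, so the per-viewpoint predictions are independent of one another, and fusion reduces to a rigid transform per viewpoint followed by concatenation.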
Data availability
The code for the paper is available online at: https://github.com/mueedhafiz1982/Point-cloud-generation-from-2D-image.git
Funding
This work received no funding.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hafiz, A.M., Bhat, R.U.A., Parah, S.A. et al. SE-MD: a single-encoder multiple-decoder deep network for point cloud reconstruction from 2D images. Pattern Anal Applic 26, 1291–1302 (2023). https://doi.org/10.1007/s10044-023-01155-x
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10044-023-01155-x