SE-MD: a single-encoder multiple-decoder deep network for point cloud reconstruction from 2D images

Hafiz, Abdul Mueed; Bhat, Rouf Ul Alam; Parah, Shabir Ahmad; Hassaballah, M.

doi:10.1007/s10044-023-01155-x

Theoretical Advances
Published: 04 April 2023

Volume 26, pages 1291–1302, (2023)

Pattern Analysis and Applications

Abdul Mueed Hafiz ORCID: orcid.org/0000-0002-2266-3708¹,
Rouf Ul Alam Bhat¹,
Shabir Ahmad Parah² &
…
M. Hassaballah³,⁴

Abstract

3D model reconstruction from a single 2D RGB image is a challenging and actively researched computer vision task. Several techniques based on conventional network architectures have been proposed for it. However, the body of research is limited, and existing techniques suffer from issues such as inefficient 3D representation formats, weak 3D model reconstruction backbones, inability to reconstruct dense point clouds, dependence on post-processing to obtain dense point clouds, and dependence on silhouettes in RGB images. In this paper, a new technique for converting 2D RGB images to point clouds is proposed. It improves on the state of the art through an efficient, robust and simple model that applies the concept of parallelization in the network architecture. It uses the efficient and rich point cloud representation of 3D data, together with a new robust point cloud reconstruction backbone, to address the prevalent issues. The backbone is a single-encoder multiple-decoder deep network in which each decoder reconstructs certain fixed viewpoints. All the viewpoints are then fused to reconstruct a dense point cloud. Various experiments are conducted to evaluate the proposed technique and to compare its performance with that of state-of-the-art methods, and impressive gains in performance are demonstrated.
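The single-encoder multiple-decoder idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (their code is in the repository listed under Data availability). It is a minimal NumPy illustration that assumes one dense layer for the encoder, one dense layer per decoder, untrained random weights, and fusion by simple concatenation. All names and sizes (`encode`, `decode`, `reconstruct`, `NUM_DECODERS`, `POINTS`, and so on) are hypothetical. Each decoder here emits a plain set of 3D points as a stand-in for the fixed viewpoints it would reconstruct.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_linear(n_in, n_out):
    """Random weights for one dense layer (illustrative only, untrained)."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def encode(image, params):
    """Shared encoder: flatten the RGB image and map it to one latent vector."""
    w, b = params
    return np.tanh(image.reshape(-1) @ w + b)


def decode(latent, params, points_per_decoder):
    """One decoder head: map the shared latent vector to an (N, 3) point set."""
    w, b = params
    return (latent @ w + b).reshape(points_per_decoder, 3)


def reconstruct(image, enc_params, dec_params_list, points_per_decoder):
    """Encode once, run every decoder on the same latent, then fuse the outputs."""
    latent = encode(image, enc_params)  # computed a single time and shared
    partial_clouds = [
        decode(latent, params, points_per_decoder) for params in dec_params_list
    ]
    # Assumed fusion step: stack the per-decoder point sets into one denser cloud.
    return np.concatenate(partial_clouds, axis=0)


# Hypothetical sizes chosen only to keep the example small.
H, W, LATENT, NUM_DECODERS, POINTS = 16, 16, 32, 4, 128

enc_params = init_linear(H * W * 3, LATENT)
dec_params_list = [init_linear(LATENT, POINTS * 3) for _ in range(NUM_DECODERS)]

image = rng.random((H, W, 3))  # stand-in for a 2D RGB input image
cloud = reconstruct(image, enc_params, dec_params_list, POINTS)
print(cloud.shape)  # (512, 3)
```

Because the decoders depend only on the shared latent vector and not on each other, they could be evaluated in parallel, which is the sense of parallelization this sketch is meant to convey. The density of the fused cloud grows with the number of decoders.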

Data availability

The code for the paper is available online at: https://github.com/mueedhafiz1982/Point-cloud-generation-from-2D-image.git

Funding

The work is not funded.

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, Institute of Technology, University of Kashmir, Srinagar, J&K, 190006, India
Abdul Mueed Hafiz & Rouf Ul Alam Bhat
Department of Electronics and Instrumentation Technology, University of Kashmir, Srinagar, J&K, 190006, India
Shabir Ahmad Parah
Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, AlKharj, 16278, Saudi Arabia
M. Hassaballah
Department of Computer Science, Faculty of Computers and Information, South Valley University, Qena, Egypt
M. Hassaballah

Corresponding author

Correspondence to Abdul Mueed Hafiz.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

About this article

Cite this article

Hafiz, A.M., Bhat, R.U.A., Parah, S.A. et al. SE-MD: a single-encoder multiple-decoder deep network for point cloud reconstruction from 2D images. Pattern Anal Applic 26, 1291–1302 (2023). https://doi.org/10.1007/s10044-023-01155-x

Received: 29 May 2021
Accepted: 21 March 2023
Published: 04 April 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s10044-023-01155-x
