Abstract
Taking inspiration from recent advances in deep learning for the three-dimensional (3D) domain, we propose an end-to-end deep learning framework that reconstructs 3D shapes in point cloud format from a single color image. While many state-of-the-art learning-based 3D reconstruction methods are constrained to fixed resolutions, our framework, named PushNet, can produce point clouds at arbitrary resolutions and requires only sparse point clouds during training. It predicts a push force for each randomly sampled spatial point, driving the point onto the surface of the underlying 3D object depicted in the image. Because each point is processed independently, the network has a parallel design: it can be trained on sparse point clouds and then generate point clouds of any resolution without degrading quality or requiring any fine-tuning. Experiments on synthetic and real datasets demonstrate the effectiveness of our method for inferring 3D shapes. We further show that our predicted point clouds yield high-fidelity meshes after applying surface reconstruction algorithms. Experiments on linear interpolation, point cloud upsampling, and textured 3D reconstruction further confirm the effectiveness of our framework.
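The per-point "push" mechanism described above can be illustrated with a minimal sketch. Here the learned network is replaced by a hypothetical analytic push onto a unit sphere standing in for the object surface; the function name `push_to_surface` and the sphere target are illustrative assumptions, not the paper's actual architecture. The key property shown is that each sampled point is displaced independently, so the number of points (the output resolution) can be chosen freely at inference time.

```python
import numpy as np

def push_to_surface(points):
    """Stand-in for the learned push network (assumption: the true surface
    is a unit sphere). For each point it predicts a displacement ('push
    force') that moves the point onto the surface."""
    norms = np.linalg.norm(points, axis=1, keepdims=True)
    force = points / norms - points          # predicted per-point displacement
    return points + force                    # pushed points lie on the surface

rng = np.random.default_rng(0)
# Sample as many points as desired -- resolution is arbitrary, since every
# point is pushed independently and in parallel.
samples = rng.uniform(-1.0, 1.0, size=(4096, 3))
cloud = push_to_surface(samples)
print(np.allclose(np.linalg.norm(cloud, axis=1), 1.0))
```

In the actual framework the displacement would come from a network conditioned on image features rather than from this closed-form projection, but the resolution-agnostic sampling loop is the same.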
Data availability
The datasets generated and analyzed during the current study are available in the ShapeNet repository, https://shapenet.org/.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that there are no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ping, G., Wang, H. PushNet: 3D reconstruction from a single image by pushing. Neural Comput & Applic 36, 6629–6641 (2024). https://doi.org/10.1007/s00521-023-09408-w
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00521-023-09408-w