
PushNet: 3D reconstruction from a single image by pushing

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

Inspired by recent advances in deep learning for the three-dimensional (3D) domain, we propose an end-to-end deep learning framework that reconstructs 3D shapes in point cloud format from a single color image. While many state-of-the-art learning-based 3D reconstruction methods are constrained to fixed resolutions, our framework, named PushNet, can produce point clouds of arbitrary resolution and requires only sparse point clouds during training. It predicts a push force for each randomly sampled spatial point, driving the point onto the surface of the underlying 3D object in the image. Because the network processes all points in parallel and independently, it can be trained on sparse point clouds and then generate point clouds of any resolution without degrading quality or requiring any fine-tuning. Experiments on synthetic and real datasets demonstrate the effectiveness of our method for inferring 3D shapes. We also show that the predicted point clouds yield high-fidelity meshes after applying surface reconstruction algorithms. Experiments on linear interpolation, point cloud upsampling, and textured 3D reconstruction further demonstrate the effectiveness of our framework.
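To make the mechanism concrete, below is a minimal sketch in PyTorch (not the authors' implementation) of the core idea: a shared network takes an image feature together with a randomly sampled 3D point and predicts a 3D push vector that moves the point onto the object surface. The module name, layer sizes, and the use of a single global image feature are illustrative assumptions.

import torch
import torch.nn as nn

class PushSketch(nn.Module):
    """Hypothetical per-point push predictor (illustrative only)."""

    def __init__(self, feat_dim=256):
        super().__init__()
        # MLP maps (3D point coordinates + image feature) -> 3D push vector.
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 3),
        )

    def forward(self, points, img_feat):
        # points:   (B, N, 3) points sampled at random in space
        # img_feat: (B, feat_dim) global feature of the input color image
        feat = img_feat.unsqueeze(1).expand(-1, points.shape[1], -1)
        push = self.mlp(torch.cat([points, feat], dim=-1))
        # Pushed points, ideally lying on the object surface.
        return points + push

Since every point is handled independently by the same network, the number of sampled points is a free choice at inference time: training can use sparse point clouds while inference samples as many points as desired, which is consistent with the arbitrary-resolution property described above.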


Data availability

The datasets generated or analyzed during the current study are available in the ShapeNet repository, https://shapenet.org/.


Author information

Corresponding author

Correspondence to Guiju Ping.

Ethics declarations

Conflict of interest

The authors declare that there are no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ping, G., Wang, H. PushNet: 3D reconstruction from a single image by pushing. Neural Comput & Applic 36, 6629–6641 (2024). https://doi.org/10.1007/s00521-023-09408-w
