Abstract
Taking inspiration from recent advances in deep learning for the three-dimensional (3D) domain, we propose an end-to-end deep learning framework that reconstructs 3D shapes in point cloud format from a single color image. While many state-of-the-art learning-based 3D reconstruction methods are constrained to fixed resolutions, our framework, named PushNet, can produce point clouds at arbitrary resolutions and requires only sparse point clouds during training. It predicts a push force for each randomly sampled spatial point, driving the point onto the surface of the underlying 3D object depicted in the image. Because each point is processed independently, the network has a parallel design: it can be trained on sparse point clouds and then generate point clouds of any resolution without degrading quality or requiring any fine-tuning. Experiments on synthetic and real datasets demonstrate the effectiveness of our method for inferring 3D shapes. We further show that our predicted point clouds yield high-fidelity meshes after applying surface reconstruction algorithms. Experiments on linear interpolation, point cloud upsampling, and textured 3D reconstruction further confirm the effectiveness of our framework.
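The per-point "push" mechanism described above can be illustrated with a minimal sketch. Here the learned network is replaced by a hypothetical analytic push onto a unit sphere standing in for the object surface; the function name `push_to_surface` and the sphere target are illustrative assumptions, not the paper's actual architecture. The key property shown is that each sampled point is displaced independently, so the number of points (the output resolution) can be chosen freely at inference time.

```python
import numpy as np

def push_to_surface(points):
    """Stand-in for the learned push network (assumption: the true surface
    is a unit sphere). For each point it predicts a displacement ('push
    force') that moves the point onto the surface."""
    norms = np.linalg.norm(points, axis=1, keepdims=True)
    force = points / norms - points          # predicted per-point displacement
    return points + force                    # pushed points lie on the surface

rng = np.random.default_rng(0)
# Sample as many points as desired -- resolution is arbitrary, since every
# point is pushed independently and in parallel.
samples = rng.uniform(-1.0, 1.0, size=(4096, 3))
cloud = push_to_surface(samples)
print(np.allclose(np.linalg.norm(cloud, axis=1), 1.0))
```

In the actual framework the displacement would come from a network conditioned on image features rather than from this closed-form projection, but the resolution-agnostic sampling loop is the same.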
Data availability
The datasets generated and analyzed during the current study are available in the ShapeNet repository, https://shapenet.org/.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that there are no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ping, G., Wang, H. PushNet: 3D reconstruction from a single image by pushing. Neural Comput & Applic 36, 6629–6641 (2024). https://doi.org/10.1007/s00521-023-09408-w
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00521-023-09408-w