3D Object Reconstruction with Deep Learning

Aremu, Stephen S.; Taherkhani, Aboozar; Liu, Chang; Yang, Shengxiang

doi:10.1007/978-3-031-57919-6_12

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 704)

Included in the following conference series:

International Conference on Intelligent Information Processing

Abstract

Recent breakthroughs in deep learning have accelerated progress in computer vision. Following its success in 2D object perception and detection, deep learning has also driven considerable progress in 3D object reconstruction. Humans can infer the 3D shape of an object from a single 2D view, so computers must likewise be trained to reason in 3D to enable key computer vision applications. The use of deep learning for 3D object reconstruction from single-view images is evolving rapidly and producing significant results. In this research, we explore Mesh R-CNN, a well-known hybrid approach from Facebook that combines voxel generation and triangular mesh reconstruction to generate the 3D mesh structure of an object from a single-view 2D image. Although Mesh R-CNN reconstructs objects with varying geometry and topology, its mesh quality suffers from topological errors such as self-intersection, which produce rough, non-smooth meshes. We propose Mesh R-CNN with Laplacian Smoothing (Mesh R-CNN-LS), which uses a Laplacian smoothing and regularization algorithm to refine these meshes. Mesh R-CNN-LS constrains the triangular deformation and generates a better, smoother 3D mesh. Compared with the original Mesh R-CNN on the Pix3D dataset, Mesh R-CNN-LS achieved better performance in terms of loss and average precision score.
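The Laplacian smoothing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' Mesh R-CNN-LS implementation: it assumes uniform (umbrella) weights, where each vertex is pulled toward the centroid of its edge-connected neighbours, and the function names, the step size, and the octahedron test mesh are all hypothetical choices made for illustration. The abstract does not state which Laplacian weighting the chapter uses.

```python
import numpy as np


def vertex_neighbors(num_vertices, faces):
    """Collect the edge-connected neighbours of every vertex."""
    neighbors = [set() for _ in range(num_vertices)]
    for a, b, c in faces:
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))
    return neighbors


def laplacian_coordinates(vertices, faces):
    """Uniform Laplacian: centroid of the neighbours minus the vertex."""
    vertices = np.asarray(vertices, dtype=float)
    delta = np.zeros_like(vertices)
    for i, nbrs in enumerate(vertex_neighbors(len(vertices), faces)):
        if nbrs:
            delta[i] = vertices[list(nbrs)].mean(axis=0) - vertices[i]
    return delta


def laplacian_loss(vertices, faces):
    """Mean length of the Laplacian coordinates; smaller means smoother."""
    delta = laplacian_coordinates(vertices, faces)
    return float(np.linalg.norm(delta, axis=1).mean())


def laplacian_smooth(vertices, faces, step=0.5, iterations=10):
    """Repeatedly move each vertex a fraction `step` toward its neighbours."""
    smoothed = np.asarray(vertices, dtype=float).copy()
    for _ in range(iterations):
        smoothed = smoothed + step * laplacian_coordinates(smoothed, faces)
    return smoothed


# Hypothetical test mesh: an octahedron whose top vertex has been displaced.
faces = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
         (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]
noisy = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0], [0.0, -1.0, 0.0],
                  [0.3, 0.2, 1.6], [0.0, 0.0, -1.0]])
smoothed = laplacian_smooth(noisy, faces)

print("loss before:", laplacian_loss(noisy, faces))
print("loss after: ", laplacian_loss(smoothed, faces))
```

In a learning setting, a quantity like `laplacian_loss` would typically be added to the training objective as a regularization term rather than applied as a post-processing step. Note that uniform smoothing also shrinks the mesh, which is one reason a weighted regularizer is often preferred over aggressive smoothing.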

Acknowledgments

This work was supported in part by Liaoning Province Applied Basic Research Program: Human-machine Fusion Intelligent Modeling and Collaborative Optimization Driven by Data and Knowledge under Grant 2023JH2/101300184. We appreciate Mr. John Files for supporting us with HPC for processing deep neural networks.

Author information

Authors and Affiliations

School of Computer Science and Informatics, De Montfort University, Leicester, UK
Stephen S. Aremu, Aboozar Taherkhani & Shengxiang Yang
Digital Factory Department, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, 110016, China
Chang Liu

Corresponding author

Correspondence to Aboozar Taherkhani.

Editor information

Editors and Affiliations

Chinese Academy of Sciences, Beijing, China
Zhongzhi Shi
University of Oslo, Oslo, Norway
Jim Torresen
De Montfort University, Leicester, UK
Shengxiang Yang

About this paper

Cite this paper

Aremu, S.S., Taherkhani, A., Liu, C., Yang, S. (2024). 3D Object Reconstruction with Deep Learning. In: Shi, Z., Torresen, J., Yang, S. (eds) Intelligent Information Processing XII. IIP 2024. IFIP Advances in Information and Communication Technology, vol 704. Springer, Cham. https://doi.org/10.1007/978-3-031-57919-6_12

DOI: https://doi.org/10.1007/978-3-031-57919-6_12
Published: 06 April 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-57918-9
Online ISBN: 978-3-031-57919-6
eBook Packages: Computer Science, Computer Science (R0)
