Abstract
Recent breakthroughs in deep learning have accelerated progress in computer vision. Following its success in 2D object perception and detection, deep learning has also driven considerable progress in 3D object reconstruction. Humans can readily infer the 3D structure of an object from a single 2D view, so training computers to reason in 3D is necessary for several key applications of computer vision. The use of deep learning for 3D object reconstruction from single-view images is evolving rapidly and producing significant results. In this research, we explore Mesh R-CNN, a well-known hybrid approach from Facebook that combines voxel generation and triangular mesh reconstruction to generate the 3D mesh of an object from a single-view 2D image. Although Mesh R-CNN can reconstruct objects with varying geometry and topology, its mesh quality suffers from topological errors such as self-intersection, which produce rough, non-smooth meshes. We therefore propose Mesh R-CNN with Laplacian Smoothing (Mesh R-CNN-LS), which applies Laplacian smoothing and regularization to refine the rough mesh. Mesh R-CNN-LS constrains the triangular deformation and generates a smoother 3D mesh. Compared with the original Mesh R-CNN on the Pix3D dataset, Mesh R-CNN-LS performed better in terms of loss and average precision score.
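To make the smoothing idea concrete, the following is a minimal NumPy sketch of uniform (umbrella-operator) Laplacian smoothing on a triangle mesh. It is an illustration under stated assumptions, not the Mesh R-CNN-LS implementation: the function names (`uniform_laplacian`, `laplacian_energy`, `laplacian_smooth`), the uniform neighbour weighting, the step size, and the octahedron test mesh are all hypothetical choices made for this example. In a learning setting, a quantity like `laplacian_energy` would typically act as a regularization term on the predicted vertices rather than as a post-processing step.

```python
import numpy as np


def uniform_laplacian(vertices, faces):
    """Per-vertex umbrella vector: mean of edge-connected neighbours minus the vertex."""
    vertices = np.asarray(vertices, dtype=float)
    n = len(vertices)
    adjacency = np.zeros((n, n), dtype=float)
    for a, b, c in faces:
        # Every triangle contributes its three undirected edges.
        for i, j in ((a, b), (b, c), (c, a)):
            adjacency[i, j] = adjacency[j, i] = 1.0
    degree = adjacency.sum(axis=1, keepdims=True)
    has_neighbours = degree[:, 0] > 0
    # Vertices without neighbours keep a zero Laplacian instead of dividing by zero.
    neighbour_mean = vertices.copy()
    neighbour_mean[has_neighbours] = (
        (adjacency @ vertices)[has_neighbours] / degree[has_neighbours]
    )
    return neighbour_mean - vertices


def laplacian_energy(vertices, faces):
    """Mean squared length of the umbrella vectors (a simple smoothness penalty)."""
    lap = uniform_laplacian(vertices, faces)
    return float(np.mean(np.sum(lap ** 2, axis=1)))


def laplacian_smooth(vertices, faces, step=0.5, iterations=5):
    """Move each vertex a fraction `step` toward its neighbour mean, repeatedly."""
    smoothed = np.asarray(vertices, dtype=float).copy()  # leave the input untouched
    for _ in range(iterations):
        smoothed += step * uniform_laplacian(smoothed, faces)
    return smoothed


# Hypothetical test mesh: an octahedron whose top vertex has been displaced.
faces = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4),
         (1, 0, 5), (2, 1, 5), (3, 2, 5), (0, 3, 5)]
noisy = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0],
                  [0.0, -1.0, 0.0], [0.3, 0.2, 1.6], [0.0, 0.0, -1.0]])
smoothed = laplacian_smooth(noisy, faces)
print("energy before:", laplacian_energy(noisy, faces))
print("energy after: ", laplacian_energy(smoothed, faces))
```

Uniform weights are the simplest choice and are known to shrink the mesh as they smooth it; cotangent or curvature-based weights are common alternatives when volume preservation matters.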
Acknowledgments
This work was supported in part by the Liaoning Province Applied Basic Research Program "Human-machine Fusion Intelligent Modeling and Collaborative Optimization Driven by Data and Knowledge" under Grant 2023JH2/101300184. We thank Mr. John Files for supporting us with HPC resources for processing deep neural networks.
Copyright information
© 2024 IFIP International Federation for Information Processing
About this paper
Cite this paper
Aremu, S.S., Taherkhani, A., Liu, C., Yang, S. (2024). 3D Object Reconstruction with Deep Learning. In: Shi, Z., Torresen, J., Yang, S. (eds) Intelligent Information Processing XII. IIP 2024. IFIP Advances in Information and Communication Technology, vol 704. Springer, Cham. https://doi.org/10.1007/978-3-031-57919-6_12
DOI: https://doi.org/10.1007/978-3-031-57919-6_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-57918-9
Online ISBN: 978-3-031-57919-6
eBook Packages: Computer Science, Computer Science (R0)