Abstract
In recent years, substantial progress has been made in tasks such as 3D model retrieval, 3D model classification, and 3D model segmentation. Typical 3D representations such as point clouds, voxels, and polygon meshes are well suited for rendering, but their high redundancy and complexity limit their use in cognitive tasks (retrieval, classification, segmentation). We propose a deep learning architecture that takes as input 3D models represented as sets of image views. The architecture combines standard building blocks, namely Convolutional Neural Networks and autoencoders, to compute 3D model embeddings from the image views extracted from each model, avoiding the view pooling layer commonly used in this setting. Our goal is to represent a 3D model as a vector that carries enough information to substitute for the model itself in high-level tasks. Since this vector is a learned representation that aims to capture the relevant information of a 3D model, we show that the embedding conveys semantic information useful for assessing the similarity of 3D objects. We compare our embedding technique with state-of-the-art 3D model retrieval methods on the ShapeNet and ModelNet datasets. The embeddings obtained with our architecture achieve high effectiveness scores on both the normalized and perturbed versions of ShapeNet while improving training and inference times over standard state-of-the-art techniques.
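To make the view-based embedding idea concrete, here is a minimal PyTorch sketch. It is not the authors' actual architecture: it only assumes a convolutional autoencoder applied independently to each rendered view, with the per-view latent codes concatenated into a single embedding vector instead of being aggregated by a view pooling layer. The ViewAutoencoder class, the 64x64 grayscale views, the 12 views per model, and the 128-dimensional latent code are all illustrative assumptions.

```python
# Illustrative sketch only (NOT the paper's exact architecture): a convolutional
# autoencoder encodes each rendered view of a 3D model, and the per-view latent
# codes are concatenated into one embedding vector, showing one way to aggregate
# views without a view pooling layer. Assumes a fixed view order per model.
import torch
import torch.nn as nn


class ViewAutoencoder(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        # Encoder: one grayscale 64x64 view -> latent code
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent_dim),
        )
        # Decoder: latent code -> reconstructed view (used only for training
        # with a reconstruction loss; not needed at retrieval time)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, views):
        # views: (batch, n_views, 1, 64, 64)
        b, v, c, h, w = views.shape
        flat = views.reshape(b * v, c, h, w)
        codes = self.encoder(flat)                         # (b*v, latent_dim)
        recon = self.decoder(codes).reshape(b, v, c, h, w)
        # Concatenate the per-view codes instead of pooling them
        embedding = codes.reshape(b, v * codes.shape[-1])  # (b, n_views*latent_dim)
        return embedding, recon


if __name__ == "__main__":
    model = ViewAutoencoder()
    views = torch.rand(2, 12, 1, 64, 64)   # 2 models, 12 rendered views each
    emb, recon = model(views)
    print(emb.shape)                        # torch.Size([2, 1536])
```

The resulting embedding can then be compared with a standard vector distance (e.g., Euclidean or cosine) for retrieval; the concatenation step is only one possible alternative to view pooling and is chosen here purely for illustration.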
Acknowledgements
This work was funded by ANID, Millennium Science Initiative Program, Code ICN17_002.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests or financial or personal conflicts that may influence the content and results of this paper.
About this article
Cite this article
Labrada, A., Bustos, B. & Sipiran, I. A convolutional architecture for 3D model embedding using image views. Vis Comput 40, 1601–1615 (2024). https://doi.org/10.1007/s00371-023-02872-4