Abstract
In recent years, substantial progress has been made in tasks such as 3D model retrieval, 3D model classification, and 3D model segmentation. Typical 3D representations such as point clouds, voxels, and polygon meshes are well suited for rendering, but their high redundancy and complexity limit their use in cognitive tasks (retrieval, classification, segmentation). We propose a deep learning architecture that takes as input 3D models represented as sets of image views. The architecture combines standard building blocks, namely Convolutional Neural Networks and autoencoders, to compute 3D model embeddings from the image views extracted from each model, avoiding the view pooling layer commonly used in this setting. Our goal is to represent a 3D model as a vector that carries enough information to substitute for the model itself in high-level tasks. Since this vector is a learned representation that aims to capture the relevant information of a 3D model, we show that the embedding conveys semantic information useful for assessing the similarity of 3D objects. We compare our embedding technique with state-of-the-art 3D model retrieval methods on the ShapeNet and ModelNet datasets. The embeddings obtained with our architecture achieve high effectiveness scores on both the normalized and perturbed versions of ShapeNet while improving training and inference times over standard state-of-the-art techniques.
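To make the view-based embedding idea concrete, here is a minimal PyTorch sketch. It is not the authors' actual architecture: it only assumes a convolutional autoencoder applied independently to each rendered view, with the per-view latent codes concatenated into a single embedding vector instead of being aggregated by a view pooling layer. The ViewAutoencoder class, the 64x64 grayscale views, the 12 views per model, and the 128-dimensional latent code are all illustrative assumptions.

```python
# Illustrative sketch only (NOT the paper's exact architecture): a convolutional
# autoencoder encodes each rendered view of a 3D model, and the per-view latent
# codes are concatenated into one embedding vector, showing one way to aggregate
# views without a view pooling layer. Assumes a fixed view order per model.
import torch
import torch.nn as nn


class ViewAutoencoder(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        # Encoder: one grayscale 64x64 view -> latent code
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent_dim),
        )
        # Decoder: latent code -> reconstructed view (used only for training
        # with a reconstruction loss; not needed at retrieval time)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, views):
        # views: (batch, n_views, 1, 64, 64)
        b, v, c, h, w = views.shape
        flat = views.reshape(b * v, c, h, w)
        codes = self.encoder(flat)                         # (b*v, latent_dim)
        recon = self.decoder(codes).reshape(b, v, c, h, w)
        # Concatenate the per-view codes instead of pooling them
        embedding = codes.reshape(b, v * codes.shape[-1])  # (b, n_views*latent_dim)
        return embedding, recon


if __name__ == "__main__":
    model = ViewAutoencoder()
    views = torch.rand(2, 12, 1, 64, 64)   # 2 models, 12 rendered views each
    emb, recon = model(views)
    print(emb.shape)                        # torch.Size([2, 1536])
```

The resulting embedding can then be compared with a standard vector distance (e.g., Euclidean or cosine) for retrieval; the concatenation step is only one possible alternative to view pooling and is chosen here purely for illustration.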
Acknowledgements
This work was funded by ANID, Millennium Science Initiative Program, Code ICN17_002.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests or financial or personal conflicts that may influence the content and results of this paper.
About this article
Cite this article
Labrada, A., Bustos, B. & Sipiran, I. A convolutional architecture for 3D model embedding using image views. Vis Comput 40, 1601–1615 (2024). https://doi.org/10.1007/s00371-023-02872-4