
A convolutional architecture for 3D model embedding using image views

Original article, published in The Visual Computer.

Abstract

In recent years, substantial advances have been made in tasks such as 3D model retrieval, 3D model classification, and 3D model segmentation. Typical 3D representations such as point clouds, voxels, and polygon meshes are well suited to rendering, but their high redundancy and complexity limit their use in cognitive tasks (retrieval, classification, segmentation). We propose a deep learning architecture that takes as input 3D models represented as sets of image views. The architecture combines standard components, such as convolutional neural networks and autoencoders, to compute 3D model embeddings from the sets of views extracted from each model, avoiding the view-pooling layer commonly used in this setting. Our goal is to represent a 3D model as a vector that carries enough information to substitute for the model in high-level tasks. Because this vector is a learned representation that aims to capture the relevant information of a 3D model, we show that it conveys semantic information useful for assessing the similarity of 3D objects. We compare the proposed embedding technique with state-of-the-art 3D model retrieval methods on the ShapeNet and ModelNet datasets, and show that our embeddings achieve high effectiveness on both the normalized and the perturbed versions of ShapeNet while improving training and inference times over the standard state-of-the-art techniques.
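The architecture itself is not detailed in this abstract. As a rough, self-contained illustration of the core idea only, the following sketch encodes each rendered view with a shared feature extractor and concatenates the per-view features into one embedding vector, rather than collapsing them with a view-pooling (e.g., element-wise max) layer. All names, the toy statistics-based encoder, and the dimensions are our own illustrative assumptions, not the authors' actual network.

```python
import numpy as np

def encode_view(view):
    # Stand-in for a shared CNN encoder applied to one rendered view:
    # a few pooled statistics give a fixed-length per-view feature.
    return np.array([view.mean(), view.std(), view.max(), view.min()])

def embed_model(views):
    # Concatenate the per-view features into a single model embedding,
    # instead of reducing them with a view-pooling (max) operation.
    return np.concatenate([encode_view(v) for v in views])

def cosine_similarity(a, b):
    # Similarity assessment between two model embeddings.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Usage: two hypothetical "3D models", each rendered as 3 views of 8x8 pixels.
rng = np.random.default_rng(0)
model_a = [rng.random((8, 8)) for _ in range(3)]
model_b = [rng.random((8, 8)) for _ in range(3)]

emb_a = embed_model(model_a)
emb_b = embed_model(model_b)
print(emb_a.shape)                       # (12,) = 3 views x 4 features
print(cosine_similarity(emb_a, emb_a))   # close to 1.0 for identical models
```

In a retrieval setting, the embedding of a query model would be compared against the stored embeddings of a collection by cosine similarity, so the vector substitutes for the full 3D representation.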



Notes

  1. https://github.com/panmari/stanford-shapenet-renderer.

  2. https://github.com/CabinfeverB/RotationNet-TensorFlow.

  3. https://github.com/AnTao97/PointCloudDatasets.

  4. https://github.com/charlesq34/pointnet, https://github.com/Yochengliu/DensePoint.



Acknowledgements

This work was funded by ANID - Millennium Science Initiative Program-Code ICN17_002.

Author information


Corresponding author

Correspondence to Arniel Labrada.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests, nor any financial or personal conflicts that may influence the content and results of the paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Labrada, A., Bustos, B. & Sipiran, I. A convolutional architecture for 3D model embedding using image views. Vis Comput 40, 1601–1615 (2024). https://doi.org/10.1007/s00371-023-02872-4

