Semantically guided projection for zero-shot 3D model classification and retrieval

  • Regular Paper
  • Multimedia Systems

Abstract

Most existing methods for 3D model classification and retrieval rely on fully supervised training, yet collecting and labeling 3D models across a wide range of categories is costly and time-consuming. How to make full use of existing known data to represent unknown data is therefore a crucial question. Inspired by zero-shot learning in the 2D image domain, we propose a semantically guided projection method that classifies and retrieves unseen 3D models by exploring the semantic relationship between seen and unseen 3D models. First, we exploit the multi-view information of 3D models to construct semantic attributes that serve as prior knowledge for representing 3D models. Then, we learn bidirectional projections, from visual features to semantics and from semantics to visual features, which can eliminate the gap between the seen and unseen domains. Extensive experiments on zero-shot 3D model classification and retrieval on two popular datasets, ModelNet40 and ShapeNetCore55, demonstrate the effectiveness and superiority of the proposed method.
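
The idea of bidirectional projections described in the abstract can be illustrated with a small sketch. This is not the paper's formulation, which is not shown in this preview: plain ridge regression is used here as a stand-in for both directions, the data are synthetic, and every name (`ridge_map`, `zero_shot_predict`, `demo`, `W_v2s`, `W_s2v`) is hypothetical. It only shows how a visual-to-semantic map and a semantic-to-visual map, both fitted on seen classes, might be combined to label samples from unseen classes.

```python
import numpy as np


def ridge_map(A, B, lam=1.0):
    """Closed-form ridge regression: P minimising ||A P - B||^2 + lam ||P||^2."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ B)


def l2_normalize(M):
    """Scale each row to unit length so dot products become cosine similarities."""
    return M / (np.linalg.norm(M, axis=1, keepdims=True) + 1e-12)


def zero_shot_predict(X_test, unseen_protos, W_v2s, W_s2v):
    """Score each test feature against unseen-class prototypes in both spaces."""
    # Direction 1: project visual features into the semantic space.
    sem_scores = l2_normalize(X_test @ W_v2s) @ l2_normalize(unseen_protos).T
    # Direction 2: project class prototypes into the visual space.
    vis_scores = l2_normalize(X_test) @ l2_normalize(unseen_protos @ W_s2v).T
    # Assumption: the two similarity scores are simply summed.
    return np.argmax(sem_scores + vis_scores, axis=1)


def demo(seed=0):
    """Synthetic check: train on seen classes, classify unseen ones."""
    rng = np.random.default_rng(seed)
    k, d, n_seen, n_unseen, per_class = 8, 32, 12, 4, 30
    # Assumed toy data: class attribute vectors and a linear attribute-to-feature map.
    protos = rng.normal(size=(n_seen + n_unseen, k))
    mixing = rng.normal(size=(k, d))
    labels = np.repeat(np.arange(n_seen + n_unseen), per_class)
    X = protos[labels] @ mixing + 0.1 * rng.normal(size=(labels.size, d))

    seen = labels < n_seen
    S_seen = protos[labels[seen]]
    W_v2s = ridge_map(X[seen], S_seen)   # visual -> semantic, shape (d, k)
    W_s2v = ridge_map(S_seen, X[seen])   # semantic -> visual, shape (k, d)

    pred = zero_shot_predict(X[~seen], protos[n_seen:], W_v2s, W_s2v)
    return float(np.mean(pred == labels[~seen] - n_seen))
```

Calling `demo()` returns the fraction of unseen-class samples assigned to the correct class. Summing the two cosine scores is only one simple way to use both directions; a learned weighting or a shared (tied) projection would be equally plausible choices.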

Acknowledgements

This work was supported in part by the National Key Research and Development Program of China (2020YFB1709201), the National Natural Science Foundation of China (U21B2024, 61802277), and the Baidu Program.

Author information

Corresponding authors

Correspondence to Wenhui Li or An-An Liu.

Additional information

Communicated by B.-K. Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Su, Y., Li, J., Li, W. et al. Semantically guided projection for zero-shot 3D model classification and retrieval. Multimedia Systems 28, 2437–2451 (2022). https://doi.org/10.1007/s00530-022-00970-2

  • DOI: https://doi.org/10.1007/s00530-022-00970-2
