
Learning visual-and-semantic knowledge embedding for zero-shot image classification

Published in Applied Intelligence

Abstract

Recently, several classifier-prediction methods exploiting knowledge graphs and Graph Convolutional Networks (GCNs) have emerged, achieving excellent results in Zero-Shot Learning (ZSL). However, existing methods rely only on pre-trained seen-class classifier parameters to guide training, which prevents discriminative visual features from being mined and does not guarantee effective use of semantic features. This work therefore presents a novel Knowledge-Assisted ZSL Model (KAZSLM), which improves classification by embedding both visual and semantic information into the classifier space. A GCN classifier-prediction network driven by word embeddings and inter-class relationships serves as the Basic Framework (BF), and is combined with a Visual Knowledge Assistant (VKA) module and a Semantic Knowledge Assistant (SKA) module to form KAZSLM. In the VKA module, the average visual feature of all samples in each seen class, together with the corresponding class label, guides the model to refine the classifier at low computational cost. In the SKA module, per-class semantic features refine the classifier through a GCN whose loss reconstructs each class's semantic features from the corresponding classifier parameters. Together, the two assistant modules force visual and semantic knowledge into the whole model, yielding more precise classifiers. Moreover, a simple convolutional residual network further strengthens performance on the AWA2 dataset. Experimental results on the AWA2 and ImageNet datasets demonstrate that KAZSLM achieves better image classification performance than current ZSL methods.
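The abstract describes three coupled pieces: a GCN that maps per-class word embeddings and the class knowledge graph to classifier weights (BF), a VKA term that uses one averaged visual feature per seen class, and an SKA term that reconstructs class semantics from the predicted classifiers. The sketch below is one plausible reading of that design in PyTorch; the layer sizes, loss forms, and weights lam_vka/lam_ska are illustrative assumptions, not the authors' released implementation.

    # Minimal sketch of the KAZSLM idea; all dimensions and loss weights
    # below are illustrative assumptions, not the paper's exact setup.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GCNLayer(nn.Module):
        """One graph-convolution layer: H' = (A_hat @ H) W."""
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.linear = nn.Linear(in_dim, out_dim)

        def forward(self, h, a_hat):
            # a_hat: normalized adjacency of the class knowledge graph (N x N)
            return self.linear(a_hat @ h)

    class KAZSLMSketch(nn.Module):
        def __init__(self, sem_dim=300, hid_dim=2048, vis_dim=2048):
            super().__init__()
            # BF: GCN maps class word embeddings to classifier weights.
            self.gcn1 = GCNLayer(sem_dim, hid_dim)
            self.gcn2 = GCNLayer(hid_dim, vis_dim)
            # SKA: reconstruct each class's semantics from its classifier.
            self.ska_decoder = nn.Linear(vis_dim, sem_dim)

        def forward(self, word_emb, a_hat):
            h = F.leaky_relu(self.gcn1(word_emb, a_hat))
            w = self.gcn2(h, a_hat)  # one predicted classifier row per class
            return F.normalize(w, dim=1)

    def kazslm_loss(model, word_emb, a_hat, w_seen_gt, mean_vis_seen,
                    seen_idx, lam_vka=1.0, lam_ska=0.1):
        w_pred = model(word_emb, a_hat)
        # BF loss: regress the pre-trained seen-class classifier weights.
        loss_bf = F.mse_loss(w_pred[seen_idx], w_seen_gt)
        # VKA loss: the class-mean visual feature should be classified
        # correctly by the predicted classifiers (one "sample" per class,
        # hence the low computational cost mentioned in the abstract).
        logits = F.normalize(mean_vis_seen, dim=1) @ w_pred.t()
        loss_vka = F.cross_entropy(logits, seen_idx)
        # SKA loss: classifier parameters should reconstruct class semantics.
        loss_ska = F.mse_loss(model.ska_decoder(w_pred), word_emb)
        return loss_bf + lam_vka * loss_vka + lam_ska * loss_ska

In this reading, a_hat would be the normalized adjacency of the WordNet class hierarchy, word_emb the GloVe vectors of all class names, w_seen_gt the classifier weights of a pre-trained ResNet head, and mean_vis_seen the per-class averages of its features; at test time the rows of w_pred for unseen classes would serve directly as their classifiers.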



Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62172022) and the Beijing Natural Science Foundation (4202003).

Funding

National Natural Science Foundation of China (No. 61772049); Beijing Natural Science Foundation (4202003).

Author information

Authors and Affiliations

Authors

Contributions

Conceptualization, Dehui Kong, Xiliang Li; Methodology, Dehui Kong, Xiliang Li; Software, Xiliang Li; Validation, Shaofan Wang; Formal analysis, Jinghua Li; Resources, Baocai Yin; Writing - Original Draft, Xiliang Li; Writing - Review & Editing, Dehui Kong, Xiliang Li.

Corresponding authors

Correspondence to Dehui Kong or Xiliang Li.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflicts of interest regarding this work.

Additional information

Dehui Kong and Xiliang Li contributed equally to this work.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Kong, D., Li, X., Wang, S. et al. Learning visual-and-semantic knowledge embedding for zero-shot image classification. Appl Intell 53, 2250–2264 (2023). https://doi.org/10.1007/s10489-022-03443-1
