Learning visual-and-semantic knowledge embedding for zero-shot image classification

Kong, Dehui; Li, Xiliang; Wang, Shaofan; Li, Jinghua; Yin, Baocai

doi:10.1007/s10489-022-03443-1

Published: 06 May 2022

Applied Intelligence, Volume 53, pages 2250–2264 (2023)

Dehui Kong (ORCID: orcid.org/0000-0001-7722-7172)¹, Xiliang Li¹, Shaofan Wang¹, Jinghua Li¹ & Baocai Yin¹

Abstract

Recently, several classifier prediction methods that exploit knowledge graphs and Graph Convolutional Networks (GCNs) have achieved excellent results in Zero-Shot Learning (ZSL). However, existing methods rely only on pre-trained seen-class classifier parameters to guide model training, which prevents discriminative visual features from being mined and does not guarantee effective use of semantic features. This work therefore presents a novel Knowledge-Assisted ZSL Model (KAZSLM), which improves classification ability by embedding visual and semantic information into the classifier space. A GCN classifier prediction network driven by word embeddings and inter-class relationships serves as the Basic Framework (BF), which is combined with a Visual Knowledge Assistant (VKA) module and a Semantic Knowledge Assistant (SKA) module to form KAZSLM. In the VKA module, the average visual feature of all samples in each seen class, together with the corresponding class label, guides the model to refine the classifiers at a lower computational cost. In the SKA module, the per-class semantic features of the samples are used to refine the classifiers through a GCN, with a loss function that reconstructs each class's semantic features from the corresponding classifier parameters. Together, these two assistant modules let visual and semantic knowledge push the whole model toward more precise classifiers. Moreover, a simple convolutional residual network is adopted to further improve the model's performance on the AWA2 dataset. Experimental results on the AWA2 and ImageNet datasets demonstrate that KAZSLM achieves better image classification performance than current ZSL classification methods.
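
The abstract describes three training signals: a GCN that predicts classifiers from word embeddings (BF), a classification loss on per-class mean visual features (VKA), and a reconstruction of semantic features from classifier parameters (SKA). The pure-Python sketch below shows one way such an objective could be assembled. It is not the authors' implementation. The function names (normalize_adjacency, kazslm_style_loss, and so on), the single-layer GCNs, the choice of mean squared error and softmax cross-entropy, the weights lam_v and lam_s, and the use of the word embeddings as the SKA reconstruction target are all assumptions made for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def normalize_adjacency(adj):
    """Standard GCN normalization D^-1/2 (A + I) D^-1/2 of a class graph."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm, h, theta, relu=True):
    """One graph convolution A_norm @ H @ Theta, with optional ReLU."""
    out = matmul(matmul(a_norm, h), theta)
    return [[max(0.0, v) for v in row] for row in out] if relu else out


def mse(pred, target):
    """Mean squared error over all matrix entries."""
    diffs = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed with the log-sum-exp trick."""
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[y]
    return total / len(labels)


def class_means(features, labels, num_classes):
    """Average visual feature of each seen class (assumes every class has samples)."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for f, y in zip(features, labels):
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return [[s / counts[c] for s in sums[c]] for c in range(num_classes)]


def kazslm_style_loss(adj, word_emb, theta_cls, theta_rec, pretrained_w,
                      seen, features, labels, lam_v=1.0, lam_s=1.0):
    """Hypothetical BF + VKA + SKA objective; returns (total, (bf, vka, ska))."""
    a_norm = normalize_adjacency(adj)
    # BF: predict one classifier vector per class node from its word embedding.
    w_all = gcn_layer(a_norm, word_emb, theta_cls, relu=False)
    w_seen = [w_all[c] for c in seen]
    loss_bf = mse(w_seen, pretrained_w)
    # VKA: classify each seen class's mean visual feature with the predicted
    # seen-class classifiers (labels index into `seen`).
    protos = class_means(features, labels, len(seen))
    logits = matmul(protos, [list(col) for col in zip(*w_seen)])
    loss_vka = cross_entropy(logits, list(range(len(seen))))
    # SKA: reconstruct semantic features from all classifiers via a second GCN.
    recon = gcn_layer(a_norm, w_all, theta_rec, relu=False)
    loss_ska = mse(recon, word_emb)
    return loss_bf + lam_v * loss_vka + lam_s * loss_ska, (loss_bf, loss_vka, loss_ska)
```

In this sketch, VKA scores one mean feature per seen class instead of every training sample, which is one plausible reading of the abstract's "lower computational cost"; a real system would optimize theta_cls and theta_rec with automatic differentiation rather than only evaluating the loss.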

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 62172022) and the Beijing Natural Science Foundation (4202003).

Funding

National Natural Science Foundation of China (No. 61772049), Beijing Natural Science Foundation (4202003).

Author information

Authors and Affiliations

Beijing Institute of Artificial Intelligence, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Faculty of Information Technology, Beijing University of Technology, Chaoyang District, 100124, Beijing, China
Dehui Kong, Xiliang Li, Shaofan Wang, Jinghua Li & Baocai Yin

Contributions

Conceptualization, Dehui Kong, Xiliang Li; Methodology, Dehui Kong, Xiliang Li; Software, Xiliang Li; Validation, Shaofan Wang; Formal analysis, Jinghua Li; Resources, Baocai Yin; Writing - Original Draft, Xiliang Li; Writing - Review & Editing, Dehui Kong, Xiliang Li.

Corresponding authors

Correspondence to Dehui Kong or Xiliang Li.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflicts of interest regarding this work.

Additional information

Dehui Kong and Xiliang Li contributed equally to this work.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kong, D., Li, X., Wang, S. et al. Learning visual-and-semantic knowledge embedding for zero-shot image classification. Appl Intell 53, 2250–2264 (2023). https://doi.org/10.1007/s10489-022-03443-1

Accepted: 25 February 2022
Published: 06 May 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s10489-022-03443-1
