Abstract
Recently, several classifier prediction methods that exploit knowledge graphs and Graph Convolutional Neural Networks (GCNs) have emerged, achieving excellent results in Zero-Shot Learning (ZSL). However, existing methods rely only on pre-trained seen-class classifier parameters to guide the model's training, which prevents discriminative visual features from being mined and does not guarantee effective use of semantic features. This work therefore presents a novel Knowledge-Assisted ZSL Model (KAZSLM), which improves classification ability by embedding visual and semantic information into the classifier space. A GCN classifier prediction network driven by word embeddings and inter-class relationships serves as the Basic Framework (BF), which is combined with a Visual Knowledge Assistant (VKA) module and a Semantic Knowledge Assistant (SKA) module to form KAZSLM. In the VKA module, the average visual feature of all samples in each seen class, together with the corresponding class label, guides the model to refine the classifier at a lower computational cost. In the SKA module, the per-class semantic features are used to refine the classifier through a GCN, with a loss function that reconstructs each class's semantic features from the corresponding classifier parameters. Together, the two assistant modules let visual and semantic knowledge push the whole model toward a more precise classifier. Moreover, a simple convolutional residual network is used to further improve the model's performance on the AWA2 dataset. Experimental results on the AWA2 and ImageNet datasets demonstrate that KAZSLM achieves better image classification performance than current ZSL classification methods.
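The two auxiliary signals described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the cross-entropy form of the visual loss, and the linear `decoder` that stands in for the GCN of the SKA module are all assumptions made for illustration, and every class is assumed to have at least one sample.

```python
import numpy as np

def class_mean_features(features, labels, num_classes):
    """Average the visual features of all samples in each seen class.

    Assumes every class index in range(num_classes) has at least one sample.
    """
    means = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        means[c] = features[labels == c].mean(axis=0)
    return means

def prototype_cross_entropy(classifiers, prototypes):
    """VKA-style loss (assumed form): classify each class-mean feature
    with the predicted classifiers; row c should be assigned to class c."""
    logits = prototypes @ classifiers.T                    # shape (C, C)
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def semantic_reconstruction_loss(classifiers, decoder, semantics):
    """SKA-style loss (assumed form): mean squared error between each class's
    semantic vector and its reconstruction from the classifier parameters.
    A linear decoder is used here in place of the GCN named in the abstract."""
    reconstruction = classifiers @ decoder
    return np.mean((reconstruction - semantics) ** 2)

# Toy data: four samples, two seen classes, two-dimensional features.
features = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])
labels = np.array([0, 0, 1, 1])
prototypes = class_mean_features(features, labels, num_classes=2)

aligned = prototypes.copy()      # classifiers that match the class means
swapped = prototypes[::-1]       # classifiers assigned to the wrong classes
loss_aligned = prototype_cross_entropy(aligned, prototypes)
loss_swapped = prototype_cross_entropy(swapped, prototypes)
```

Because only one mean vector per class enters the visual loss, its cost grows with the number of seen classes rather than the number of training images, which is consistent with the lower computational cost the abstract attributes to the VKA module.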
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 62172022) and the Beijing Natural Science Foundation (No. 4202003).
Funding
National Natural Science Foundation of China (No. 61772049), Beijing Natural Science Foundation (No. 4202003).
Author information
Contributions
Conceptualization, Dehui Kong, Xiliang Li; Methodology, Dehui Kong, Xiliang Li; Software, Xiliang Li; Validation, Shaofan Wang; Formal analysis, Jinghua Li; Resources, Baocai Yin; Writing - Original Draft, Xiliang Li; Writing - Review & Editing, Dehui Kong, Xiliang Li.
Ethics declarations
Conflict of Interests
The authors declare that they have no conflicts of interest regarding this work.
Additional information
Dehui Kong and Xiliang Li contributed equally to this work.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Kong, D., Li, X., Wang, S. et al. Learning visual-and-semantic knowledge embedding for zero-shot image classification. Appl Intell 53, 2250–2264 (2023). https://doi.org/10.1007/s10489-022-03443-1
DOI: https://doi.org/10.1007/s10489-022-03443-1