Abstract
Recently, Zero-Shot Learning (ZSL) has attracted considerable attention for its strong classification performance on novel, unobserved classes. Because the seen and unseen classes are completely disjoint, current ZSL methods inevitably suffer from the domain shift problem when transferring knowledge between them. Additionally, most ZSL methods, especially those targeting the semantic space, may suffer from the hubness problem because they rely on nearest-neighbor classifiers in a high-dimensional space. To tackle these issues, we propose a novel method termed Regularized Label Relaxation-based Stacked Autoencoder (RLRSA), which diminishes the domain difference between seen and unseen classes by exploiting an effective label space and offers several notable advantages. First, the proposed method establishes tight relations among the visual representation, semantic information, and label space via the stacked autoencoder, which helps avoid the projection domain shift. Second, by incorporating a slack variable matrix into the label space, RLRSA has more freedom to fit test samples whether they come from seen or unseen classes, yielding a robust and discriminative projection. Third, we construct a manifold regularization based on a class compactness graph to further reduce the domain gap between the seen and unseen classes. Finally, the learned projection is used to predict the class label of the target sample, so the hubness issue is avoided. Extensive experiments on benchmark datasets show that RLRSA produces new state-of-the-art results under two standard ZSL settings. For example, RLRSA obtains the highest average accuracy of 67.82% on five benchmark datasets under the pure ZSL setting. For the generalized ZSL task, it remains highly effective, e.g., achieving the best H result of 58.9% on the AwA2 dataset.
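To illustrate the label-relaxation idea mentioned above, the following is a minimal, hypothetical sketch of slack-variable label relaxation ("ε-dragging") in a plain least-squares regression, not the authors' full RLRSA model (which additionally couples a stacked autoencoder and a class-compactness manifold regularizer). The function name and hyperparameters are illustrative assumptions; the slack matrix `M` plays the role of the abstract's slack variable matrix, giving the regression extra freedom in the label space.

```python
import numpy as np

def label_relaxation_regression(X, y, n_classes, lam=0.1, n_iter=20):
    """Learn a projection W by ridge regression onto a relaxed label matrix
    (eps-dragging style relaxation). X: (n, d) features; y: (n,) int labels."""
    n, d = X.shape
    # Binary label matrix in {-1, +1}: +1 at the true class, -1 elsewhere.
    Y = -np.ones((n, n_classes))
    Y[np.arange(n), y] = 1.0
    B = Y.copy()                      # dragging directions (sign pattern)
    M = np.zeros_like(Y)              # nonnegative slack (relaxation) matrix
    # Precompute the ridge operator (X^T X + lam I)^{-1} X^T once.
    A = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
    W = A @ Y
    for _ in range(n_iter):
        T = Y + B * M                 # relaxed, sample-specific targets
        W = A @ T                     # closed-form update of the projection
        # Optimal slack given W: drag margins outward, clipped at zero.
        M = np.maximum(B * (X @ W - Y), 0.0)
    return W
```

A test sample would then be classified by `np.argmax(X_test @ W, axis=1)`, i.e., directly in the label space rather than by nearest-neighbor search in a high-dimensional semantic space, which is how such formulations sidestep hubness.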
Data Availability
The data used during this study are public datasets, which can be obtained directly from the references or provided as required.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61876139), the Project of Science and Technology of Henan (No. 212102310383), the Key Technologies R & D Program of Anyang (No. 2022C01SF112), the Research Foundation of Anyang Institute of Technology (No. YPY2021007), and the Research Start-up Foundation of Dr. Song Jianqiang (No. BSJ2022026).
Ethics declarations
Conflicts of interest
The authors declare that they have no pertinent or potential conflicts of interest that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Song, J., Zhao, H., Wei, X. et al. Regularized label relaxation-based stacked autoencoder for zero-shot learning. Appl Intell 53, 22348–22362 (2023). https://doi.org/10.1007/s10489-023-04686-2