Regularized label relaxation-based stacked autoencoder for zero-shot learning

Applied Intelligence

Abstract

Recently, Zero-Shot Learning (ZSL) has attracted considerable attention for its ability to classify novel, unobserved classes. Because seen and unseen classes are completely disjoint, current ZSL methods inevitably suffer from the domain shift problem when transferring knowledge from the seen classes to the unseen ones. Additionally, most ZSL methods, especially those that operate in the semantic space, may suffer from the hubness problem because they apply nearest-neighbor classifiers in a high-dimensional space. To tackle these issues, we propose a novel method termed Regularized Label Relaxation-based Stacked Autoencoder (RLRSA), which diminishes the domain difference between seen and unseen classes by exploiting an effective label space and has several notable advantages. First, the proposed method establishes tight relations among the visual representation, the semantic information and the label space via the stacked autoencoder, which helps to avoid the projection domain shift. Second, by incorporating a slack variable matrix into the label space, our RLRSA method has more freedom to fit the test samples, whether they come from the seen or unseen classes, resulting in a robust and discriminative projection. Third, we construct a manifold regularization term based on a class compactness graph to further reduce the domain gap between the seen and unseen classes. Finally, the learned projection is used to predict the class label of the target sample directly, so the hubness issue can be avoided. Extensive experiments conducted on benchmark datasets show that our RLRSA method produces new state-of-the-art results under two standard ZSL settings. For example, RLRSA obtains the highest average accuracy of 67.82% on five benchmark datasets under the pure ZSL setting. For the generalized ZSL task, the proposed RLRSA remains highly effective, e.g., it achieves the best H result of 58.9% on the AwA2 dataset.
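The H value quoted for the generalized ZSL task is not defined in the abstract. Assuming it follows the common generalized ZSL protocol, in which H is the harmonic mean of the per-class accuracies on seen and unseen classes, it can be sketched as below. The function name and the input accuracies are hypothetical illustrations, not values reported in this article.

```python
def harmonic_mean(acc_seen: float, acc_unseen: float) -> float:
    """Harmonic mean H of seen-class and unseen-class accuracy.

    Assumption: H = 2 * S * U / (S + U), the usual generalized ZSL metric.
    Both inputs must use the same scale (both percentages or both fractions).
    """
    if acc_seen < 0 or acc_unseen < 0:
        raise ValueError("accuracies must be non-negative")
    total = acc_seen + acc_unseen
    if total == 0:
        # Both accuracies are zero; define H as zero to avoid division by zero.
        return 0.0
    return 2.0 * acc_seen * acc_unseen / total


# Hypothetical accuracies (in percent), chosen only to illustrate the metric.
balanced = harmonic_mean(60.0, 60.0)  # equal accuracies give H equal to both
skewed = harmonic_mean(90.0, 30.0)    # a large seen/unseen gap pulls H down
print(balanced, skewed)
```

Because the harmonic mean is dominated by the smaller of the two accuracies, a model that favors seen classes scores a low H even when its seen-class accuracy is high, which is why H is used to assess the balance between the two domains.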

Data Availability

The data used during this study are public datasets, which can be obtained directly from the references or provided as required.

Acknowledgements

This paper is supported by the National Natural Science Foundation of China (No. 61876139), the Project of Science and Technology of Henan (No. 212102310383), the Key Technologies R & D Program of Anyang (No. 2022C01SF112), the Research Foundation of Anyang Institute of Technology (No. YPY2021007), and the Research Start-up Foundation of Dr. Song Jianqiang (No. BSJ2022026).

Author information

Corresponding author

Correspondence to Jianqiang Song.

Ethics declarations

Conflicts of interest

The authors declare that they do not have any pertinent or potential conflicts of interest that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Song, J., Zhao, H., Wei, X. et al. Regularized label relaxation-based stacked autoencoder for zero-shot learning. Appl Intell 53, 22348–22362 (2023). https://doi.org/10.1007/s10489-023-04686-2
