
Hierarchical contrastive representation for zero shot learning

Published in: Applied Intelligence

Abstract

Zero-shot learning (ZSL) aims to recognize unseen (novel) classes using only labeled samples from seen (base) classes. Existing methods typically learn visual-semantic interactions or generate the absent visual features of unseen classes to compensate for data imbalance. However, they ignore the representation quality of visual-semantic pairs, resulting in unsatisfactory alignment and prediction bias. To tackle these issues, we propose a Hierarchical Contrastive Representation learning paradigm, termed HCR, which fully exploits the model's representation capability and discriminative information. Specifically, we first propose a contrastive embedding that preserves not only high-quality representations but also sufficiently discriminative information from class-level and instance-level supervision. We then introduce a regressor guided by valuable prior knowledge to achieve more desirable visual-semantic alignment for unseen classes. A pluggable calibrator is also aggregated to further alleviate prediction bias in the contrastive embedding. Extensive experiments show that the proposed HCR significantly outperforms state-of-the-art methods on popular benchmarks under both ZSL and the more challenging generalized ZSL (GZSL) settings.
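The two ingredients named in the abstract can be illustrated with a minimal sketch. The function names `supervised_contrastive_loss` and `calibrated_scores` are hypothetical, for illustration only, and are not the authors' implementation: the class-level term follows the supervised contrastive loss of [11] (instance-level supervision is the special case where each anchor's only positive is its own augmented view), and the score adjustment follows calibrated stacking [37].

```python
import numpy as np

def supervised_contrastive_loss(z, labels, tau=0.1):
    """Class-level contrastive loss over embeddings z of shape (n, d).

    For each anchor, positives are the other samples sharing its label;
    with one unique label per augmented pair, this reduces to the
    instance-level (SimCLR-style) objective.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # project to unit sphere
    n = z.shape[0]
    sim = z @ z.T / tau                    # temperature-scaled similarities
    np.fill_diagonal(sim, -np.inf)         # exclude self-comparisons
    # Row-wise log-softmax over all other samples.
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    loss, anchors = 0.0, 0
    for i in range(n):
        pos = labels == labels[i]
        pos[i] = False
        if pos.any():                      # skip anchors with no positives
            loss += -log_prob[i, pos].mean()
            anchors += 1
    return loss / max(anchors, 1)

def calibrated_scores(seen_scores, unseen_scores, gamma=0.7):
    """Calibrated stacking: subtract a constant from seen-class scores to
    counter the GZSL prediction bias toward seen classes."""
    return np.concatenate([seen_scores - gamma, unseen_scores])
```

In a GZSL evaluation, `gamma` would be tuned on held-out data; a larger value shifts predictions toward unseen classes at the cost of seen-class accuracy.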


[Figures 1–10 are available in the full article.]


Data availability and access

The datasets generated or analyzed during this study are available from [7] at https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/zero-shot-learning/zero-shot-learning-the-good-the-bad-and-the-ugly.

References

  1. Lu Z, Yu Y, Lu Z-M, Shen F-L, Zhang Z (2020) Attentive semantic preservation network for zero-shot learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 682–683

  2. Lu Z, Lu Z, Yu Y, Wang Z (2022) Learn more from less: generalized zero-shot learning with severely limited labeled data. Neurocomputing 477:25–35

  3. Ou G, Yu G, Domeniconi C, Lu X, Zhang X (2020) Multi-label zero-shot learning with graph convolutional networks. Neural Netw 132:333–341

  4. Xian Y, Lorenz T, Schiele B, Akata Z (2018) Feature generating networks for zero-shot learning. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 5542–5551

  5. Li J, Jing M, Lu K, Ding Z, Zhu L, Huang Z (2019) Leveraging the invariant side of generative zero-shot learning. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 7402–7411

  6. Xu B, Zeng Z, Lian C, Ding Z (2022) Generative mixup networks for zero-shot learning. IEEE Trans Neural Netw Learn Syst

  7. Xian Y, Lampert CH, Schiele B, Akata Z (2018) Zero-shot learning-a comprehensive evaluation of the good, the bad and the ugly. IEEE Trans Pattern Anal Mach Intell 41(9):2251–2265

  8. Min S, Yao H, Xie H, Wang C, Zha Z-J, Zhang Y (2020) Domain-aware visual bias eliminating for generalized zero-shot learning. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 12664–12673

  9. Zhang L, Xiang T, Gong S (2017) Learning a deep embedding model for zero-shot learning. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 2021–2030

  10. Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In: International conference on machine learning, pp 1597–1607. PMLR

  11. Khosla P, Teterwak P, Wang C, Sarna A, Tian Y, Isola P, Maschinot A, Liu C, Krishnan D (2020) Supervised contrastive learning. Adv Neural Inf Process Syst 33:18661–18673

  12. Chen X, Fan H, Girshick R, He K (2020) Improved baselines with momentum contrastive learning. Preprint at arXiv:2003.04297

  13. Ye H-J, Ming L, Zhan D-C, Chao W-L (2022) Few-shot learning with a strong teacher. IEEE Trans Pattern Anal Mach Intell

  14. Zhang J, Gao L, Luo X, Shen H, Song J (2023) DETA: denoised task adaptation for few-shot learning. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 11541–11551

  15. Wu J, Zhang Y, Sun S, Li Q, Zhao X (2022) Generalized zero-shot emotion recognition from body gestures. Appl Intell 1–19

  16. Kumar Verma V, Arora G, Mishra A, Rai P (2018) Generalized zero-shot learning via synthesized examples. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 4281–4289

  17. Gao R, Hou X, Qin J, Chen J, Liu L, Zhu F, Zhang Z, Shao L (2020) Zero-VAE-GAN: generating unseen features for generalized and transductive zero-shot learning. IEEE Trans Image Process 29:3665–3680

  18. Han Z, Fu Z, Yang J (2020) Learning the redundancy-free features for generalized zero-shot object recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12865–12874

  19. Huang H, Wang C, Yu PS, Wang C-D (2019) Generative dual adversarial network for generalized zero-shot learning. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 801–810

  20. Li Y, Liu Z, Yao L, Wang X, McAuley J, Chang X (2022) An entropy-guided reinforced partial convolutional network for zero-shot learning. IEEE Trans Circuits Syst Video Technol 32(8):5175–5186

  21. Ji Z, Wang Q, Cui B, Pang Y, Cao X, Li X (2021) A semi-supervised zero-shot image classification method based on soft-target. Neural Netw 143:88–96

  22. Akata Z, Perronnin F, Harchaoui Z, Schmid C (2013) Label-embedding for attribute-based classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 819–826

  23. Akata Z, Reed S, Walter D, Lee H, Schiele B (2015) Evaluation of output embeddings for fine-grained image classification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 2927–2936

  24. Sung F, Yang Y, Zhang L, Xiang T, Torr PH, Hospedales TM (2018) Learning to compare: relation network for few-shot learning. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 1199–1208

  25. Zhang L, Wang P, Liu L, Shen C, Wei W, Zhang Y, Van Den Hengel A (2020) Towards effective deep embedding for zero-shot learning. IEEE Trans Circuits Syst Video Technol 30(9):2843–2852

  26. Zhu Y, Elhoseiny M, Liu B, Peng X, Elgammal A (2018) A generative adversarial approach for zero-shot learning from noisy texts. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 1004–1013

  27. Schonfeld E, Ebrahimi S, Sinha S, Darrell T, Akata Z (2019) Generalized zero-and few-shot learning via aligned variational autoencoders. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 8247–8255

  28. He K, Fan H, Wu Y, Xie S, Girshick R (2020) Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9729–9738

  29. Li J, Wei Y, Wang C, Hu Q, Liu Y, Xu L (2022) 3-D CNN-based multichannel contrastive learning for Alzheimer's disease automatic diagnosis. IEEE Trans Instrum Meas 71:1–11

  30. Han Z, Fu Z, Chen S, Yang J (2021) Contrastive embedding for generalized zero-shot learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2371–2381

  31. Cheng D, Wang G, Wang N, Zhang D, Zhang Q, Gao X (2023) Discriminative and robust attribute alignment for zero-shot learning. IEEE Trans Circuits Syst Video Technol

  32. Zhu F, Zhang W, Chen X, Gao X, Ye N (2023) Large margin distribution multi-class supervised novelty detection. Expert Syst Appl 224:119937

  33. Hendrycks D, Gimpel K (2016) A baseline for detecting misclassified and out-of-distribution examples in neural networks. In: International conference on learning representations

  34. Zhang J, Gao L, Hao B, Huang H, Song J, Shen H (2023) From global to local: Multi-scale out-of-distribution detection. IEEE Trans Image Process

  35. Yang J, Zhou K, Liu Z (2023) Full-spectrum out-of-distribution detection. Int J Comput Vis 1–16

  36. Socher R, Ganjoo M, Manning CD, Ng A (2013) Zero-shot learning through cross-modal transfer. In: Advances in neural information processing systems (NeurIPS), pp 935–943

  37. Chao W-L, Changpinyo S, Gong B, Sha F (2016) An empirical study and analysis of generalized zero-shot learning for object recognition in the wild. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14, pp. 52–68. Springer

  38. Atzmon Y, Chechik G (2019) Adaptive confidence smoothing for generalized zero-shot learning. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 11671–11680

  39. Chen X, Lan X, Sun F, Zheng N (2020) A boundary based out-of-distribution classifier for generalized zero-shot learning. In: European conference on computer vision (ECCV), pp 572–588

  40. Su H, Li J, Chen Z, Zhu L, Lu K (2022) Distinguishing unseen from seen for generalized zero-shot learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7885–7894

  41. Mettes P, Pol E, Snoek C (2019) Hyperspherical prototype networks. Adv Neural Inf Process Syst 32

  42. Wang T, Isola P (2020) Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In: International conference on machine learning, pp. 9929–9939. PMLR

  43. Borodachov SV, Hardin DP, Saff EB (2019) Discrete energy on rectifiable sets. Springer

  44. Farhadi A, Endres I, Hoiem D, Forsyth D (2009) Describing objects by their attributes. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 1778–1785

  45. Nilsback M-E, Zisserman A (2008) Automated flower classification over a large number of classes. In: Indian conference on computer vision, graphics & image processing, pp 722–729

  46. Wah C, Branson S, Welinder P, Perona P, Belongie S (2011) The Caltech-UCSD Birds-200-2011 dataset

  47. Felix R, Kumar VB, Reid I, Carneiro G (2018) Multi-modal cycle-consistent generalized zero-shot learning. In: European conference on computer vision (ECCV), pp 21–37

  48. Li Q, Hou M, Lai H, Yang M (2022) Cross-modal distribution alignment embedding network for generalized zero-shot learning. Neural Netw 148:176–182

  49. Annadani Y, Biswas S (2018) Preserving semantic relations for zero-shot learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7603–7612

  50. Zhang R, Zhu Q, Xu X, Zhang D, Huang S-J (2021) Visual-guided attentive attributes embedding for zero-shot learning. Neural Netw 143:709–718

  51. Changpinyo S, Chao W-L, Gong B, Sha F (2020) Classifier and exemplar synthesis for zero-shot learning. Int J Comput Vis 128:166–201

  52. Gao R, Hou X, Qin J, Shen Y, Long Y, Liu L, Zhang Z, Shao L (2022) Visual-semantic aligned bidirectional network for zero-shot learning. IEEE Trans Multimedia

  53. Li Y, Liu Z, Yao L, Chang X (2021) Attribute-modulated generative meta learning for zero-shot learning. IEEE Trans Multimedia 25:1600–1610

  54. Chen Z, Huang Y, Chen J, Geng Y, Zhang W, Fang Y, Pan JZ, Chen H (2023) DUET: cross-modal semantic grounding for contrastive zero-shot learning. In: Proceedings of the AAAI conference on artificial intelligence, vol 37, pp 405–413

  55. Cheng D, Wang G, Wang B, Zhang Q, Han J, Zhang D (2023) Hybrid routing transformer for zero-shot learning. Pattern Recognit 137:109270

  56. Han Z, Fu Z, Li G, Yang J (2021) Inference guided feature generation for generalized zero-shot learning. Neurocomputing 430:150–158

  57. Chen L, Zhang H, Xiao J, Liu W, Chang S-F (2018) Zero-shot visual recognition using semantics-preserving adversarial embedding networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1043–1052

  58. Chen S, Xie G, Liu Y, Peng Q, Sun B, Li H, You X, Shao L (2021) HSVA: hierarchical semantic-visual adaptation for zero-shot learning. Adv Neural Inf Process Syst 34:16622–16634

  59. Xian Y, Sharma S, Schiele B, Akata Z (2019) f-VAEGAN-D2: a feature generating framework for any-shot learning. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 10275–10284

  60. Ding B, Fan Y, He Y, Zhao J (2023) Enhanced vaegan: a zero-shot image classification method. Appl Intell 53(8):9235–9246

  61. Yun Y, Wang S, Hou M, Gao Q (2022) Attributes learning network for generalized zero-shot learning. Neural Netw 150:112–118

  62. Li K, Min MR, Fu Y (2019) Rethinking zero-shot learning: A conditional visual classification perspective. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 3583–3592

  63. Shen J, Xiao Z, Zhen X, Zhang L (2021) Spherical zero-shot learning. IEEE Trans Circuits Syst Video Technol 32(2):634–645

  64. Huynh D, Elhamifar E (2020) Fine-grained generalized zero-shot learning via dense attribute-based attention. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4483–4493

  65. Li X, Xu Z, Wei K, Deng C (2021) Generalized zero-shot learning via disentangled representation. In: the Association for the advancement of artificial intelligence (AAAI), vol 35, pp 1966–1974

  66. Chen S, Hong Z, Liu Y, Xie G-S, Sun B, Li H, Peng Q, Lu K, You X (2022) TransZero: attribute-guided transformer for zero-shot learning. In: Proceedings of the AAAI conference on artificial intelligence, vol 36, pp 330–338

  67. Chen S, Hong Z, Xie G-S, Yang W, Peng Q, Wang K, Zhao J, You X (2022) MSDN: mutually semantic distillation network for zero-shot learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7612–7621

  68. Li Z, Chen Q, Liu Q (2021) Augmented semantic feature based generative network for generalized zero-shot learning. Neural Netw 143:1–11

  69. Chen S, Wang W, Xia B, Peng Q, You X, Zheng F, Shao L (2021) Free: Feature refinement for generalized zero-shot learning. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 122–131

  70. Yue Z, Wang T, Sun Q, Hua X-S, Zhang H (2021) Counterfactual zero-shot and open-set visual recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 15404–15414

  71. Romera-Paredes B, Torr P (2015) An embarrassingly simple approach to zero-shot learning. In: International conference on machine learning (ICML), pp 2152–2161

  72. Kwon G, Al Regib G (2022) A gating model for bias calibration in generalized zero-shot learning. IEEE Trans Image Process

Funding

This work was supported in part by the National Key Research and Development Program of China under Grant No. 2020AAA0140004, in part by the China Postdoctoral Science Foundation under Grant No. 2022M712792, and in part by the Ningbo Science and Technology Innovation 2025 Major Project under Grants 2020Z106 and 2023Z040.

Author information

Authors and Affiliations

Authors

Contributions

Conceptualization and methodology were performed by Ziqian Lu. Software and programming were designed by Zewei He. Validation was performed by Xuecheng Sun. Formal analysis and writing were performed by Hao Luo. Supervision was performed by Yangming Zheng. Funding was acquired by Zheming Lu and Zewei He.

Corresponding authors

Correspondence to Zheming Lu, Hao Luo or Yangming Zheng.

Ethics declarations

Ethical and informed consent for data used

Written informed consent for publication of this paper was obtained from Zhejiang University and all authors. This study did not involve human or animal subjects; thus, no ethical approval was required. The study protocol adhered to the guidelines established by the journal.

Competing Interests

All authors are affiliated with the School of Aeronautics and Astronautics, Zhejiang University. The authors declare no potential conflicts of interest with respect to the research, authorship, and publication of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Lu, Z., Lu, Z., He, Z. et al. Hierarchical contrastive representation for zero shot learning. Appl Intell 54, 9213–9229 (2024). https://doi.org/10.1007/s10489-024-05531-w
