Abstract
Zero-shot learning (ZSL) aims to recognize unseen (novel) classes using only labeled samples from seen (base) classes. Existing methods typically learn visual-semantic interactions or generate the absent visual features of unseen classes to compensate for data imbalance. However, they ignore the representation quality of visual-semantic pairs, which leads to unsatisfactory alignment and prediction bias. To tackle these issues, we propose a Hierarchical Contrastive Representation learning paradigm, termed HCR, which fully exploits the model's representation capability and discriminative information. Specifically, we first propose a contrastive embedding that preserves not only high-quality representations but also sufficiently discriminative information drawn from class-level and instance-level supervision. We then introduce a regressor, guided by valuable prior knowledge, to achieve more desirable visual-semantic alignment for unseen classes. A pluggable calibrator is also integrated to further alleviate prediction bias in the contrastive embedding. Extensive experiments show that the proposed HCR significantly outperforms state-of-the-art methods on popular benchmarks under both the ZSL and the more challenging generalized ZSL (GZSL) settings.
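The class-level and instance-level supervision mentioned in the abstract can be illustrated with a small sketch. The snippet below is not the HCR objective itself, but a minimal NumPy illustration of the general idea: a SupCon-style class-level term (samples sharing a class label are positives) combined with an InfoNCE-style instance-level term in which a sample's only positive is its own augmented view. The function names, the temperature, and the `alpha` weighting are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.1):
    """SupCon-style contrastive loss: samples sharing a label are positives.

    Generic illustration of class-level supervision, not the HCR loss itself.
    """
    z = features / np.linalg.norm(features, axis=1, keepdims=True)  # project to unit sphere
    sim = (z @ z.T) / temperature                                   # temperature-scaled cosine logits
    n = len(labels)
    total = 0.0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue  # anchor has no positive pair in this batch
        others = np.delete(sim[i], i)               # exclude self-similarity
        log_denom = np.log(np.exp(others).sum())    # log-sum-exp over all candidate pairs
        total += -np.mean([sim[i, j] - log_denom for j in pos])
    return total / n

def hierarchical_loss(feats, views, labels, alpha=0.5, temperature=0.1):
    """Weighted sum of class-level and instance-level contrastive terms."""
    z = np.vstack([feats, views])
    # class level: any same-class sample (or augmented view) is a positive
    class_term = supcon_loss(z, np.concatenate([labels, labels]), temperature)
    # instance level: only a sample and its own augmented view are positives
    ids = np.arange(len(feats))
    inst_term = supcon_loss(z, np.concatenate([ids, ids]), temperature)
    return alpha * class_term + (1 - alpha) * inst_term
```

Reusing `supcon_loss` with per-instance ids reduces the instance-level term to standard InfoNCE, since each anchor then has exactly one positive: its own augmented view.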










Data availability and access
The datasets generated or analyzed during this study are available in [7] at https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/zero-shot-learning/zero-shot-learning-the-good-the-bad-and-the-ugly.
References
Lu Z, Yu Y, Lu Z-M, Shen F-L, Zhang Z (2020) Attentive semantic preservation network for zero-shot learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 682–683
Lu Z, Lu Z, Yu Y, Wang Z (2022) Learn more from less: generalized zero-shot learning with severely limited labeled data. Neurocomputing 477:25–35
Ou G, Yu G, Domeniconi C, Lu X, Zhang X (2020) Multi-label zero-shot learning with graph convolutional networks. Neural Netw 132:333–341
Xian Y, Lorenz T, Schiele B, Akata Z (2018) Feature generating networks for zero-shot learning. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 5542–5551
Li J, Jing M, Lu K, Ding Z, Zhu L, Huang Z (2019) Leveraging the invariant side of generative zero-shot learning. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 7402–7411
Xu B, Zeng Z, Lian C, Ding Z (2022) Generative mixup networks for zero-shot learning. IEEE Trans Neural Netw Learn Syst
Xian Y, Lampert CH, Schiele B, Akata Z (2018) Zero-shot learning-a comprehensive evaluation of the good, the bad and the ugly. IEEE Trans Pattern Anal Mach Intell 41(9):2251–2265
Min S, Yao H, Xie H, Wang C, Zha Z-J, Zhang Y (2020) Domain-aware visual bias eliminating for generalized zero-shot learning. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 12664–12673
Zhang L, Xiang T, Gong S (2017) Learning a deep embedding model for zero-shot learning. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 2021–2030
Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In: International conference on machine learning, pp 1597–1607. PMLR
Khosla P, Teterwak P, Wang C, Sarna A, Tian Y, Isola P, Maschinot A, Liu C, Krishnan D (2020) Supervised contrastive learning. Adv Neural Inf Process Syst 33:18661–18673
Chen X, Fan H, Girshick R, He K (2020) Improved baselines with momentum contrastive learning. Preprint at arXiv:2003.04297
Ye H-J, Ming L, Zhan D-C, Chao W-L (2022) Few-shot learning with a strong teacher. IEEE Trans Pattern Anal Mach Intell
Zhang J, Gao L, Luo X, Shen H, Song J (2023) Deta: Denoised task adaptation for few-shot learning. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 11541–11551
Wu J, Zhang Y, Sun S, Li Q, Zhao X (2022) Generalized zero-shot emotion recognition from body gestures. Appl Intell 1–19
Kumar Verma V, Arora G, Mishra A, Rai P (2018) Generalized zero-shot learning via synthesized examples. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 4281–4289
Gao R, Hou X, Qin J, Chen J, Liu L, Zhu F, Zhang Z, Shao L (2020) Zero-vae-gan: generating unseen features for generalized and transductive zero-shot learning. IEEE Trans Image Process 29:3665–3680
Han Z, Fu Z, Yang J (2020) Learning the redundancy-free features for generalized zero-shot object recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12865–12874
Huang H, Wang C, Yu PS, Wang C-D (2019) Generative dual adversarial network for generalized zero-shot learning. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 801–810
Li Y, Liu Z, Yao L, Wang X, McAuley J, Chang X (2022) An entropy-guided reinforced partial convolutional network for zero-shot learning. IEEE Trans Circuits Syst Video Technol 32(8):5175–5186
Ji Z, Wang Q, Cui B, Pang Y, Cao X, Li X (2021) A semi-supervised zero-shot image classification method based on soft-target. Neural Netw 143:88–96
Akata Z, Perronnin F, Harchaoui Z, Schmid C (2013) Label-embedding for attribute-based classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 819–826
Akata Z, Reed S, Walter D, Lee H, Schiele B (2015) Evaluation of output embeddings for fine-grained image classification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 2927–2936
Sung F, Yang Y, Zhang L, Xiang T, Torr PH, Hospedales TM (2018) Learning to compare: relation network for few-shot learning. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 1199–1208
Zhang L, Wang P, Liu L, Shen C, Wei W, Zhang Y, Van Den Hengel A (2020) Towards effective deep embedding for zero-shot learning. IEEE Trans Circuits Syst Video Technol 30(9):2843–2852
Zhu Y, Elhoseiny M, Liu B, Peng X, Elgammal A (2018) A generative adversarial approach for zero-shot learning from noisy texts. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 1004–1013
Schonfeld E, Ebrahimi S, Sinha S, Darrell T, Akata Z (2019) Generalized zero-and few-shot learning via aligned variational autoencoders. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 8247–8255
He K, Fan H, Wu Y, Xie S, Girshick R (2020) Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9729–9738
Li J, Wei Y, Wang C, Hu Q, Liu Y, Xu L (2022) 3-d cnn-based multichannel contrastive learning for alzheimer’s disease automatic diagnosis. IEEE Trans Instrum Meas 71:1–11
Han Z, Fu Z, Chen S, Yang J (2021) Contrastive embedding for generalized zero-shot learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2371–2381
Cheng D, Wang G, Wang N, Zhang D, Zhang Q, Gao X (2023) Discriminative and robust attribute alignment for zero-shot learning. IEEE Trans Circuits Syst Video Technol
Zhu F, Zhang W, Chen X, Gao X, Ye N (2023) Large margin distribution multi-class supervised novelty detection. Expert Syst Appl 224:119937
Hendrycks D, Gimpel K (2016) A baseline for detecting misclassified and out-of-distribution examples in neural networks. In: International conference on learning representations
Zhang J, Gao L, Hao B, Huang H, Song J, Shen H (2023) From global to local: Multi-scale out-of-distribution detection. IEEE Trans Image Process
Yang J, Zhou K, Liu Z (2023) Full-spectrum out-of-distribution detection. Int J Comput Vis 1–16
Socher R, Ganjoo M, Manning CD, Ng A (2013) Zero-shot learning through cross-modal transfer. In: Advances in neural information processing systems (NeurIPS), pp 935–943
Chao W-L, Changpinyo S, Gong B, Sha F (2016) An empirical study and analysis of generalized zero-shot learning for object recognition in the wild. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14, pp. 52–68. Springer
Atzmon Y, Chechik G (2019) Adaptive confidence smoothing for generalized zero-shot learning. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 11671–11680
Chen X, Lan X, Sun F, Zheng N (2020) A boundary based out-of-distribution classifier for generalized zero-shot learning. In: European conference on computer vision (ECCV), pp 572–588
Su H, Li J, Chen Z, Zhu L, Lu K (2022) Distinguishing unseen from seen for generalized zero-shot learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7885–7894
Mettes P, Pol E, Snoek C (2019) Hyperspherical prototype networks. Adv Neural Inf Process Syst 32
Wang T, Isola P (2020) Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In: International conference on machine learning, pp. 9929–9939. PMLR
Borodachov SV, Hardin DP, Saff EB (2019) Discrete energy on rectifiable sets
Farhadi A, Endres I, Hoiem D, Forsyth D (2009) Describing objects by their attributes. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 1778–1785
Nilsback M-E, Zisserman A (2008) Automated flower classification over a large number of classes. In: Indian conference on computer vision, graphics & image processing, pp 722–729
Wah C, Branson S, Welinder P, Perona P, Belongie S (2011) The caltech-ucsd birds-200-2011 dataset
Felix R, Kumar VB, Reid I, Carneiro G (2018) Multi-modal cycle-consistent generalized zero-shot learning. In: European conference on computer vision (ECCV), pp 21–37
Li Q, Hou M, Lai H, Yang M (2022) Cross-modal distribution alignment embedding network for generalized zero-shot learning. Neural Netw 148:176–182
Annadani Y, Biswas S (2018) Preserving semantic relations for zero-shot learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7603–7612
Zhang R, Zhu Q, Xu X, Zhang D, Huang S-J (2021) Visual-guided attentive attributes embedding for zero-shot learning. Neural Netw 143:709–718
Changpinyo S, Chao W-L, Gong B, Sha F (2020) Classifier and exemplar synthesis for zero-shot learning. Int J Comput Vis 128:166–201
Gao R, Hou X, Qin J, Shen Y, Long Y, Liu L, Zhang Z, Shao L (2022) Visual-semantic aligned bidirectional network for zero-shot learning. IEEE Trans Multimedia
Li Y, Liu Z, Yao L, Chang X (2021) Attribute-modulated generative meta learning for zero-shot learning. IEEE Trans Multimedia 25:1600–1610
Chen Z, Huang Y, Chen J, Geng Y, Zhang W, Fang Y, Pan JZ, Chen H (2023) Duet: Cross-modal semantic grounding for contrastive zero-shot learning. In: Proceedings of the AAAI conference on artificial intelligence, vol 37, pp 405–413
Cheng D, Wang G, Wang B, Zhang Q, Han J, Zhang D (2023) Hybrid routing transformer for zero-shot learning. Pattern Recognit 137:109270
Han Z, Fu Z, Li G, Yang J (2021) Inference guided feature generation for generalized zero-shot learning. Neurocomputing 430:150–158
Chen L, Zhang H, Xiao J, Liu W, Chang S-F (2018) Zero-shot visual recognition using semantics-preserving adversarial embedding networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1043–1052
Chen S, Xie G, Liu Y, Peng Q, Sun B, Li H, You X, Shao L (2021) Hsva: Hierarchical semantic-visual adaptation for zero-shot learning. Adv Neural Inf Process Syst 34:16622–16634
Xian Y, Sharma S, Schiele B, Akata Z (2019) f-vaegan-d2: A feature generating framework for any-shot learning. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 10275–10284
Ding B, Fan Y, He Y, Zhao J (2023) Enhanced vaegan: a zero-shot image classification method. Appl Intell 53(8):9235–9246
Yun Y, Wang S, Hou M, Gao Q (2022) Attributes learning network for generalized zero-shot learning. Neural Netw 150:112–118
Li K, Min MR, Fu Y (2019) Rethinking zero-shot learning: A conditional visual classification perspective. In: IEEE Conference on computer vision and pattern recognition (CVPR), pp 3583–3592
Shen J, Xiao Z, Zhen X, Zhang L (2021) Spherical zero-shot learning. IEEE Trans Circuits Syst Video Technol 32(2):634–645
Huynh D, Elhamifar E (2020) Fine-grained generalized zero-shot learning via dense attribute-based attention. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4483–4493
Li X, Xu Z, Wei K, Deng C (2021) Generalized zero-shot learning via disentangled representation. In: the Association for the advancement of artificial intelligence (AAAI), vol 35, pp 1966–1974
Chen S, Hong Z, Liu Y, Xie G-S, Sun B, Li H, Peng Q, Lu K, You X (2022) Transzero: attribute-guided transformer for zero-shot learning. In: Proceedings of the AAAI conference on artificial intelligence, vol 36, pp 330–338
Chen S, Hong Z, Xie G-S, Yang W, Peng Q, Wang K, Zhao J, You X (2022) Msdn: Mutually semantic distillation network for zero-shot learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 7612–7621
Li Z, Chen Q, Liu Q (2021) Augmented semantic feature based generative network for generalized zero-shot learning. Neural Netw 143:1–11
Chen S, Wang W, Xia B, Peng Q, You X, Zheng F, Shao L (2021) Free: Feature refinement for generalized zero-shot learning. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 122–131
Yue Z, Wang T, Sun Q, Hua X-S, Zhang H (2021) Counterfactual zero-shot and open-set visual recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 15404–15414
Romera-Paredes B, Torr P (2015) An embarrassingly simple approach to zero-shot learning. In: International conference on machine learning (ICML), pp 2152–2161
Kwon G, Al Regib G (2022) A gating model for bias calibration in generalized zero-shot learning. IEEE Trans Image Process
Funding
This work was supported in part by the National Key Research and Development Program of China under Grant No. 2020AAA0140004 and in part by the China Postdoctoral Science Foundation under Grant No. 2022M712792. This work was also partially supported by the Ningbo Science and Technology Innovation 2025 Major Project under Grants 2020Z106 and 2023Z040.
Author information
Contributions
Conceptualization and methodology were performed by Ziqian Lu. Software and programming were handled by Zewei He. Validation was performed by Xuecheng Sun. Formal analysis and writing were performed by Hao Luo. Supervision was provided by Yangming Zheng. Funding was acquired by Zheming Lu and Zewei He.
Ethics declarations
Ethical and informed consent for data used
Written informed consent for publication of this paper was obtained from Zhejiang University and all authors. This study did not involve human or animal subjects; thus, no ethical approval was required. The study protocol adhered to the guidelines established by the journal.
Competing Interests
All authors are with the School of Aeronautics and Astronautics, Zhejiang University. The authors declare no potential conflicts of interest with respect to the research, authorship, and publication of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lu, Z., Lu, Z., He, Z. et al. Hierarchical contrastive representation for zero shot learning. Appl Intell 54, 9213–9229 (2024). https://doi.org/10.1007/s10489-024-05531-w