Abstract
Generalized zero-shot learning (GZSL) classifies both seen and unseen class samples, which makes it valuable in practical applications such as emerging-species recognition and medical image recognition. However, most existing GZSL methods extract image features directly with a pre-trained deep model. Because the data distribution of the GZSL dataset differs from that of the pre-training dataset, the extracted features perform poorly: the feature distributions of different classes are similar, making the classes hard to distinguish. To address this problem, we propose a dual-path feature enhancement (DPFE) model consisting of four modules: a feature generation network (FGN), a local fine-grained feature enhancement (LFFE) module, a global coarse-grained feature enhancement (GCFE) module, and a feedback module (FM). The feature generation network synthesizes image features for unseen classes. We enhance the discriminative power and semantic relevance of the image features from both local and global perspectives. To focus on an image's locally discriminative regions, the LFFE module processes the image in blocks and minimizes a semantic cycle-consistency loss so that the region-block features retain key classification semantics. To prevent the information loss caused by image blocking, we design the GCFE module, which enforces consistency between the global image features and the semantic centers and thereby improves the features' discriminative power. In addition, the feedback module feeds the middle-layer information of the discriminator network back to the generator network, so the synthesized image features are more similar to the real features. Experimental results demonstrate that the proposed DPFE method outperforms state-of-the-art methods on four zero-shot learning benchmark datasets.
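The generative pipeline described above, a generator conditioned on class attributes plus noise, and a feedback path that routes the discriminator's middle-layer activation back into the synthesized feature, can be sketched as follows. This is a minimal NumPy illustration of the general idea, not the authors' implementation; all dimensions, weight matrices, and function names are hypothetical stand-ins for trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 85-d class attributes (as in AWA-style datasets),
# 64-d noise, 2048-d image features (ResNet-101-like), 256-d hidden layer.
ATTR_DIM, NOISE_DIM, FEAT_DIM, HID_DIM = 85, 64, 2048, 256

# Random matrices standing in for trained network weights.
W_gen = rng.standard_normal((ATTR_DIM + NOISE_DIM, FEAT_DIM)) * 0.01
W_disc_hidden = rng.standard_normal((FEAT_DIM, HID_DIM)) * 0.01
W_feedback = rng.standard_normal((HID_DIM, FEAT_DIM)) * 0.01

def generate(attr, noise):
    """Generator: synthesize an image feature from class attributes + noise."""
    return np.tanh(np.concatenate([attr, noise]) @ W_gen)

def refine_with_feedback(attr, noise, steps=2):
    """Feedback-module sketch: the discriminator's hidden activation is
    projected back and added to the synthesized feature, mimicking the idea
    of feeding middle-layer information back to the generator."""
    feat = generate(attr, noise)
    for _ in range(steps):
        hidden = np.maximum(0.0, feat @ W_disc_hidden)  # discriminator mid-layer
        feat = feat + hidden @ W_feedback               # additive feedback
    return feat

attr = rng.standard_normal(ATTR_DIM)   # class-attribute vector of an unseen class
noise = rng.standard_normal(NOISE_DIM)
refined = refine_with_feedback(attr, noise)
print(refined.shape)  # (2048,)
```

Features synthesized this way for unseen classes can then be pooled with real seen-class features to train an ordinary classifier, which is the standard generative-GZSL recipe the paper builds on.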
Data availability
The data that support the findings of this study are available from the corresponding author, Zhen Wang, upon reasonable request.
Acknowledgements
This research was funded by the National Natural Science Foundation of China (grant number 61841602), the Natural Science Foundation of Shandong Province of China (grant numbers ZR2021MF017, ZR2020MF147, and ZR2018PF005), the Youth Innovation Science and Technology Team Foundation of Shandong Higher School (grant number 2021KJ031), and the Fundamental Research Funds for the Central Universities, JLU (grant number 93K172021K12).
Author information
Contributions
Conceptualization, Z.W. and X.C.; methodology, Z.W. and X.C.; software, W.L.; validation, L.G., B.Y. and W.L.; formal analysis, Z.W. and X.C.; investigation, Z.W. and W.L.; resources, L.G.; data curation, B.Y.; writing—original draft preparation, X.C.; writing—review and editing, Z.W.; visualization, L.G. and X.C.; supervision, Z.W.; project administration, Z.W.; funding acquisition, Z.W. All authors have read and agreed to the published version of the manuscript.
Corresponding author
Correspondence to Zhen Wang.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest relevant to the content of this article, as defined by Springer, nor any other interests that might be perceived to influence the results and/or discussion reported in this paper.
Additional information
Communicated by Bing-kun Bao.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chang, X., Wang, Z., Liu, W. et al. Generating generalized zero-shot learning based on dual-path feature enhancement. Multimedia Systems 30, 273 (2024). https://doi.org/10.1007/s00530-024-01485-8