
Generating generalized zero-shot learning based on dual-path feature enhancement

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

Generalized zero-shot learning (GZSL) classifies both seen and unseen class samples, which plays a significant role in practical applications such as emerging species recognition and medical image recognition. However, most existing GZSL methods extract image features directly with a pre-trained deep model. Because the data distribution of the GZSL dataset differs from that of the pre-training dataset, the extracted features perform poorly: features from different classes follow similar distributions and are therefore hard to distinguish. To address this problem, we propose a dual-path feature enhancement (DPFE) model consisting of four modules: a feature generation network (FGN), a local fine-grained feature enhancement (LFFE) module, a global coarse-grained feature enhancement (GCFE) module, and a feedback module (FM). The feature generation network synthesizes image features for unseen classes. We enhance the discriminative power and semantic relevance of the image features from both local and global perspectives. To focus on an image's locally discriminative regions, the LFFE module processes the image in blocks and minimizes a semantic cycle-consistency loss so that the region-block features retain the semantic information that is key to classification. To prevent the information loss caused by image blocking, we design the GCFE module, which enforces consistency between the global image features and the semantic centers, thereby improving the features' discriminative power. In addition, the feedback module feeds information from the discriminator network's middle layers back to the generator network, making the synthesized image features more similar to the real ones. Experimental results demonstrate that the proposed DPFE method outperforms state-of-the-art methods on four zero-shot learning benchmark datasets.
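The semantic cycle-consistency idea behind the LFFE module can be illustrated with a minimal sketch: region-block features are decoded back into the attribute (semantic) space and penalized for deviating from the class semantic vector. All shapes, the linear decoder `W_dec`, and the mean-squared-error form of the loss below are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 4 region blocks, 512-d visual features, 85-d class semantics.
num_blocks, feat_dim, sem_dim = 4, 512, 85
block_feats = rng.normal(size=(num_blocks, feat_dim))   # region-block features (LFFE input)
W_dec = 0.01 * rng.normal(size=(feat_dim, sem_dim))     # assumed linear feature -> semantic decoder
class_semantic = rng.normal(size=(sem_dim,))            # ground-truth attribute vector of the class

def semantic_cycle_loss(blocks, W, semantics):
    """Mean squared error between each block's decoded semantics and the class semantics.

    Driving this loss down encourages every region block to carry the
    class-level semantic information needed for classification.
    """
    decoded = blocks @ W                                # (num_blocks, sem_dim)
    return float(np.mean((decoded - semantics) ** 2))

loss = semantic_cycle_loss(block_feats, W_dec, class_semantic)
```

Minimizing this quantity over the decoder and the block-feature extractor is one plausible reading of "ensuring the region-block features contain key classification semantic information"; the paper may use a different decoder architecture or distance.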



Data availability

The data that support the findings of this study are available from the corresponding author, Zhen Wang, upon reasonable request.


Acknowledgements

This research was funded by the National Natural Science Foundation of China (grant number 61841602), the Natural Science Foundation of Shandong Province of China (grant numbers ZR2021MF017, ZR2020MF147, and ZR2018PF005), the Youth Innovation Science and Technology Team Foundation of Shandong Higher School (grant number 2021KJ031), and the Fundamental Research Funds for the Central Universities, JLU (grant number 93K172021K12).

Author information

Authors and Affiliations

Authors

Contributions

Conceptualization, Z.W. and X.C.; methodology, Z.W. and X.C.; software, W.L.; validation, L.G., B.Y. and W.L.; formal analysis, Z.W. and X.C.; investigation, Z.W. and W.L.; resources, L.G.; data curation, B.Y.; writing—original draft preparation, X.C.; writing—review and editing, Z.W.; visualization, L.G. and X.C.; supervision, Z.W.; project administration, Z.W.; funding acquisition, Z.W. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Zhen Wang.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article, nor any conflicts of interest as defined by Springer or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Additional information

Communicated by Bing-kun Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chang, X., Wang, Z., Liu, W. et al. Generating generalized zero-shot learning based on dual-path feature enhancement. Multimedia Systems 30, 273 (2024). https://doi.org/10.1007/s00530-024-01485-8

