Cross-Modal N-Pair Network for Generalized Zero-Shot Learning

  • Conference paper
  • Conference: Neural Computing for Advanced Applications (NCAA 2020)
  • Part of the book series: Communications in Computer and Information Science (CCIS, volume 1265)

Abstract

Generalized Zero-Shot Learning (GZSL), which aims at transferring information from seen categories to unseen ones and recognizing both at test time, has attracted considerable attention in recent years. As a cross-modal task, the key to aligning visual and semantic representations is measuring distances accurately in the projection space. Although many methods tackle GZSL with metric learning, few of them sufficiently leverage intra- and inter-category information, and they often suffer from the domain shift problem. In this paper, we introduce a novel Cross-Modal N-Pair Network (CMNPN) to alleviate these issues. CMNPN first maps visual features and semantic prototypes into a common space with an embedding network, and then employs two N-pair networks, VMNPN and SMNPN, to optimize an N-pair loss in the visual and semantic spaces respectively, using a cross-modal N-pair mining strategy that draws information from all classes. Specifically, we select the hard visual representation for each semantic prototype and combine the two features with \(N-1\) negative samples to form an N-pair. In VMNPN, the negative samples are the hard visual representations of the other \(N-1\) categories, while in SMNPN they are the other \(N-1\) semantic prototypes. Extensive experiments on three benchmark datasets demonstrate that the proposed CMNPN achieves state-of-the-art results on GZSL tasks.
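
The multi-class N-pair loss [17] at the heart of CMNPN has a compact form: for an anchor embedding \(f_i\) with matching positive \(f_i^+\), and the positives of the other classes serving as negatives, \(\mathcal{L} = \frac{1}{N}\sum_{i=1}^{N}\log\big(1 + \sum_{j\neq i}\exp(f_i^\top f_j^+ - f_i^\top f_i^+)\big)\), which is equivalent to softmax cross-entropy over pairwise similarity scores. The snippet below is a minimal PyTorch sketch of a cross-modal N-pair loss in this spirit; it is an illustration under assumptions, not the authors' implementation, and all tensor names and the batch construction are hypothetical.

import torch
import torch.nn.functional as F

def n_pair_loss(anchors, positives):
    # Similarity between every anchor and every candidate from the other
    # modality; diagonal entries are the true pairs, off-diagonal entries
    # serve as the N-1 negatives for each anchor.
    logits = anchors @ positives.t()  # shape (N, N)
    targets = torch.arange(anchors.size(0), device=anchors.device)
    # The N-pair objective reduces to softmax cross-entropy with
    # identity targets over these similarity scores.
    return F.cross_entropy(logits, targets)

# Hypothetical inputs: v holds mined hard visual embeddings for N classes,
# s the corresponding semantic prototypes, both assumed to be already
# projected into the shared space by the embedding network.
N, d = 8, 512
v = F.normalize(torch.randn(N, d), dim=1)
s = F.normalize(torch.randn(N, d), dim=1)

# SMNPN-style direction (visual anchors, other semantic prototypes as
# negatives) plus VMNPN-style direction (semantic anchors, other hard
# visual embeddings as negatives), combined symmetrically.
loss = n_pair_loss(v, s) + n_pair_loss(s, v)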

Supported by the National Natural Science Foundation of China under Grant 61771329.

References

  1. Akata, Z., Perronnin, F., Harchaoui, Z., Schmid, C.: Label-embedding for image classification. IEEE Trans. Pattern Anal. Mach. Intell. 38(7), 1425–1438 (2015)

  2. Bucher, M., Herbin, S., Jurie, F.: Improving semantic embedding consistency by metric learning for zero-shot classification. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 730–746. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1_44

  3. Changpinyo, S., Chao, W.L., Gong, B., Sha, F.: Synthesized classifiers for zero-shot learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5327–5336 (2016)

  4. Farhadi, A., Endres, I., Hoiem, D., Forsyth, D.: Describing objects by their attributes. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1778–1785. IEEE (2009)

  5. Frome, A., et al.: DeViSE: a deep visual-semantic embedding model. In: Advances in Neural Information Processing Systems, pp. 2121–2129 (2013)

  6. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  7. Ji, Z., et al.: Deep ranking for image zero-shot multi-label classification. IEEE Trans. Image Process. 29, 6549–6560 (2020)

  8. Ji, Z., Sun, Y., Yu, Y., Pang, Y., Han, J.: Attribute-guided network for cross-modal zero-shot hashing. IEEE Trans. Neural Netw. Learn. Syst. 31(1), 321–330 (2020)

  9. Kodirov, E., Xiang, T., Gong, S.: Semantic autoencoder for zero-shot learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3174–3183 (2017)

  10. Lampert, C.H., Nickisch, H., Harmeling, S.: Learning to detect unseen object classes by between-class attribute transfer. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 951–958. IEEE (2009)

  11. Lampert, C.H., Nickisch, H., Harmeling, S.: Attribute-based classification for zero-shot visual object categorization. IEEE Trans. Pattern Anal. Mach. Intell. 36(3), 453–465 (2013)

  12. Li, J., Jing, M., Lu, K., Ding, Z., Zhu, L., Huang, Z.: Leveraging the invariant side of generative zero-shot learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7402–7411 (2019)

  13. Norouzi, M., et al.: Zero-shot learning by convex combination of semantic embeddings. arXiv preprint arXiv:1312.5650 (2013)

  14. Romera-Paredes, B., Torr, P.: An embarrassingly simple approach to zero-shot learning. In: International Conference on Machine Learning, pp. 2152–2161 (2015)

  15. Shigeto, Y., Suzuki, I., Hara, K., Shimbo, M., Matsumoto, Y.: Ridge regression, hubness, and zero-shot learning. In: Appice, A., Rodrigues, P.P., Santos Costa, V., Soares, C., Gama, J., Jorge, A. (eds.) ECML PKDD 2015. LNCS (LNAI), vol. 9284, pp. 135–151. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23528-8_9

  16. Socher, R., Ganjoo, M., Manning, C.D., Ng, A.: Zero-shot learning through cross-modal transfer. In: Advances in Neural Information Processing Systems, pp. 935–943 (2013)

  17. Sohn, K.: Improved deep metric learning with multi-class n-pair loss objective. In: Advances in Neural Information Processing Systems, pp. 1857–1865 (2016)

  18. Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 dataset (2011)

  19. Xian, Y., Akata, Z., Sharma, G., Nguyen, Q., Hein, M., Schiele, B.: Latent embeddings for zero-shot classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 69–77 (2016)

  20. Xian, Y., Lampert, C.H., Schiele, B., Akata, Z.: Zero-shot learning–a comprehensive evaluation of the good, the bad and the ugly. IEEE Trans. Pattern Anal. Mach. Intell. 41(9), 2251–2265 (2018)

  21. Xian, Y., Lorenz, T., Schiele, B., Akata, Z.: Feature generating networks for zero-shot learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5542–5551 (2018)

  22. Xian, Y., Schiele, B., Akata, Z.: Zero-shot learning-the good, the bad and the ugly. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4582–4591 (2017)

  23. Xian, Y., Sharma, S., Schiele, B., Akata, Z.: f-VAEGAN-D2: a feature generating framework for any-shot learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10275–10284 (2019)

  24. Yang, G., Liu, J., Xu, J., Li, X.: Dissimilarity representation learning for generalized zero-shot recognition. In: Proceedings of the 26th ACM International Conference on Multimedia, pp. 2032–2039 (2018)

  25. Yu, Y., Ji, Z., Fu, Y., Guo, J., Pang, Y., Zhang, Z.M., et al.: Stacked semantics-guided attention model for fine-grained zero-shot learning. In: Advances in Neural Information Processing Systems, pp. 5995–6004 (2018)

  26. Zhang, H., Long, Y., Yang, W., Shao, L.: Dual-verification network for zero-shot learning. Inf. Sci. 470, 43–57 (2019)

  27. Zhang, L., Xiang, T., Gong, S.: Learning a deep embedding model for zero-shot learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2021–2030 (2017)

  28. Zhang, Z., Saligrama, V.: Zero-shot learning via semantic similarity embedding. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4166–4174 (2015)

  29. Zhu, Y., Elhoseiny, M., Liu, B., Peng, X., Elgammal, A.: A generative adversarial approach for zero-shot learning from noisy texts. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1004–1013 (2018)


Author information

Correspondence to Zhong Ji.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Cui, B., Ji, Z., Wang, H. (2020). Cross-Modal N-Pair Network for Generalized Zero-Shot Learning. In: Zhang, H., Zhang, Z., Wu, Z., Hao, T. (eds) Neural Computing for Advanced Applications. NCAA 2020. Communications in Computer and Information Science, vol 1265. Springer, Singapore. https://doi.org/10.1007/978-981-15-7670-6_21

  • DOI: https://doi.org/10.1007/978-981-15-7670-6_21

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-7669-0

  • Online ISBN: 978-981-15-7670-6

  • eBook Packages: Computer Science, Computer Science (R0)
