Cross-Modal N-Pair Network for Generalized Zero-Shot Learning

Cui, Biying; Ji, Zhong; Wang, Hai

doi:10.1007/978-981-15-7670-6_21

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1265)

Included in the following conference series:

International Conference on Neural Computing for Advanced Applications

Abstract

Generalized Zero-Shot Learning (GZSL), which aims at transferring information from seen categories to unseen categories and recognizing both during the test stage, has attracted a lot of attention in recent years. Because GZSL is a cross-modal task, the key to aligning visual and semantic representations is to measure distances in the projection space accurately. Although many methods achieve GZSL by utilizing metric learning, few of them sufficiently leverage intra- and inter-category information, and they suffer from the domain shift problem. In this paper, we introduce a novel Cross Modal N-Pairs Network (CMNPN) to alleviate this issue. CMNPN first maps visual features and semantic prototypes into a common space with an embedding network, and then employs two N-pairs networks, VMNPN and SMNPN, to optimize an N-pair loss in both the visual and semantic spaces, where we utilize a cross-modal N-pair mining strategy to mine information from all classes. Specifically, we select the hard visual representation for its corresponding semantic prototype and combine the two features with \(N-1\) negative samples to form an N-pair. In VMNPN, the negative samples are the hard visual representations of the other \(N-1\) categories, while in SMNPN, they are the other \(N-1\) semantic prototypes. Extensive experiments on three benchmark datasets demonstrate that the proposed CMNPN achieves state-of-the-art results on GZSL tasks.
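
The N-pair construction described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not define "hard", say which element of the pair serves as the anchor, or specify the similarity measure. The sketch assumes a dot-product similarity, treats the hard visual representation as the same-class embedding least similar to its prototype, and uses the multi-class N-pair loss log(1 + sum_j exp(a·n_j - a·p)) for anchor a, positive p, and negatives n_j. All function names and the toy data are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def n_pair_loss(anchor, positive, negatives):
    """Multi-class N-pair loss: log(1 + sum_j exp(a.n_j - a.p))."""
    pos = dot(anchor, positive)
    return math.log1p(sum(math.exp(dot(anchor, n) - pos) for n in negatives))


def hardest_visual(prototype, visuals):
    """ASSUMPTION: 'hard' = same-class embedding least similar to its prototype."""
    return min(visuals, key=lambda v: dot(prototype, v))


def cross_modal_n_pair_losses(prototypes, visuals_by_class):
    """Return (visual-space loss, semantic-space loss), averaged over classes.

    ASSUMPTION: in the visual branch the prototype is the anchor and the
    negatives are the other classes' hard visuals; in the semantic branch the
    hard visual is the anchor and the negatives are the other prototypes.
    """
    classes = sorted(prototypes)
    hard = {c: hardest_visual(prototypes[c], visuals_by_class[c]) for c in classes}
    vis_loss = 0.0
    sem_loss = 0.0
    for c in classes:
        others = [k for k in classes if k != c]
        vis_loss += n_pair_loss(prototypes[c], hard[c], [hard[k] for k in others])
        sem_loss += n_pair_loss(hard[c], prototypes[c], [prototypes[k] for k in others])
    return vis_loss / len(classes), sem_loss / len(classes)


# Toy embeddings already mapped into a common 2-D space (made-up values).
toy_prototypes = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
toy_visuals = {
    "cat": [[0.9, 0.1], [0.6, 0.4]],
    "dog": [[0.2, 0.8], [0.1, 0.7]],
}
vis, sem = cross_modal_n_pair_losses(toy_prototypes, toy_visuals)
print(f"visual-space loss: {vis:.4f}, semantic-space loss: {sem:.4f}")
```

In this sketch each class contributes one N-pair per branch, so every other class appears as a negative in each term, which is one way to read "mine information from all classes".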

Supported by the National Natural Science Foundation of China under Grant 61771329.

Author information

Authors and Affiliations

Tianjin University, Tianjin, 300072, China
Biying Cui, Zhong Ji & Hai Wang

Corresponding author

Correspondence to Zhong Ji.

Editor information

Editors and Affiliations

Harbin Institute of Technology, Shenzhen, China
Haijun Zhang
Hefei University of Technology, Hefei, China
Zhao Zhang
Chongqing University, Chongqing, China
Zhou Wu
South China Normal University, Guangzhou, China
Tianyong Hao

About this paper

Cite this paper

Cui, B., Ji, Z., Wang, H. (2020). Cross-Modal N-Pair Network for Generalized Zero-Shot Learning. In: Zhang, H., Zhang, Z., Wu, Z., Hao, T. (eds) Neural Computing for Advanced Applications. NCAA 2020. Communications in Computer and Information Science, vol 1265. Springer, Singapore. https://doi.org/10.1007/978-981-15-7670-6_21

DOI: https://doi.org/10.1007/978-981-15-7670-6_21
Published: 13 August 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-7669-0
Online ISBN: 978-981-15-7670-6
eBook Packages: Computer Science, Computer Science (R0)
