Abstract
Zero-shot learning (ZSL) has attracted considerable attention for its ability to recognize unseen categories after training only on samples from seen categories. Existing work has focused on learning a projection between the semantic space and the visual feature space, which has led to substantial progress in ZSL. However, simply establishing a projection often suffers from the visual-semantic ambiguity problem and the hubness problem. Specifically, visual patterns and semantic concepts often cannot be properly matched to each other, which leads to inaccurate recognition results. To this end, we propose a novel ZSL model, Asymmetric Graph-based Zero Shot Learning (AGZSL), which simultaneously preserves the class-level semantic manifold and the instance-level visual manifold in a latent space. In addition, to make the model more discriminative, we constrain the latent space to be orthogonal, meaning that the projected visual features and semantic embeddings in the latent space are orthogonal when they belong to different categories. We evaluate our approach on four benchmark datasets under both the standard zero-shot setting and the more realistic generalized zero-shot learning (GZSL) setting, and the results show that AGZSL significantly improves performance compared with state-of-the-art methods.
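The orthogonality constraint described above can be illustrated with a small sketch. The abstract does not state the actual objective, so the penalty below (the sum of squared inner products between each projected visual feature and the projected semantic embedding of every other class) is only one plausible reading, not the authors' formulation. The function names `orthogonality_penalty` and `predict` and the toy vectors are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonality_penalty(visual, labels, semantic):
    """Assumed penalty: squared inner products between each projected
    visual feature and the projected embeddings of all OTHER classes.
    It is zero exactly when every cross-class pair is orthogonal."""
    penalty = 0.0
    for z, y in zip(visual, labels):
        for c, a in enumerate(semantic):
            if c != y:
                penalty += dot(z, a) ** 2
    return penalty


def predict(z, semantic):
    """Assign the class whose latent embedding has the largest inner product."""
    scores = [dot(z, a) for a in semantic]
    return scores.index(max(scores))


# Toy latent space with two classes (illustrative values only).
semantic = [[1.0, 0.0], [0.0, 1.0]]      # projected class embeddings
labels = [0, 1]                          # class of each visual sample
visual_ok = [[0.9, 0.0], [0.0, 1.1]]     # orthogonal to the other class
visual_bad = [[0.9, 0.5], [0.4, 1.1]]    # leaks into the other class

penalty_ok = orthogonality_penalty(visual_ok, labels, semantic)
penalty_bad = orthogonality_penalty(visual_bad, labels, semantic)
```

Under this reading, features that leak into another class's direction raise the penalty, while perfectly separated features incur none, which is the sense in which the constraint could make the latent space more discriminative.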
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 61872187).
Cite this article
Wang, Y., Zhang, H., Zhang, Z. et al. Asymmetric graph based zero shot learning. Multimed Tools Appl 79, 33689–33710 (2020). https://doi.org/10.1007/s11042-019-7689-y
DOI: https://doi.org/10.1007/s11042-019-7689-y