
Class Representative Learning for Zero-shot Learning Using Purely Visual Data

  • Original Research
  • Published in SN Computer Science

Abstract

Building robust classifiers with high precision is an important goal. In practice, this goal is difficult to achieve when the available data are noisy, sparse, or drawn from heterogeneous sources. Thus, a considerable gap exists between a model built on training (seen) data and its behavior on testing (unseen) data in real applications. Recent works, including zero-shot learning (ZSL) and generalized zero-shot learning (G-ZSL), have attempted to bridge this gap through transfer learning. However, most of these works must build a model from visual input together with associated data such as semantics, attributes, and textual information. Furthermore, the models are built with all of the training data, so they apply to generic contexts but not to the specific settings that real-world applications will eventually require. In this paper, we propose a novel model named class representative learning (CRL), a class-based classifier with the following unique contributions to machine learning: (1) a uniquely designed latent feature vector, the class representative, which represents a class in an abstract embedding space built from features extracted by a deep neural network learned only from input images; (2) parallel ZSL algorithms built on class representative learning; and (3) a novel projection-based inferencing method that uses the vector space model to reconcile the dominant differences between seen and unseen classes. This study demonstrates the benefit of the class-based approach with CRs for ZSL and G-ZSL on eight benchmark datasets. Extensive experimental results suggest that the proposed CRL model significantly outperforms state-of-the-art methods in ZSL/G-ZSL-based image classification.
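The abstract suggests a simple mental model: represent each class by a single vector in a deep feature space and classify a query by projecting its features onto those vectors. The Python sketch below only illustrates that idea under stated assumptions (a class representative is taken as the per-class mean of deep features, and projection-based inference is taken as cosine similarity); it is not the authors' implementation, and the feature extractor and function names are hypothetical.

```python
import numpy as np

# Minimal illustrative sketch (not the authors' code): class representatives
# are assumed to be per-class means of deep features, and projection-based
# inference is assumed to be cosine similarity in that feature space.

def build_class_representatives(features, labels):
    """features: (N, D) array of deep features; labels: (N,) array of class ids.
    Returns one representative vector (the class mean) per class."""
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify_by_projection(query_features, representatives):
    """Assign each query to the class whose representative has the highest
    cosine similarity with the query's feature vector."""
    classes = list(representatives)
    reps = np.stack([representatives[c] for c in classes])   # (C, D)
    reps = reps / np.linalg.norm(reps, axis=1, keepdims=True)
    q = query_features / np.linalg.norm(query_features, axis=1, keepdims=True)
    sims = q @ reps.T                                         # (N, C)
    return [classes[i] for i in sims.argmax(axis=1)]

# Example with random stand-ins for deep features; a real pipeline would take
# them from the penultimate layer of a pretrained CNN (e.g., Inception-v3).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(100, 512))
    labels = rng.integers(0, 5, size=100)
    crs = build_class_representatives(feats, labels)
    print(classify_by_projection(feats[:3], crs))
```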




Acknowledgements

This work was partially supported by NSF CNS #1747751. The authors would like to thank Dr. Ye Wang, Associate Professor, UMKC, for constructive criticism of the manuscript.

Author information


Corresponding author

Correspondence to Mayanka Chandrashekar.

Ethics declarations

Conflict of Interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Chandrashekar, M., Lee, Y. Class Representative Learning for Zero-shot Learning Using Purely Visual Data. SN COMPUT. SCI. 2, 313 (2021). https://doi.org/10.1007/s42979-021-00648-y


