Abstract
A notable characteristic of human cognition is the ability to form reliable hypotheses under extreme uncertainty. Even when lacking the knowledge needed to make a correct inference, humans can draw on related knowledge to produce an approximate inference that is semantically close to the correct one. In the context of object recognition, this ability amounts to hypothesizing the identity of an object in an image without ever having seen a visual training example of that object. The paradigm of low-shot (i.e., zero-shot and few-shot) classification has traditionally been used to address such situations. However, traditional zero-shot and few-shot approaches train classifiers in settings where a majority of classes have been seen or visually observed and only a minority are unseen; the classifiers for the unseen classes are then learned by expressing them in terms of the classifiers for the seen classes. In this paper, we address the related but different problem of object recognition when only a few object classes are visually observed and the majority are previously unseen. Specifically, we pose the following questions: (a) Is it possible to hypothesize the identity of an object in an image without previously having seen any visual training examples of that object? and (b) Can the visual training examples of a few seen object classes provide reliable priors for hypothesizing the identities of objects that belong to the majority unseen object classes? We propose a model for recognizing objects in an image when visual classifiers are available for only a limited number of object classes. To this end, we leverage word embeddings trained on publicly available text corpora as natural-language priors for hypothesizing the identities of objects that belong to the unseen classes. Experimental results on the Microsoft Common Objects in Context (MS-COCO) dataset show that reliable hypotheses about object identities can be formed by exploiting word embeddings trained on the Wikipedia text corpus, even in the absence of explicit visual classifiers for those object classes. To bolster our hypothesis, we conduct additional experiments on a larger dataset of concepts (themes) that we created from the Conceptual Captions dataset. Even on this extremely challenging dataset, our results, though modest, provide an important proof of concept for the proposed model.
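The abstract describes the core mechanism only at a high level, so the following minimal sketch illustrates one plausible reading of it: the labels of seen-class objects detected in an image act as context, and candidate labels for an unrecognized object are ranked by how similar their word embeddings are to that context. The toy vectors, the mean-cosine scoring rule, and the hypothesize function are illustrative assumptions, not the authors' implementation; in the paper, pretrained embeddings such as word2vec or fastText vectors trained on Wikipedia would play this role.

```python
# A minimal sketch (not the authors' implementation) of the core idea:
# rank candidate labels for an unrecognized object by the similarity of
# their word embeddings to the labels of seen-class objects detected in
# the same image. Real word vectors (e.g., word2vec or fastText trained
# on Wikipedia) would replace the toy 4-d vectors below.
import numpy as np

# Toy embeddings standing in for pretrained 300-d word vectors.
EMBED = {
    "fork":  np.array([0.90, 0.10, 0.00, 0.10]),
    "plate": np.array([0.80, 0.20, 0.10, 0.00]),
    "spoon": np.array([0.85, 0.15, 0.05, 0.10]),
    "car":   np.array([0.00, 0.90, 0.10, 0.20]),
    "zebra": np.array([0.10, 0.10, 0.90, 0.30]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def hypothesize(seen_labels, candidate_labels, top_k=3):
    """Score each unseen candidate by its mean cosine similarity to the
    embeddings of the seen-class labels detected in the image."""
    context = [EMBED[w] for w in seen_labels]
    scores = {
        c: float(np.mean([cosine(EMBED[c], ctx) for ctx in context]))
        for c in candidate_labels
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

# Visual classifiers recognized "fork" and "plate"; the third object is
# unknown. The language prior ranks the candidates for its identity.
print(hypothesize(["fork", "plate"], ["spoon", "car", "zebra"]))
```

Running the sketch ranks "spoon" well above "car" and "zebra", mirroring the intuition that a table-setting context makes cutlery the most plausible hypothesis for the unseen object.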
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Sharma, K., Dandu, H., Kumar, A.C.S., Kumar, V., Bhandarkar, S.M. (2021). Exploiting Word Embeddings for Recognition of Previously Unseen Objects. In: Del Bimbo, A., et al. (eds.) Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol. 12666. Springer, Cham. https://doi.org/10.1007/978-3-030-68780-9_27
DOI: https://doi.org/10.1007/978-3-030-68780-9_27
Print ISBN: 978-3-030-68779-3
Online ISBN: 978-3-030-68780-9