
Exploiting Word Embeddings for Recognition of Previously Unseen Objects

  • Conference paper
In: Pattern Recognition. ICPR International Workshops and Challenges (ICPR 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12666)


Abstract

A notable characteristic of human cognition is its ability to derive reliable hypotheses in situations characterized by extreme uncertainty. Even in the absence of the knowledge needed to make a correct inference, humans can draw upon related knowledge to make an approximate inference that is semantically close to the correct one. In the context of object recognition, this ability amounts to hypothesizing the identity of an object in an image without ever having seen a visual training example of that object. The paradigm of low-shot (i.e., zero-shot and few-shot) classification has traditionally been used to address these situations. However, traditional zero-shot and few-shot approaches train classifiers in settings where a majority of classes are previously seen, or visually observed, and a minority are previously unseen; classifiers for the unseen classes are then learned by expressing them in terms of the classifiers for the seen classes. In this paper, we address the related but different problem of object recognition when only a few object classes are visually observed and the majority are previously unseen. Specifically, we pose the following questions: (a) Is it possible to hypothesize the identity of an object in an image without previously having seen any visual training examples for that object? and (b) Can the visual training examples of a few seen object classes provide reliable priors for hypothesizing the identities of objects in an image that belong to the majority unseen object classes? We propose a model for recognizing objects in an image when visual classifiers are available for only a limited number of object classes.
To this end, we leverage word embeddings trained on publicly available text corpora and use them as natural language priors for hypothesizing the identities of objects that belong to the unseen classes. Experimental results on the Microsoft Common Objects in Context (MS-COCO) data set show that it is possible to generate reliable hypotheses about object identities by exploiting word embeddings trained on the Wikipedia text corpus, even in the absence of explicit visual classifiers for those object classes. To bolster our hypothesis, we conduct additional experiments on a larger dataset of concepts (themes) that we created from the Conceptual Captions dataset. Even on this extremely challenging dataset, our results, though not entirely impressive, serve to provide an important proof-of-concept for the proposed model.
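The core mechanism the abstract describes, using word embeddings as natural language priors to rank candidate labels for unseen classes, can be sketched as a nearest-neighbor search in embedding space. The following is a minimal illustration with hand-made toy vectors; the paper itself uses fastText-style embeddings trained on Wikipedia, and the vectors, labels, and the `hypothesize` helper here are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

# Toy stand-ins for pretrained word embeddings (e.g., fastText vectors
# trained on Wikipedia); real embeddings would be ~300-dimensional.
EMBEDDINGS = {
    "dog":   np.array([0.90, 0.80, 0.10]),
    "wolf":  np.array([0.85, 0.75, 0.20]),
    "car":   np.array([0.10, 0.20, 0.90]),
    "truck": np.array([0.15, 0.25, 0.85]),
}

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def hypothesize(seen_label, unseen_labels):
    """Rank unseen class labels by embedding similarity to a seen class.

    The seen class acts as a visual prior: whatever the visual classifier
    fires on anchors the search, and the word-embedding space supplies
    semantically close hypotheses for classes never observed in training.
    """
    anchor = EMBEDDINGS[seen_label]
    scores = {u: cosine(anchor, EMBEDDINGS[u]) for u in unseen_labels}
    return sorted(scores, key=scores.get, reverse=True)

# A classifier trained only on "dog" fires on a wolf; the embedding
# prior ranks "wolf" above the unrelated vehicle classes.
print(hypothesize("dog", ["wolf", "car", "truck"]))  # → ['wolf', 'truck', 'car']
```

In this toy setting the animal-like vector wins because cosine similarity ignores magnitude and rewards directional agreement; with real corpus-trained embeddings the same ranking step is what lets related but unseen labels surface near the top.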



Author information


Corresponding author

Correspondence to Karan Sharma .


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Sharma, K., Dandu, H., Kumar, A.C.S., Kumar, V., Bhandarkar, S.M. (2021). Exploiting Word Embeddings for Recognition of Previously Unseen Objects. In: Del Bimbo, A., et al. (eds.) Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol. 12666. Springer, Cham. https://doi.org/10.1007/978-3-030-68780-9_27


  • DOI: https://doi.org/10.1007/978-3-030-68780-9_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68779-3

  • Online ISBN: 978-3-030-68780-9

  • eBook Packages: Computer Science (R0)
