Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4170)

Abstract

We consider strategies for reducing ambiguity in multi-modal data, particularly in the domain of images and text. Large data sets containing images with associated text (and vice versa) are readily available, and recent work has exploited such data to learn models for linking visual elements to semantics. This requires addressing a correspondence ambiguity because it is generally not known which parts of the images connect with which language elements. In this paper we first discuss using language processing to reduce correspondence ambiguity in loosely labeled image data. We then consider a similar problem of using visual correlates to reduce ambiguity in text with associated images. Only rudimentary image understanding is needed for this task because the image only needs to help differentiate between a limited set of choices, namely the senses of a particular word.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Barnard, K., Yanai, K., Johnson, M., Gabbur, P. (2006). Cross Modal Disambiguation. In: Ponce, J., Hebert, M., Schmid, C., Zisserman, A. (eds) Toward Category-Level Object Recognition. Lecture Notes in Computer Science, vol 4170. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11957959_13

  • DOI: https://doi.org/10.1007/11957959_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68794-8

  • Online ISBN: 978-3-540-68795-5

  • eBook Packages: Computer Science, Computer Science (R0)
