Cross Modal Disambiguation

Barnard, Kobus; Yanai, Keiji; Johnson, Matthew; Gabbur, Prasad

doi:10.1007/11957959_13

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4170)

Abstract

We consider strategies for reducing ambiguity in multi-modal data, particularly in the domain of images and text. Large data sets containing images with associated text (and vice versa) are readily available, and recent work has exploited such data to learn models for linking visual elements to semantics. This requires addressing a correspondence ambiguity because it is generally not known which parts of the images connect with which language elements. In this paper we first discuss using language processing to reduce correspondence ambiguity in loosely labeled image data. We then consider a similar problem of using visual correlates to reduce ambiguity in text with associated images. Only rudimentary image understanding is needed for this task because the image only needs to help differentiate between a limited set of choices, namely the senses of a particular word.
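The last point of the abstract, that an image only has to separate a handful of candidate senses, can be illustrated with a small sketch. This is not the chapter's model: it assumes, purely for illustration, that a text-only method supplies a prior over the senses of a word and that some image model supplies a per-sense score, and it combines the two by multiplying and renormalizing. The function name, the sense labels, and all numbers are hypothetical.

```python
def disambiguate(text_prior, image_likelihood):
    """Combine a text-only sense prior with image evidence.

    text_prior:       dict mapping sense -> P(sense | text context)
    image_likelihood: dict mapping sense -> score of the image under that sense
    Returns a normalized posterior over the same senses.
    Both inputs are assumed to come from external models (hypothetical here).
    """
    # Multiply the two sources of evidence for each candidate sense.
    unnormalized = {
        sense: prior * image_likelihood.get(sense, 0.0)
        for sense, prior in text_prior.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        # The image gives no usable evidence: fall back to the text prior.
        return dict(text_prior)
    return {sense: value / total for sense, value in unnormalized.items()}


# Hypothetical example: the text alone cannot decide between two senses
# of "bank", but the associated image looks much more like a river scene.
text_prior = {"bank/river": 0.5, "bank/finance": 0.5}
image_likelihood = {"bank/river": 0.8, "bank/finance": 0.2}

posterior = disambiguate(text_prior, image_likelihood)
best_sense = max(posterior, key=posterior.get)
print(best_sense, posterior)
```

Because the choice is restricted to the senses of one word, the image model only needs to rank a few alternatives rather than recognize arbitrary content, which is why fairly crude visual evidence can still tip the decision.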

Author information

Authors and Affiliations

Department of Computer Science, University of Arizona
Kobus Barnard
Department of Computer Science, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-shi, Tokyo, 182-8585, Japan
Keiji Yanai
Department of Engineering, University of Cambridge
Matthew Johnson
Electrical and Computer Engineering, University of Arizona
Prasad Gabbur

Editor information

Editors and Affiliations

Département d’Informatique, Ecole Normale Supérieure, P.O. Box, Paris, France
Jean Ponce
Carnegie Mellon University, Pittsburgh, USA
Martial Hebert
GRAVIR-INRIA, 655 avenue de l’Europe, P.O. Box, 38330, Montbonnot, France
Cordelia Schmid
Department of Engineering Science, University of Oxford, Parks Road, OX1 3PJ, Oxford, UK
Andrew Zisserman

About this chapter

Cite this chapter

Barnard, K., Yanai, K., Johnson, M., Gabbur, P. (2006). Cross Modal Disambiguation. In: Ponce, J., Hebert, M., Schmid, C., Zisserman, A. (eds) Toward Category-Level Object Recognition. Lecture Notes in Computer Science, vol 4170. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11957959_13

DOI: https://doi.org/10.1007/11957959_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68794-8
Online ISBN: 978-3-540-68795-5
eBook Packages: Computer Science (R0)
