Abstract
We consider strategies for reducing ambiguity in multi-modal data, particularly in the domain of images and text. Large data sets containing images with associated text (and vice versa) are readily available, and recent work has exploited such data to learn models for linking visual elements to semantics. This requires addressing a correspondence ambiguity because it is generally not known which parts of the images connect with which language elements. In this paper we first discuss using language processing to reduce correspondence ambiguity in loosely labeled image data. We then consider a similar problem of using visual correlates to reduce ambiguity in text with associated images. Only rudimentary image understanding is needed for this task because the image only needs to help differentiate between a limited set of choices, namely the senses of a particular word.
© 2006 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Barnard, K., Yanai, K., Johnson, M., Gabbur, P. (2006). Cross Modal Disambiguation. In: Ponce, J., Hebert, M., Schmid, C., Zisserman, A. (eds) Toward Category-Level Object Recognition. Lecture Notes in Computer Science, vol 4170. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11957959_13
DOI: https://doi.org/10.1007/11957959_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68794-8
Online ISBN: 978-3-540-68795-5
eBook Packages: Computer Science, Computer Science (R0)