
Cross-Lingual Sense Determination: Can It Work?

Computers and the Humanities

Abstract

This article reports the results of a preliminary analysis of translation equivalents in four languages from different language families, extracted from an on-line parallel corpus of George Orwell's Nineteen Eighty-Four. The goal of the study is to determine the degree to which translation equivalents for different meanings of a polysemous word in English are lexicalized differently across a variety of languages, and to determine whether this information can be used to structure or create a set of sense distinctions useful in natural language processing applications. A coherence index is computed that measures the tendency for different senses of the same English word to be lexicalized differently, and from these data a clustering algorithm is used to create sense hierarchies.
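The abstract does not define the coherence index or name the clustering algorithm, so the following is only an illustrative sketch of the general idea, not the article's method. It assumes a simple pairwise score (the fraction of cross-sense occurrence pairs that are translated differently, pooled over languages) and swaps in plain average-link agglomerative clustering. The sense labels, language codes `L1` to `L4`, translations, and the function names `distinctness` and `cluster` are all hypothetical.

```python
from itertools import combinations, product

# Toy, invented data (NOT from the Orwell corpus): each occurrence of one
# English word is a (sense label, {language: translation}) pair.
OCCURRENCES = [
    ("s1", {"L1": "a", "L2": "p", "L3": "x", "L4": "m"}),
    ("s1", {"L1": "a", "L2": "p", "L3": "x", "L4": "m"}),
    ("s2", {"L1": "a", "L2": "p", "L3": "y", "L4": "m"}),
    ("s3", {"L1": "b", "L2": "q", "L3": "z", "L4": "n"}),
]

def distinctness(occurrences, s, t):
    """Assumed score: share of (occurrence of s, occurrence of t, language)
    triples in which the two occurrences are translated differently."""
    a = [tr for sense, tr in occurrences if sense == s]
    b = [tr for sense, tr in occurrences if sense == t]
    differing = total = 0
    for ta, tb in product(a, b):
        for lang in ta.keys() & tb.keys():
            total += 1
            differing += ta[lang] != tb[lang]
    return differing / total if total else 0.0

def cluster(occurrences):
    """Average-link agglomerative clustering over senses.
    Returns the merges in order as (cluster, cluster, distance) triples."""
    senses = sorted({s for s, _ in occurrences})
    dist = {frozenset(p): distinctness(occurrences, *p)
            for p in combinations(senses, 2)}

    def avg(c1, c2):
        pairs = [dist[frozenset((x, y))] for x in c1 for y in c2]
        return sum(pairs) / len(pairs)

    clusters = [(s,) for s in senses]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters whose senses are least often lexicalized differently.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg(*p))
        merges.append((c1, c2, avg(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

if __name__ == "__main__":
    for left, right, d in cluster(OCCURRENCES):
        print(left, "+", right, "at distance", d)
```

In this toy data, senses that share translations in most languages merge first and the sense with distinct translations everywhere joins last, which is the kind of hierarchy the abstract describes. A real study would need the article's actual index definition and sense-tagged, word-aligned data.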




Cite this article

Ide, N. Cross-Lingual Sense Determination: Can It Work?. Computers and the Humanities 34, 223–234 (2000). https://doi.org/10.1023/A:1002475423737
