A Baseline Methodology for Word Sense Disambiguation

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2276)

Abstract

This paper describes a methodology for supervised word sense disambiguation that relies on standard machine learning algorithms to induce classifiers from sense-tagged training examples, where the contexts in which ambiguous words occur are represented by simple lexical features. This constitutes a baseline approach, since it produces classifiers based on easy-to-identify features that result in accurate disambiguation across a variety of languages. This paper reviews several systems based on this methodology that participated in the Spanish and English lexical sample tasks of the SENSEVAL-2 comparative exercise among word sense disambiguation systems. These systems fared much better than standard baselines and were within seven to ten percentage points of accuracy of the most highly ranked systems.
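
As a rough illustration of the kind of pipeline the abstract describes, the sketch below trains a classifier on sense-tagged examples whose contexts are reduced to simple lexical (unigram) features. It is a minimal sketch under stated assumptions, not the systems reviewed in the paper: the choice of Naive Bayes with add-one smoothing, the target word "bank", the sense labels, the toy sentences, and the names `TRAIN`, `features`, `train`, and `classify` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy sense-tagged examples for the ambiguous word "bank".
# Assumption: these sentences and sense labels are invented for illustration.
TRAIN = [
    ("he deposited the money at the bank", "finance"),
    ("the bank approved the loan", "finance"),
    ("she opened an account at the bank", "finance"),
    ("they fished from the river bank", "river"),
    ("the boat drifted toward the muddy bank", "river"),
    ("grass grew along the bank of the stream", "river"),
]


def features(context, target="bank"):
    """Simple lexical features: the set of words co-occurring with the target."""
    return {w for w in context.lower().split() if w != target}


def train(examples):
    """Count senses and per-sense feature occurrences from tagged examples."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for context, sense in examples:
        sense_counts[sense] += 1
        for w in features(context):
            word_counts[sense][w] += 1
            vocab.add(w)
    return sense_counts, word_counts, vocab


def classify(context, model):
    """Return the sense with the highest Naive Bayes log score."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # log prior of the sense
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in features(context):
            if w in vocab:  # words never seen in training are ignored
                # add-one smoothed likelihood of the feature given the sense
                score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


MODEL = train(TRAIN)
print(classify("he took a loan from the bank", MODEL))
print(classify("the boat reached the bank of the river", MODEL))
```

Any other standard learner could be substituted for the Naive Bayes scoring step without changing the feature representation, and the same code could in principle be applied to sense-tagged data in another language, since nothing in it depends on language-specific resources.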

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pedersen, T. (2002). A Baseline Methodology for Word Sense Disambiguation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2002. Lecture Notes in Computer Science, vol 2276. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45715-1_10

  • DOI: https://doi.org/10.1007/3-540-45715-1_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43219-7

  • Online ISBN: 978-3-540-45715-2

  • eBook Packages: Springer Book Archive
