A Baseline Methodology for Word Sense Disambiguation

Pedersen, Ted

doi:10.1007/3-540-45715-1_10


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2276)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics


Abstract

This paper describes a methodology for supervised word sense disambiguation that relies on standard machine learning algorithms to induce classifiers from sense-tagged training examples, where the contexts in which ambiguous words occur are represented by simple lexical features. This constitutes a baseline approach, since it produces classifiers based on easy-to-identify features that result in accurate disambiguation across a variety of languages. This paper reviews several systems based on this methodology that participated in the Spanish and English lexical sample tasks of SENSEVAL-2, a comparative evaluation exercise for word sense disambiguation systems. These systems fared much better than standard baselines and were within seven to ten percentage points of the accuracy of the most highly ranked systems.
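The abstract describes the methodology only at a high level. As a minimal sketch of the general idea (not the paper's actual systems), the Python below assumes whitespace tokenization, a hypothetical feature set of nearby unigrams plus the bigrams containing the target word, and Naive Bayes as the standard learner. It compares that learner with a majority-sense baseline, one common "standard baseline"; the abstract does not say which baselines were used. The toy "bank" examples and every function name here are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def lexical_features(tokens, target_index, window=3):
    """Simple lexical features for the ambiguous word at tokens[target_index].

    Hypothetical feature set (an assumption, not the paper's exact one):
    unigrams within +/- `window` positions, plus the bigrams formed by the
    target word and its immediate left or right neighbour.
    """
    feats = set()
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    for i in range(lo, hi):
        if i != target_index:
            feats.add("uni=" + tokens[i])
    target = tokens[target_index]
    if target_index > 0:
        feats.add("left=" + tokens[target_index - 1] + "_" + target)
    if target_index + 1 < len(tokens):
        feats.add("right=" + target + "_" + tokens[target_index + 1])
    return feats


def make_example(sentence, target, sense=None):
    """Lower-case, split on whitespace, and featurize the first `target`."""
    tokens = sentence.lower().split()
    return lexical_features(tokens, tokens.index(target)), sense


class NaiveBayesWSD:
    """Naive Bayes over binary feature presence with add-one smoothing.

    Simplification: only features present in a test context contribute
    to the score; absent features are ignored.
    """

    def fit(self, examples):
        self.sense_counts = Counter(sense for _, sense in examples)
        self.feat_counts = defaultdict(Counter)
        for feats, sense in examples:
            self.feat_counts[sense].update(feats)  # one count per example
        self.total = len(examples)
        return self

    def predict(self, feats):
        best, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            score = math.log(n / self.total)  # log P(sense)
            for f in feats:
                # log P(feature present | sense), Laplace-smoothed
                score += math.log((self.feat_counts[sense][f] + 1) / (n + 2))
            if score > best_score:
                best, best_score = sense, score
        return best


def most_frequent_sense(examples):
    """Majority-sense baseline: always predict the commonest training sense."""
    return Counter(sense for _, sense in examples).most_common(1)[0][0]


# Invented toy data for the ambiguous word "bank".
TRAIN = [
    make_example("the bank approved the loan", "bank", "finance"),
    make_example("money in the bank account", "bank", "finance"),
    make_example("the bank raised interest rates", "bank", "finance"),
    make_example("we fished from the river bank", "bank", "river"),
    make_example("grass grew on the muddy bank", "bank", "river"),
]
TEST = ["a boat drifted to the river bank", "the bank charged a fee"]

if __name__ == "__main__":
    model = NaiveBayesWSD().fit(TRAIN)
    baseline = most_frequent_sense(TRAIN)
    for sentence in TEST:
        feats, _ = make_example(sentence, "bank")
        print(f"{sentence!r}: naive_bayes={model.predict(feats)} baseline={baseline}")
```

On this toy data the Naive Bayes sketch labels the first test context `river` and the second `finance`, while the majority-sense baseline predicts `finance` for both. Any other standard learner (for example, a decision tree) could be substituted for `NaiveBayesWSD` without changing the feature extraction step.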



Author information

Authors and Affiliations

University of Minnesota, Duluth, MN 55812, USA
Ted Pedersen


Editor information

Editors and Affiliations

CIC (Centro de Investigación en Computación), IPN (Instituto Politécnico Nacional), Col. Zacatenco, CP 07738, Mexico DF, Mexico
Alexander Gelbukh


About this paper

Cite this paper

Pedersen, T. (2002). A Baseline Methodology for Word Sense Disambiguation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2002. Lecture Notes in Computer Science, vol 2276. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45715-1_10


DOI: https://doi.org/10.1007/3-540-45715-1_10
Published: 05 February 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43219-7
Online ISBN: 978-3-540-45715-2
eBook Packages: Springer Book Archive
