Abstract
Mihalcea [1] discusses self-training and co-training in the context of word sense disambiguation and shows that parameter optimization on individual words was important to obtain good results. Using smoothed co-training of a naive Bayes classifier she obtains a 9.8% error reduction on Senseval-2 data with a fixed parameter setting. In this paper we test a semi-supervised learning algorithm with no parameters, namely tri-training [2]. We also test the random subspace method [3] for building committees out of stable learners. Both techniques lead to significant error reductions with different learning algorithms, but improvements do not accumulate. Our best error reduction is 7.4%, and our best absolute average over Senseval-2 data, though not directly comparable, is 12% higher than the results reported in Mihalcea [1].
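To make the parameterless character of tri-training concrete, here is a minimal, self-contained sketch in the spirit of Li and Zhou [2]: three classifiers are trained on bootstrap samples of the labeled data, and in each round the two peers of each classifier pseudo-label the unlabeled pool, with the points they agree on added to the third classifier's training set. The toy nearest-centroid learner and the synthetic data are our own illustration, and the full algorithm additionally filters pseudo-labels by estimated error rates, which this sketch omits.

```python
import random
from collections import Counter

class NearestCentroid:
    """Toy classifier: predicts the label of the nearest class centroid (1-D features)."""
    def fit(self, X, y):
        sums, counts = {}, {}
        for x, lab in zip(X, y):
            sums[lab] = sums.get(lab, 0.0) + x
            counts[lab] = counts.get(lab, 0) + 1
        self.centroids = {lab: sums[lab] / counts[lab] for lab in sums}
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda lab: abs(x - self.centroids[lab]))
                for x in X]

def tri_train(X_lab, y_lab, X_unlab, rounds=3, seed=0):
    rng = random.Random(seed)
    # 1. Train three classifiers on bootstrap samples of the labeled data.
    clfs, data = [], []
    for _ in range(3):
        idx = [rng.randrange(len(X_lab)) for _ in range(len(X_lab))]
        Xs, ys = [X_lab[i] for i in idx], [y_lab[i] for i in idx]
        clfs.append(NearestCentroid().fit(Xs, ys))
        data.append((Xs, ys))
    # 2. In each round, the two peers of classifier i label the unlabeled pool;
    #    points on which the peers agree are added to classifier i's training set.
    for _ in range(rounds):
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            pj = clfs[j].predict(X_unlab)
            pk = clfs[k].predict(X_unlab)
            Xi, yi = data[i]
            Xaug = Xi + [x for x, a, b in zip(X_unlab, pj, pk) if a == b]
            yaug = yi + [a for a, b in zip(pj, pk) if a == b]
            clfs[i] = NearestCentroid().fit(Xaug, yaug)
    # 3. Final prediction: majority vote of the three classifiers.
    def predict(X):
        votes = zip(*(c.predict(X) for c in clfs))
        return [Counter(v).most_common(1)[0][0] for v in votes]
    return predict

# Tiny synthetic demo: two well-separated 1-D classes around 0-2.5 and 8-10.5.
X_lab = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5]
y_lab = ["a"] * 6 + ["b"] * 6
X_unlab = [0.2, 1.2, 2.2, 8.2, 9.2, 10.2]
predict = tri_train(X_lab, y_lab, X_unlab)
print(predict([0.2, 9.8]))  # → ['a', 'b']
```

Note that the only knobs are the number of rounds and the random seed for bootstrapping; unlike co-training, there is no confidence threshold or growth-rate parameter to tune per word, which is the property exploited in this paper.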
References
Mihalcea, R.: Co-training and self-training for word sense disambiguation. In: CoNLL, Boston, MA (2004)
Li, M., Zhou, Z.H.: Tri-training: exploiting unlabeled data using three classifiers. IEEE Transactions on Knowledge and Data Engineering 17(11), 1529–1541 (2005)
Ho, T.K.: The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(8), 832–844 (1998)
Abney, S.: Semi-supervised learning for computational linguistics. Chapman and Hall, Boca Raton (2008)
Chen, W., Zhang, Y., Isahara, H.: Chinese chunking with tri-training learning. In: Matsumoto, Y., Sproat, R.W., Wong, K.-F., Zhang, M. (eds.) ICCPOL 2006. LNCS (LNAI), vol. 4285, pp. 466–473. Springer, Heidelberg (2006)
Nguyen, T., Nguyen, L., Shimazu, A.: Using semi-supervised learning for question classification. Journal of Natural Language Processing 15, 3–21 (2008)
García-Pedrajas, N., Ortiz-Boyer, D.: Boosting random subspace method. Neural Networks 21(9), 1344–1362 (2008)
Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting. Annals of Statistics 28(2), 337–407 (2000)
Frank, E., Witten, I.H.: Generating accurate rule sets without global optimization. In: The 15th International Conference on Machine Learning (1998)
Sindhwani, V., Keerthi, S.: Large scale semi-supervised linear SVMs. In: ACM SIGIR, Seattle, WA (2006)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Søgaard, A., Johannsen, A. (2010). Robust Semi-supervised and Ensemble-Based Methods in Word Sense Disambiguation. In: Loftsson, H., Rögnvaldsson, E., Helgadóttir, S. (eds) Advances in Natural Language Processing. NLP 2010. Lecture Notes in Computer Science(), vol 6233. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14770-8_43
DOI: https://doi.org/10.1007/978-3-642-14770-8_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14769-2
Online ISBN: 978-3-642-14770-8