Abstract
Ambiguity is an inherent problem for many tasks in Natural Language Processing. Unsupervised and semi-supervised approaches to ambiguity resolution are appealing because they lower the cost of manual labour. However, such methods typically struggle to estimate the number of senses without supervision. This paper presents research on applying stopping functions to clustering algorithms in order to estimate the number of senses. The experiments were performed for Polish and English. We found that estimation based on the PK2 stopping function is encouraging, but only when coarse-grained distinctions between senses are used.
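The abstract names the PK2 stopping function without defining it. As a hedged illustration only, the sketch below assumes the PK2 measure as described by Pedersen and Kulkarni: the ratio crfun(k) / crfun(k-1) of a clustering criterion function evaluated for successive numbers of clusters k. It further assumes a selection rule that picks the k whose ratio is closest to, but not below, 1 plus the standard deviation of all ratios. The function names `pk2_scores` and `select_k_pk2`, the use of the population standard deviation, and the toy criterion values are illustrative assumptions, not the paper's implementation.

```python
from statistics import pstdev


def pk2_scores(crfun):
    """Return {k: crfun[k] / crfun[k - 1]} for every k whose predecessor was evaluated.

    `crfun` maps a number of clusters k to the value of a clustering
    criterion function that is assumed to grow as k increases.
    """
    return {k: crfun[k] / crfun[k - 1] for k in sorted(crfun) if k - 1 in crfun}


def select_k_pk2(crfun):
    """Pick the k whose PK2 score is closest to, but not below, 1 + std(PK2).

    Assumption: population standard deviation over all PK2 scores.
    Falls back to the smallest k tried when no score reaches the threshold.
    """
    scores = pk2_scores(crfun)
    if not scores:
        return min(crfun)
    threshold = 1.0 + pstdev(scores.values())
    candidates = {k: s for k, s in scores.items() if s >= threshold}
    if not candidates:
        return min(crfun)
    # The smallest ratio still above the threshold marks the last
    # "worthwhile" increase in the number of clusters (senses).
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    # Hypothetical criterion values: large gains up to k = 3, a plateau afterwards.
    toy = {1: 100.0, 2: 160.0, 3: 200.0, 4: 205.0, 5: 208.0, 6: 210.0}
    print(select_k_pk2(toy))  # prints 3
```

With these toy values the ratios drop from 1.6 and 1.25 to roughly 1.01 to 1.03 after three clusters, so the rule settles on three senses; a real run would use criterion values from clustering actual word-occurrence contexts.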
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Broda, B., Kędzia, P. (2011). Finding the Optimal Number of Clusters for Word Sense Disambiguation. In: Habernal, I., Matoušek, V. (eds) Text, Speech and Dialogue. TSD 2011. Lecture Notes in Computer Science, vol 6836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23538-2_49
DOI: https://doi.org/10.1007/978-3-642-23538-2_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23537-5
Online ISBN: 978-3-642-23538-2
eBook Packages: Computer Science; Computer Science (R0)