Abstract
This paper presents an algorithm for Word Sense Discrimination that divides the global representation of a word into a number of classes by determining, for any two occurrences, whether or not they belong to the same sense. We rely on the notion that words used in similar contexts have the same or a closely related meaning. Thus, given a target word, we group its dependency co-occurrences in a Word Space Model, and each resulting cluster represents a distinct meaning or sense of that word. We experiment with three options: augmenting the bag of words of each cluster of co-occurrences, augmenting the dictionary of sense definitions, and augmenting both. We then count the words shared by the bag of each clustered sense and the bag of each dictionary sense, following the Lesk method. Augmenting increases recall and decreases precision. However, the best F-measure is obtained by augmenting both the dictionary of senses and the bag of words from the clusters.
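The Lesk-style matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`augment`, `lesk_overlap`, `assign_sense`), the toy English data, and the use of a plain word-to-neighbours dictionary as the thesaurus are all assumptions made for illustration. The sketch only shows how augmenting either bag can create intersections that were absent before, which is consistent with the reported increase in recall.

```python
from typing import Dict, Iterable, Optional, Set


def augment(bag: Iterable[str], thesaurus: Dict[str, Set[str]]) -> Set[str]:
    """Return the bag plus the thesaurus neighbours of each of its words."""
    words = set(bag)
    expanded = set(words)
    for word in words:
        expanded |= thesaurus.get(word, set())
    return expanded


def lesk_overlap(cluster_bag: Iterable[str], sense_bag: Iterable[str]) -> int:
    """Count the distinct words shared by the two bags (simplified Lesk score)."""
    return len(set(cluster_bag) & set(sense_bag))


def assign_sense(
    cluster_bag: Iterable[str],
    sense_definitions: Dict[str, Set[str]],
    thesaurus: Optional[Dict[str, Set[str]]] = None,
    augment_cluster: bool = False,
    augment_dictionary: bool = False,
) -> Optional[str]:
    """Pick the dictionary sense with the largest overlap, or None if no word is shared."""
    thesaurus = thesaurus or {}
    cluster = set(cluster_bag)
    if augment_cluster:
        cluster = augment(cluster, thesaurus)
    best_sense, best_score = None, 0
    for sense, definition in sense_definitions.items():
        bag = augment(definition, thesaurus) if augment_dictionary else set(definition)
        score = lesk_overlap(cluster, bag)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Toy data, invented for this example (hypothetical, not from the paper).
cluster = {"river", "water", "shore"}
senses = {
    "bank_finance": {"institution", "money", "deposit"},
    "bank_river": {"land", "slope", "stream"},
}
thesaurus = {"river": {"stream", "creek"}, "stream": {"river"}, "money": {"cash"}}

print(assign_sense(cluster, senses))
print(assign_sense(cluster, senses, thesaurus, augment_cluster=True))
print(assign_sense(cluster, senses, thesaurus, augment_cluster=True, augment_dictionary=True))
```

With this toy data, the unaugmented call finds no shared word and returns `None`, while either form of augmentation selects `bank_river`. Returning `None` when nothing is shared is a design choice of this sketch: it makes unanswered cases visible, which is where recall can improve. The same expansion can also add spurious matches, which would be one way for precision to drop.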
This work was done under partial support of the Mexican Government (SNI, SIP-IPN, COFAA-IPN, and PIFI-IPN). We thank Ted Pedersen and our anonymous reviewers for their useful comments and discussion.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Calvo, H. (2008). Augmenting Word Space Models for Word Sense Discrimination Using an Automatic Thesaurus. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_10
DOI: https://doi.org/10.1007/978-3-540-85287-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85286-5
Online ISBN: 978-3-540-85287-2
eBook Packages: Computer Science, Computer Science (R0)