Augmenting Word Space Models for Word Sense Discrimination Using an Automatic Thesaurus

Calvo, Hiram

doi:10.1007/978-3-540-85287-2_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5221)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

This paper presents an algorithm for Word Sense Discrimination that divides the global representation of a word into a number of classes by determining, for any two occurrences, whether they belong to the same sense. We rely on the notion that words used in similar contexts have the same or a closely related meaning. Thus, given a target word, we group its dependency co-occurrences in a Word Space Model, and each resulting cluster represents a distinct meaning or sense of that word. We experiment with augmenting the bag of words of each cluster of co-occurrences, augmenting the dictionary of sense definitions, and augmenting both. We then count the words shared between the bag of each clustered sense and the bag of each dictionary sense, following the Lesk method. Augmenting increases recall and decreases precision. However, the best resulting F-measure is obtained by augmenting both the dictionary of senses and the bag of words from the clusters.
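The overlap counting described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`augment`, `lesk_overlap`, `best_sense`), the English toy data, and the hand-written thesaurus are all hypothetical, and the choice to return no sense when the overlap is zero is an assumption made here to show how augmentation can turn an unanswered case into an answered one.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def augment(bag: Iterable[str], thesaurus: Dict[str, Iterable[str]]) -> Set[str]:
    """Return the bag plus the thesaurus neighbours of each of its words."""
    words = set(bag)
    expanded = set(words)
    for word in words:
        expanded.update(thesaurus.get(word, ()))
    return expanded


def lesk_overlap(cluster_bag: Iterable[str], sense_bag: Iterable[str]) -> int:
    """Count the distinct words shared by a cluster bag and a sense bag."""
    return len(set(cluster_bag) & set(sense_bag))


def best_sense(
    cluster_bag: Iterable[str],
    sense_bags: Dict[str, Iterable[str]],
    thesaurus: Optional[Dict[str, Iterable[str]]] = None,
    augment_cluster: bool = False,
    augment_senses: bool = False,
) -> Tuple[Optional[str], Dict[str, int]]:
    """Pick the dictionary sense with the largest overlap, or None if all are zero."""
    thesaurus = thesaurus or {}
    cluster = augment(cluster_bag, thesaurus) if augment_cluster else set(cluster_bag)
    scores: Dict[str, int] = {}
    for sense, bag in sense_bags.items():
        definition = augment(bag, thesaurus) if augment_senses else set(bag)
        scores[sense] = lesk_overlap(cluster, definition)
    if not scores:
        return None, scores
    best = max(scores, key=scores.get)
    # Assumption: a zero overlap means the cluster is left unlabelled.
    return (best if scores[best] > 0 else None), scores


# Hypothetical toy data, invented for illustration only.
SENSES = {
    "bank/finance": {"institution", "money", "deposit"},
    "bank/river": {"land", "river", "edge"},
}
CLUSTER = {"cash", "loan"}
THESAURUS = {"cash": ["money"], "loan": ["credit"], "edge": ["shore"]}

print(best_sense(CLUSTER, SENSES))
print(best_sense(CLUSTER, SENSES, THESAURUS, augment_cluster=True))
print(best_sense(CLUSTER, SENSES, THESAURUS, augment_cluster=True, augment_senses=True))
```

On this toy data the unaugmented cluster shares no word with either definition, so no sense is returned, while augmenting the cluster adds "money" and yields a match with the finance sense. This mirrors, in a very small way, why augmentation would be expected to raise recall; the sketch says nothing about the precision trade-off reported in the abstract.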

This work was done under partial support of the Mexican Government (SNI, SIP-IPN, COFAA-IPN, and PIFI-IPN). We thank Ted Pedersen and our anonymous reviewers for their useful comments and discussion.

Author information

Authors and Affiliations

Center for Research in Computing, National Polytechnic Institute, Mexico City, 07738, Mexico
Hiram Calvo

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Chalmers University of Technology, 41296, Göteborg, Sweden
Bengt Nordström & Aarne Ranta

Cite this paper

Calvo, H. (2008). Augmenting Word Space Models for Word Sense Discrimination Using an Automatic Thesaurus. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_10

DOI: https://doi.org/10.1007/978-3-540-85287-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85286-5
Online ISBN: 978-3-540-85287-2
eBook Packages: Computer Science, Computer Science (R0)
