Augmenting Word Space Models for Word Sense Discrimination Using an Automatic Thesaurus

  • Conference paper
Advances in Natural Language Processing (GoTAL 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5221)

Abstract

This paper presents an algorithm for Word Sense Discrimination that divides the global representation of a word into a number of classes by determining, for any two occurrences, whether they belong to the same sense. We rely on the notion that words used in similar contexts have the same or a closely related meaning; thus, given a target word, we group its dependency co-occurrences in a Word Space Model. Each cluster represents a distinct meaning or sense of that word. We experiment with augmenting the bag of words of each cluster of co-occurrences, augmenting the dictionary of sense definitions, and augmenting both. We then count the intersections between the words in the bag of each clustered sense and the words in the bag of each dictionary sense, following the Lesk method. Augmenting increases recall and decreases precision. However, the best F-measure is obtained when augmenting both the dictionary of senses and the bag of words from the clusters.
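The overlap-counting step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `augment` and `overlap_scores`, the `top_n` cutoff, and the toy English thesaurus and sense bags are all assumptions made for illustration, and the clustering in the Word Space Model is assumed to have already produced the cluster bag.

```python
from typing import Dict, List, Set


def augment(bag: Set[str], thesaurus: Dict[str, List[str]], top_n: int = 2) -> Set[str]:
    """Return the bag plus up to top_n thesaurus neighbours of each word.

    The thesaurus is assumed to map a word to a ranked list of similar words.
    """
    expanded = set(bag)
    for word in bag:
        expanded.update(thesaurus.get(word, [])[:top_n])
    return expanded


def overlap_scores(cluster_bag: Set[str], sense_bags: Dict[str, Set[str]]) -> Dict[str, int]:
    """Lesk-style score: number of words shared by the cluster bag and each sense bag."""
    return {sense: len(cluster_bag & bag) for sense, bag in sense_bags.items()}


# Hypothetical toy data for the target word "bank".
thesaurus = {"money": ["cash", "funds"], "river": ["stream"]}
cluster_bag = {"cash", "loan"}
sense_bags = {
    "finance": {"money", "institution"},
    "shore": {"river", "land"},
}

# No augmentation: the cluster shares no word with either definition.
plain = overlap_scores(cluster_bag, sense_bags)

# Augmenting the dictionary side: "money" brings in "cash", which now matches.
augmented_senses = {s: augment(b, thesaurus) for s, b in sense_bags.items()}
augmented = overlap_scores(cluster_bag, augmented_senses)

print(plain)
print(augmented)
```

In this toy case augmentation turns a zero overlap into a match, which is consistent with the reported gain in recall; the added words can equally produce spurious matches, which is one plausible reading of the reported loss in precision.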

This work was done under partial support of the Mexican Government (SNI, SIP-IPN, COFAA-IPN, and PIFI-IPN). We thank Ted Pedersen and our anonymous reviewers for their useful comments and discussion.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Calvo, H. (2008). Augmenting Word Space Models for Word Sense Discrimination Using an Automatic Thesaurus. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_10

  • DOI: https://doi.org/10.1007/978-3-540-85287-2_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85286-5

  • Online ISBN: 978-3-540-85287-2

  • eBook Packages: Computer Science, Computer Science (R0)
