A Generalized Method for Word Sense Disambiguation Based on Wikipedia

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6611)

Abstract

In this paper we propose a general framework for word sense disambiguation using knowledge latent in Wikipedia. Specifically, we exploit the rich and growing Wikipedia corpus to build a large and robust knowledge repository consisting of keyphrases and their associated candidate topics. Keyphrases are mainly derived from Wikipedia article titles and anchor texts associated with wikilinks. The disambiguation of a given keyphrase is based on both the commonness of a candidate topic and its context-dependent relatedness, where unnecessary (and potentially noisy) context information is pruned. In extensive experimental evaluations using different relatedness measures, we show that the proposed technique achieves disambiguation accuracy comparable to that of state-of-the-art techniques while incurring orders of magnitude less computational cost.
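As a rough illustration of how commonness and context relatedness can be combined, the following is a minimal sketch and not the paper's actual scoring function. The linear weighting `alpha`, the top-k pruning of context topics, the link-overlap relatedness (written in the style of the Milne and Witten link measure), and all of the toy data are assumptions made here for illustration only.

```python
from math import log

# Hypothetical toy data: how often the anchor text "jaguar" links to each topic,
# and which article ids link to each topic.
ANCHOR_COUNTS = {"jaguar": {"Jaguar Cars": 60, "Jaguar (animal)": 40}}
INLINKS = {
    "Jaguar (animal)": {1, 2, 3, 4},
    "Jaguar Cars": {5, 6, 7, 8},
    "Rainforest": {1, 2, 3, 9},
    "Predator": {2, 3, 4, 10},
}


def commonness(anchor_counts, keyphrase, topic):
    """Fraction of the keyphrase's anchor occurrences that point to the topic."""
    counts = anchor_counts.get(keyphrase, {})
    total = sum(counts.values())
    return counts.get(topic, 0) / total if total else 0.0


def relatedness(inlinks, a, b, n_articles):
    """Link-overlap relatedness in [0, 1]; one possible measure among many."""
    links_a, links_b = inlinks.get(a, set()), inlinks.get(b, set())
    shared = links_a & links_b
    if not shared:
        return 0.0
    larger = max(len(links_a), len(links_b))
    smaller = min(len(links_a), len(links_b))
    denom = log(n_articles) - log(smaller)
    if denom <= 0:
        return 0.0
    distance = (log(larger) - log(len(shared))) / denom
    return max(0.0, 1.0 - distance)


def disambiguate(keyphrase, context_topics, anchor_counts, inlinks,
                 n_articles, alpha=0.5, top_k=2):
    """Return the candidate topic with the highest combined score."""
    best_topic, best_score = None, float("-inf")
    for topic in anchor_counts.get(keyphrase, {}):
        # Assumed pruning step: keep only the top_k most related context topics.
        rels = sorted(
            (relatedness(inlinks, topic, c, n_articles) for c in context_topics),
            reverse=True,
        )[:top_k]
        context_score = sum(rels) / len(rels) if rels else 0.0
        score = (alpha * commonness(anchor_counts, keyphrase, topic)
                 + (1 - alpha) * context_score)
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


print(disambiguate("jaguar", ["Rainforest", "Predator"],
                   ANCHOR_COUNTS, INLINKS, n_articles=1000))
```

In this toy setup the more common sense ("Jaguar Cars") wins when no context is supplied, while context topics that share inlinks with "Jaguar (animal)" shift the decision toward the animal sense.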

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, C., Sun, A., Datta, A. (2011). A Generalized Method for Word Sense Disambiguation Based on Wikipedia. In: Clough, P., et al. Advances in Information Retrieval. ECIR 2011. Lecture Notes in Computer Science, vol 6611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20161-5_65

  • DOI: https://doi.org/10.1007/978-3-642-20161-5_65

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20160-8

  • Online ISBN: 978-3-642-20161-5

  • eBook Packages: Computer Science, Computer Science (R0)
