
Kernel methods for word sense disambiguation

Artificial Intelligence Review

Abstract

Many natural language processing (NLP) applications require accurate resolution of the ambiguities inherent in natural language. One such task is word sense disambiguation (WSD), which determines the correct sense of each instance of a polysemous word. Meanwhile, kernel methods, among the most popular machine learning approaches, have attracted significant interest in recent years and have performed well across a wide variety of learning tasks. In this paper, we survey research progress on kernel-based WSD techniques. We begin with preliminaries on WSD and kernel methods. We then review the main approaches in the literature, focusing on three issues: context representation, kernel design, and learning algorithms. We also discuss the kernel-based WSD approaches further and conclude with open problems and future directions.


Notes

  1. http://www.senseval.org/.

  2. A set is closed under an operation if applying that operation to members of the set always produces a member of the same set; the set is then said to have closure under the operation.

  3. For a fixed \(c\), we always take the largest context \({\boldsymbol{x}}=(t_{-bl} ,\ldots ,t_{-1} ,t_{1},\ldots ,t_{br})\) such that \(bl\le c\) and \(br\le c\). If at least \(c\) words both precede and follow the word to be disambiguated, then \(bl=br=c\); otherwise \(bl<c\) or \(br<c\).

  4. This definition is the so-called gap-weighted subsequences kernel, which is one of the most general types of kernels defined on sequences.
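
As an illustration of Notes 3 and 4, the following Python sketch first extracts the largest context window of at most \(c\) tokens on each side of the target word, and then compares two such contexts with a gap-weighted subsequences kernel. The definition that Note 4 refers to is not reproduced in this preview, so the sketch assumes the standard form of that kernel: every length-\(n\) subsequence \(u\) shared by the two sequences contributes \(\lambda^{l(\mathbf{i})+l(\mathbf{j})}\), where \(l(\cdot)\) is the span (gaps included) that an occurrence of \(u\) covers in each sequence. The function names, the decay parameter `lam`, and the example sentences are hypothetical, and the kernel is computed by brute-force enumeration for readability rather than by the dynamic programming normally used in practice.

```python
from collections import defaultdict
from itertools import combinations


def extract_context(tokens, target_index, c):
    """Return the largest window with at most c tokens on each side of the target.

    The target word itself is excluded, mirroring
    x = (t_-bl, ..., t_-1, t_1, ..., t_br) with bl <= c and br <= c.
    """
    left = tokens[max(0, target_index - c):target_index]    # bl <= c tokens
    right = tokens[target_index + 1:target_index + 1 + c]   # br <= c tokens
    return left + right


def gap_weighted_features(seq, n, lam):
    """Map each length-n subsequence of seq to the sum of lam**span over its occurrences."""
    if n < 1:
        raise ValueError("n must be at least 1")
    feats = defaultdict(float)
    # Enumerate every increasing index tuple of length n (brute force, assumed short contexts).
    for idx in combinations(range(len(seq)), n):
        u = tuple(seq[i] for i in idx)
        span = idx[-1] - idx[0] + 1  # stretch covered by this occurrence, gaps included
        feats[u] += lam ** span
    return feats


def gap_weighted_kernel(s, t, n, lam):
    """Inner product of the gap-weighted feature maps of s and t."""
    fs = gap_weighted_features(s, n, lam)
    ft = gap_weighted_features(t, n, lam)
    return sum(w * ft[u] for u, w in fs.items() if u in ft)


# Hypothetical example: two occurrences of the polysemous word "bank".
sent_a = "he sat on the bank of the river".split()
sent_b = "she sat by the bank of the lake".split()
ctx_a = extract_context(sent_a, sent_a.index("bank"), c=2)
ctx_b = extract_context(sent_b, sent_b.index("bank"), c=2)
print(ctx_a, ctx_b)
print(gap_weighted_kernel(ctx_a, ctx_b, n=2, lam=0.5))
```

With \(0<\lambda<1\), occurrences spread over longer spans are down-weighted, so contiguous matches count more than matches interrupted by gaps.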


Acknowledgments

The authors would like to thank all the referees for their constructive and insightful comments on this paper. The corresponding author also thanks the China Scholarship Council (No. 201308360053) for financial support as a visiting scholar conducting research with Prof. Peter X. Liu at Carleton University, and thanks Prof. Peter X. Liu and Dr. Shichao Liu at Carleton University for valuable discussions. This work is supported in part by the National Natural Science Foundation of China (Nos. 51367014, 61202265, 61462040 and 61262049), the Jiangxi Province Natural Science Foundation of China (Nos. 20142BAB207011 and 20142BAB217016), the Jiangxi Province Education Plan of Young Scientists Foundation of China (No. 20112BCB23004), the Jiangxi Province Science and Technology Support Plan Key Projects of China (No. 20111BBE50008), and the Science and Technology Plan Projects of the Jiangxi Province Education Bureau of China (Nos. GJJ14770 and YC2015-S035).

Author information

Corresponding author

Correspondence to Xiangjun Li.

About this article

Cite this article

Li, X., Qing, S., Zhang, H. et al. Kernel methods for word sense disambiguation. Artif Intell Rev 46, 41–58 (2016). https://doi.org/10.1007/s10462-015-9455-5

  • DOI: https://doi.org/10.1007/s10462-015-9455-5
