Kernel methods for word sense disambiguation

Li, Xiangjun; Qing, Song; Zhang, Huawei; Wang, Tinghua; Yang, Huping

doi:10.1007/s10462-015-9455-5

Published: 30 December 2015

Volume 46, pages 41–58, (2016)

Artificial Intelligence Review

Xiangjun Li¹,², Song Qing¹, Huawei Zhang¹, Tinghua Wang³ & Huping Yang⁴

Abstract

Many applications of natural language processing (NLP) require accurate resolution of the various ambiguities present in natural language. This task is known as word sense disambiguation (WSD): determining the correct sense for an instance of a polysemous word. Kernel methods, one of the most popular families of machine learning approaches, have attracted significant interest in recent years and have performed well across a wide variety of learning tasks. In this paper, we present a survey of research progress on kernel-based WSD techniques. We start by introducing some preliminary knowledge concerning WSD and kernel methods. We then review the main approaches in the literature, focusing on three issues: context representation, kernel design and learning algorithms. We also provide further discussion of kernel-based WSD approaches. Finally, we discuss open problems and future directions.

Notes

http://www.senseval.org/.
A set has closure under an operation if performance of that operation on members of the set always produces a member of the same set; in this case we also say that the set is closed under the operation.
For a fixed c, we always take the largest context \(\mathbf{x}=(t_{-bl},\ldots,t_{-1},t_{1},\ldots,t_{br})\) such that \(bl\le c\) and \(br\le c\). Note that if there exist at least c words both preceding and following the word to be disambiguated, then \(bl=br=c\); otherwise \(bl<c\) or \(br<c\).
This definition is the so-called gap-weighted subsequences kernel, which is one of the most general types of kernels defined on sequences.
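
As an illustration of the last two notes, the following Python sketch extracts a context of at most c tokens on each side of the target word and evaluates a gap-weighted subsequences kernel between two such contexts. It is not taken from the article: the function names `context_window` and `gap_weighted_kernel`, the decay parameter `lam`, and the toy sentences are assumptions made for illustration. The kernel follows the standard recursive formulation from the string-kernel literature, in which each common subsequence of length n contributes `lam` raised to the sum of its spans in the two sequences; this may differ in detail from the definition the note refers to.

```python
from functools import lru_cache
from typing import List, Sequence


def context_window(tokens: Sequence[str], target: int, c: int) -> List[str]:
    """Return up to c tokens on each side of tokens[target], target excluded.

    Near a sentence boundary fewer than c tokens are available on that side,
    which corresponds to bl < c or br < c in the note above.
    """
    left = tokens[max(0, target - c):target]
    right = tokens[target + 1:target + 1 + c]
    return list(left) + list(right)


def gap_weighted_kernel(s: Sequence[str], t: Sequence[str], n: int, lam: float) -> float:
    """Gap-weighted subsequences kernel of order n (illustrative sketch).

    Sums lam ** (span in s + span in t) over every pair of index tuples that
    select the same length-n subsequence in s and in t.
    """
    s, t = tuple(s), tuple(t)

    @lru_cache(maxsize=None)
    def k_aux(i: int, ls: int, lt: int) -> float:
        # Auxiliary quantity on the prefixes s[:ls] and t[:lt]: it weights
        # length-i matches by their distance to the END of both prefixes.
        if i == 0:
            return 1.0
        if min(ls, lt) < i:
            return 0.0
        x = s[ls - 1]
        total = lam * k_aux(i, ls - 1, lt)
        for j in range(lt):
            if t[j] == x:
                total += k_aux(i - 1, ls - 1, j) * lam ** (lt - j + 1)
        return total

    @lru_cache(maxsize=None)
    def k(ls: int, lt: int) -> float:
        # Kernel value on the prefixes s[:ls] and t[:lt].
        if min(ls, lt) < n:
            return 0.0
        x = s[ls - 1]
        total = k(ls - 1, lt)
        for j in range(lt):
            if t[j] == x:
                total += k_aux(n - 1, ls - 1, j) * lam ** 2
        return total

    return k(len(s), len(t))


if __name__ == "__main__":
    # Hypothetical toy sentences; "bank" is the word to be disambiguated.
    a = "he sat on the bank of the river".split()
    b = "she left money in the bank near the river".split()
    xa = context_window(a, a.index("bank"), c=3)
    xb = context_window(b, b.index("bank"), c=3)
    print(xa, xb, gap_weighted_kernel(xa, xb, n=2, lam=0.5))
```

Tokens play the role of symbols here, so the kernel compares word sequences rather than character strings. Values of `lam` closer to 0 penalise gaps more heavily, while `lam = 1` counts all common subsequences equally regardless of gaps.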

Acknowledgments

The authors would like to thank all the referees for their constructive and insightful comments on this paper. The corresponding author also thanks the China Scholarship Council (No. 201308360053) for financial support as a visiting scholar conducting research with Prof. Peter X. Liu at Carleton University, and thanks Prof. Peter X. Liu and Dr. Shichao Liu at Carleton University for valuable discussions. This work is supported in part by the National Natural Science Foundation of China (Nos. 51367014, 61202265, 61462040 and 61262049), the Jiangxi Province Natural Science Foundation of China (Nos. 20142BAB207011 and 20142BAB217016), the Jiangxi Province Education Plan of Young Scientists Foundation of China (No. 20112BCB23004), the Jiangxi Province Science and Technology Support Plan Key Projects of China (No. 20111BBE50008), and the Science and Technology Plan Projects in Jiangxi province Education Bureau of China (Nos. GJJ14770 and YC2015-S035).

Author information

Authors and Affiliations

Department of Computer Science and Technology, Nanchang University, Nanchang, 330031, People’s Republic of China
Xiangjun Li, Song Qing & Huawei Zhang
Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, K1S 5B6, Canada
Xiangjun Li
School of Mathematics and Computer Science, Gannan Normal University, Ganzhou, 341000, People’s Republic of China
Tinghua Wang
Department of Electrical and Automation Engineering, Nanchang University, Nanchang, 330031, People’s Republic of China
Huping Yang

Corresponding author

Correspondence to Xiangjun Li.

About this article

Cite this article

Li, X., Qing, S., Zhang, H. et al. Kernel methods for word sense disambiguation. Artif Intell Rev 46, 41–58 (2016). https://doi.org/10.1007/s10462-015-9455-5

Issue Date: June 2016
