Abstract
The enormous quantity of digital data necessitates automation, which, among other things, can help link unstructured data to structured data. Such a task requires a systematic approach to mapping entity mentions (e.g., person, location) to corresponding entries in a Knowledge Base. Research in this area is evolving rapidly, which has popularized Named Entity Disambiguation (NED). NED, also known as Entity Linking, is the task of resolving the ambiguities that arise when processing unstructured data containing Named Entities. The goal of this paper is to investigate ensemble learning using Support Vector Machines (SVM) for tackling the NED problem. Multiple ensemble learning algorithms were studied, including bagging, boosting, and voting, using different SVM kernel functions, including Linear, RBF, and Polynomial kernels. Our results on three benchmark corpora show that ensemble learning using SVM achieves performance competitive with well-known entity annotation systems and ensemble models. Specifically, the proposed method performed best on the disambiguation of AIDA/CONLL-TestB and AQUAINT, with F-measures of 78.5% and 71.5%, respectively.
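To make the voting idea and the reported F-measure concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each of three SVMs (one per kernel) has already chosen an entity for every mention, combines those choices by majority vote, and scores the result with a micro-averaged F-measure. The mentions, entity labels, and the function names `majority_vote` and `f_measure` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by most classifiers.

    `predictions` must be a non-empty list; ties go to the earliest voter.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def f_measure(gold, predicted):
    """Micro-averaged precision, recall, and F-measure over mention links.

    A predicted value of None means the system abstained on that mention.
    """
    correct = sum(
        1 for mention, entity in predicted.items()
        if entity is not None and gold.get(mention) == entity
    )
    attempted = sum(1 for entity in predicted.values() if entity is not None)
    precision = correct / attempted if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical outputs of three SVMs (linear, RBF, polynomial kernels),
# listed in that order for each mention. These values are made up.
svm_outputs = {
    "Paris": ["Paris", "Paris", "Paris_Hilton"],
    "Jordan": ["Michael_Jordan", "Jordan", "Jordan"],
    "Apple": ["Apple_Inc.", "Apple", "Apple_Inc."],
}
gold = {"Paris": "Paris", "Jordan": "Jordan", "Apple": "Apple"}

linked = {mention: majority_vote(votes) for mention, votes in svm_outputs.items()}
precision, recall, f1 = f_measure(gold, linked)
print(linked)
print(f"P={precision:.3f} R={recall:.3f} F={f1:.3f}")
```

In this toy run the vote recovers the correct entity for two of the three mentions, so precision, recall, and F-measure all equal 2/3. Bagging and boosting differ from voting in how the member classifiers are trained (resampled or reweighted training data), not in this final combination and scoring step.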








Acknowledgements
This work was supported by the Research Center of the College of Computer and Information Sciences, King Saud University. The authors are grateful for this support and to the anonymous reviewers for their insightful feedback.
Funding
This research was supported by a special fund in the Research Centre of the College of Computer and Information Sciences at King Saud University.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Alokaili, A., Menai, M.E.B. SVM ensembles for named entity disambiguation. Computing 102, 1051–1076 (2020). https://doi.org/10.1007/s00607-019-00748-x
DOI: https://doi.org/10.1007/s00607-019-00748-x