Abstract
The study of contemporary tweet-based Entity Linking (EL) systems reveals the lack of a standard definition of, and a consensus on, the task. In particular, what should be annotated in texts remains a recurring question. This prevents the proper design and fair evaluation of EL systems. To tackle this issue, the present paper introduces a set of rules intended to define the EL task for tweets. We evaluated the effectiveness of the proposed rules by developing TELS, an end-to-end supervised system that links tweets to Wikipedia. Experiments conducted on five publicly available datasets show that our system outperforms the baselines, with improvements in overall macro F1-score (micro F1-score) ranging from 25.04% (7.32%) up to 35.36% (42.03%). Moreover, feature analysis reveals that, when the annotation is not limited to a very few entity types, the proposed rules capture annotators’ tacit agreements in the datasets more efficiently. Consequently, the proposed rules constitute a step further towards a consensus on the EL task.
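The results above are reported as both macro and micro F1-scores, which can diverge on the same output. The following is a minimal Python sketch of one common convention, assumed here and not necessarily the evaluation code used for TELS or the baselines. Micro F1 pools true positives, false positives and false negatives over all tweets, while macro F1 averages per-tweet F1. An annotation is assumed to be a strictly matched (start, end, Wikipedia title) triple, and a tweet with neither gold nor predicted annotations is assumed to score 1.0. The helper names `f1_from_counts` and `micro_macro_f1` are hypothetical.

```python
from typing import List, Set, Tuple

# Assumption: an annotation is a (start, end, wikipedia_title) triple, and a
# prediction counts as correct only if all three fields match exactly.
Annotation = Tuple[int, int, str]


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts of true positives, false positives and false negatives."""
    if tp + fp + fn == 0:
        return 1.0  # assumed convention: nothing to find and nothing predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_macro_f1(gold: List[Set[Annotation]],
                   pred: List[Set[Annotation]]) -> Tuple[float, float]:
    """Micro F1 pools counts over all tweets; macro F1 averages per-tweet F1."""
    tp_sum = fp_sum = fn_sum = 0
    per_tweet = []
    for g, p in zip(gold, pred):
        tp, fp, fn = len(g & p), len(p - g), len(g - p)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
        per_tweet.append(f1_from_counts(tp, fp, fn))
    micro = f1_from_counts(tp_sum, fp_sum, fn_sum)
    macro = sum(per_tweet) / len(per_tweet) if per_tweet else 0.0
    return micro, macro


# Two toy tweets: the first is fully correct, the second has one hit,
# one spurious link and one missed link.
gold = [{(0, 5, "Paris")}, {(3, 8, "Barack_Obama"), (10, 17, "Chicago")}]
pred = [{(0, 5, "Paris")}, {(3, 8, "Barack_Obama"), (20, 28, "Illinois")}]
micro, macro = micro_macro_f1(gold, pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
# micro F1 = 0.667, macro F1 = 0.750
```

Because macro F1 weights every tweet equally while micro F1 weights every annotation equally, a system that does well on tweets with few entities tends to score higher under the macro measure, as in this toy example.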
Notes
The revised version of the Meij dataset is shared at: https://drive.google.com/file/d/1CiGNjyK350Lyn3h5yLreOKNKq7Eb4EZo/view?usp=sharing.
A Twitter user mention is a user-defined nickname; it appears in tweets preceded by the symbol @.
TagMe website: https://tagme.di.unipi.it/tagmehelp.html.
WAT website: https://sobigdata.d4science.org/web/tagme/wat-api.
AIDA website: https://www.mpi-inf.mpg.de/yago-naga/aida/.
DBpedia Spotlight website: https://www.dbpedia-spotlight.org/.
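The user-mention note above describes mentions only informally. As a minimal sketch, a Python regular expression can pull mentions out of a tweet, assuming Twitter's username rules (1 to 15 ASCII letters, digits or underscores), which this paper does not state. The pattern also skips an "@" glued to a preceding word character so that e-mail addresses are not mistaken for mentions. `extract_mentions` and `MENTION_RE` are hypothetical names.

```python
import re
from typing import List

# Assumption: a handle is 1-15 ASCII letters, digits or underscores, and the
# "@" must not follow a word character (this skips e-mail addresses).
MENTION_RE = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})(?![A-Za-z0-9_])")


def extract_mentions(tweet: str) -> List[str]:
    """Return user mentions in order of appearance, without the leading '@'."""
    return MENTION_RE.findall(tweet)


print(extract_mentions("Thanks @TwitterDev and @jack! Mail me at a@b.com"))
# ['TwitterDev', 'jack']
```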
References
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Chu H (2011) MDB: a memory-mapped database and backend for OpenLDAP. In: Proceedings of the 3rd international conference on LDAP. Heidelberg, pp 35–47
Cornolti M, Ferragina P, Ciaramita M (2013) A framework for benchmarking entity-annotation systems. In: Proceedings of the 22nd international conference on world wide web. Rio de Janeiro, pp 249–259
Daiber J, Jakob M, Hokamp C, Mendes PN (2013) Improving efficiency and accuracy in multilingual entity extraction. In: Proceedings of the 9th international conference on semantic systems. Graz, pp 121–124
Derczynski L, Maynard D, Rizzo G, Van Erp M, Gorrell G, Troncy R, Petrak J, Bontcheva K (2015) Analysis of named entity recognition and linking for tweets. Inf Process Manage 51(2):32–49
Serban O, Thapen N, Maginnis B, Hankin C, Foot V (2019) Real-time processing of social media with SENTINEL: a syndromic surveillance system incorporating deep learning for health classification. Inf Process Manage 56(3):1166–1184
Feng Y, Zarrinkalam F, Bagheri E, Fani H, Al-Obeidat F (2018) Entity linking of tweets based on dominant entity candidates. Soc Netw Anal Min 8(46):1–16
Ferragina P, Scaiella U (2010) TAGME: on-the-fly annotation of short text fragments (by Wikipedia entities). In: Proceedings of the 19th ACM international conference on information and knowledge management, Toronto, Canada, pp 1625–1628
Finkel JR, Grenager T, Manning C (2005) Incorporating non-local information into information extraction systems by Gibbs sampling. In: Proceedings of the 43rd annual meeting of the association for computational linguistics. Ann Arbor, Michigan, USA, pp 363–370
Grishman R, Sundheim B (1996) Message understanding conference-6. In: Proceedings of the 16th conference on Computational linguistics, USA, pp 466–471
Guo S, Chang MW, Kiciman E (2013) To link or not to link? A study on end-to-end tweet entity linking. In: Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: human language technologies, Atlanta, Georgia, USA, pp 1020–1030
Habib MB, Van Keulen M (2012) Unsupervised improvement of named entity extraction in short informal context using disambiguation clues. In: Proceedings of the workshop of semantic web and information extraction, Galway City, Ireland, pp 1–9
Habib MB, Van Keulen M (2016) TwitterNEED: a hybrid approach for named entity extraction and disambiguation for tweet. Nat Lang Eng 22(3):423–456
Han H, Viriyothai P, Lim SJ, Lameter D, Mussell B (2019) Yet another framework for tweet entity linking (YAFTEL). In: Proceedings of the IEEE conference on multimedia information processing and retrieval, San Jose, CA, USA, pp 258–263
Hasan M, Orgun MA, Schwitter R (2019) Real-time event detection from the Twitter data stream using the TwitterNews+ Framework. Inf Process Manage 56(3):1146–1165
Hasibi F, Balog K, Bratsberg SE (2016) On the reproducibility of the TAGME entity linking system. In: Ferro N et al (eds) Advances in information retrieval, ECIR 2016, LNCS, vol 9626. Springer, Cham, pp 436–449
Jha K, Röder M, Ngomo ACN (2017) All that glitters is not gold: rule-based curation of reference datasets for named entity recognition and entity linking. In: Blomqvist E, Maynard D, Gangemi A, Hoekstra R, Hitzler P, Hartig O (eds) The semantic web ESWC 2017, LNCS, vol 10249. Springer, Cham, pp 305–320
Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: Proceedings of the 19th international conference on world wide web, Raleigh, North Carolina, USA, pp 591–600
Ling X, Singh S, Weld DS (2015) Design challenges for entity linking. Trans Assoc Comput Linguist 3:315–328
Locke B, Martin J (2009) Named entity recognition: adapting to microblogging. University of Colorado UG Thesis, pp 1–12
Meij E, Weerkamp W, De Rijke M (2012) Adding semantics to microblog posts. In: Proceedings of the Fifth ACM international conference on web search and data mining. Seattle, Washington, USA, pp 563–572
Milne D, Witten IH (2008) Learning to link with Wikipedia. In: Proceedings of the 17th ACM conference on information and knowledge management. Napa Valley, California, USA, pp 509–518
Piccinno F, Ferragina P (2014) From Tagme to WAT: a new entity annotator. In: Proceedings of the 1st ACM international workshop on entity recognition and disambiguation, co-located with SIGIR 2014. USA, pp 55–61
Ran C, Shen W, Wang J (2018) An attention factor graph model for tweet entity linking. In: Proceedings of the 2018 world wide web conference, Lyon, France, pp 1135–1144
Rizzo G, Cano AE, Pereira B, Varga A (2015) Making sense of microposts (#Microposts2015) named entity recognition and linking challenge. In: Proceedings of the 5th workshop on making sense of microposts, Florence, Italy, pp 44–53
Rizzo G, Van Erp M, Plu J, Troncy R (2016) Making sense of microposts (#Microposts2016) named entity recognition and linking (NEEL) challenge. In: Proceedings of the 6th workshop on making sense of microposts, Montreal, Canada, pp 50–59
Rosales-Méndez H, Hogan A, Poblete B (2018a) VoxEL: a benchmark dataset for multilingual entity linking. In: Vrandečić D et al (eds) The semantic web: ISWC 2018, LNCS, vol 11137. Springer, Cham, pp 170–186
Rosales-Méndez H, Hogan A, Poblete B (2019) Nifify: towards better quality entity linking datasets. In: Companion proceedings of the 2019 world wide web conference, San Francisco, USA, pp 815–818
Rosales-Méndez H, Hogan A, Poblete B (2020) Fine-grained evaluation for entity linking. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing, Hong Kong, China, pp 718–727
Rosales-Méndez H, Poblete B, Hogan A (2018b) What should entity linking link? CEUR Workshop Proc 2100:1–5
Shen W, Wang J, Luo P, Wang M (2013) Linking named entities in tweets with knowledge base via user interest modeling. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, Chicago, Illinois, USA, pp 68–76
Usbeck R, Röder M, Ngomo ACN, Baron C, Both A, Brümmer M, Ceccarelli D, Cornolti M, Cherix D, Eickmann B, Ferragina P, Lemke C, Moro A, Navigli R, Piccinno F, Rizzo G, Sack H, Speck R, Troncy R, Waitelonis J, Wesemann L (2015) GERBIL: General entity annotator benchmarking framework. In: Proceedings of the 24th international conference on world wide web, Florence, Italy, pp 1133–1143
Van Erp M, Mendes PN, Paulheim H, Ilievski F, Plu J, Rizzo G, Waitelonis J (2016) Evaluating entity linking: an analysis of current benchmark datasets and a roadmap for doing a better job. In: Proceedings of the tenth international conference on language resources and evaluation, Portorož, Slovenia, pp 4373–4379
Speck R, Ngomo ACN (2014) Ensemble learning for named entity recognition. In: Mika P et al (eds) The semantic web: ISWC 2014, LNCS, vol 8796. Springer, Cham, pp 293–308
Weichselbraun A, Braşoveanu AM, Kuntschik P, Nixon LJ (2019) Improving named entity linking corpora quality. In: Proceedings of the international conference on recent advances in natural language processing, Varna, Bulgaria, pp 1328–1337
Wu G, He Y, Hu X (2018) Entity linking: an issue to extract corresponding entity with knowledge base. IEEE Access 6:6220–6231
Yosef MA, Hoffart J, Bordino I, Spaniol M, Weikum G (2011) AIDA: an online tool for accurate disambiguation of named entities in text and tables. Proc VLDB Endow 4(12):1450–1457
Zarrinkalam F, Kahani M, Bagheri E (2018) Mining user interests over active topics on social networks. Inf Process Manage 54(2):339–357
About this article
Cite this article
Nait-Hamoud, M.C., Lahfa, F. & Ennaji, A. A step further towards a consensus on linking tweets to Wikipedia. Evol. Intel. 16, 1825–1840 (2023). https://doi.org/10.1007/s12065-020-00549-8
DOI: https://doi.org/10.1007/s12065-020-00549-8