A step further towards a consensus on linking tweets to Wikipedia

Nait-Hamoud, Mohamed Cherif; Lahfa, Fedoua; Ennaji, Abdellatif

doi:10.1007/s12065-020-00549-8

Special Issue
Published: 01 February 2021

Volume 16, pages 1825–1840 (2023)

Evolutionary Intelligence

Mohamed Cherif Nait-Hamoud¹,² (ORCID: orcid.org/0000-0002-6316-0699),
Fedoua Lahfa¹ &
Abdellatif Ennaji³

Abstract

A study of contemporary tweet-based Entity Linking (EL) systems reveals that the task lacks both a standard definition and a consensus. In particular, what should be annotated in a text remains a recurring question, which prevents proper design and fair evaluation of EL systems. To tackle this issue, the present paper introduces a set of rules intended to define the EL task for tweets. We evaluated the effectiveness of the proposed rules by developing TELS, an end-to-end supervised system that links tweets to Wikipedia. Experiments conducted on five publicly available datasets show that our system outperforms the baselines, with improvements in overall macro F1-score (micro F1-score) ranging from 25.04% (7.32%) to 35.36% (42.03%). Moreover, feature analysis reveals that when annotation is not limited to very few entity types, the proposed rules capture annotators’ tacit agreements in the datasets more effectively. Consequently, the proposed rules constitute a step further towards a consensus on the EL task.
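The abstract reports both macro and micro F1-scores. As a rough illustration of how the two can differ, the sketch below scores predicted annotations against gold annotations for a handful of tweets. It assumes exact matching of (start, end, Wikipedia title) triples and macro-averaging over tweets; the scoring protocol actually used in the paper is not stated in this excerpt, and the names `Annotation`, `prf` and `micro_macro_f1` are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical annotation format: (start offset, end offset, Wikipedia title).
Annotation = Tuple[int, int, str]


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts (0.0 when a denominator is zero)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_macro_f1(
    gold: List[Set[Annotation]], pred: List[Set[Annotation]]
) -> Tuple[float, float]:
    """Return (micro F1, macro F1) over a list of tweets.

    Micro: pool true/false positives and false negatives over all tweets,
    then compute one F1. Macro: compute F1 per tweet, then average.
    Assumption: a tweet with no gold and no predicted annotations scores 0.0.
    """
    total_tp = total_fp = total_fn = 0
    per_tweet_f1 = []
    for g, s in zip(gold, pred):
        tp, fp, fn = len(g & s), len(s - g), len(g - s)
        total_tp += tp
        total_fp += fp
        total_fn += fn
        per_tweet_f1.append(prf(tp, fp, fn)[2])
    micro = prf(total_tp, total_fp, total_fn)[2]
    macro = sum(per_tweet_f1) / len(per_tweet_f1) if per_tweet_f1 else 0.0
    return micro, macro


# Toy data: tweet 1 is fully correct; tweet 2 has one hit, one miss, one spurious link.
gold = [
    {(0, 5, "Paris")},
    {(0, 6, "Python_(programming_language)"), (10, 14, "Java_(programming_language)")},
]
pred = [
    {(0, 5, "Paris")},
    {(0, 6, "Python_(programming_language)"), (20, 24, "Rust_(programming_language)")},
]
micro, macro = micro_macro_f1(gold, pred)
print(f"micro F1 = {micro:.4f}, macro F1 = {macro:.4f}")
```

On this toy input the micro score is lower than the macro score, because the tweet with more annotations dominates the pooled counts while each tweet weighs equally in the macro average.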

Notes

The revised version of Meij is shared at: https://drive.google.com/file/d/1CiGNjyK350Lyn3h5yLreOKNKq7Eb4EZo/view?usp=sharing.
A Twitter user mention is a user-defined nickname; it occurs in tweets preceded by the symbol @.
https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews.
http://www.lmdb.tech/bench/inmem/.
https://developers.google.com/protocol-buffers/docs/overview.
https://labs.criteo.com/2017/05/serialization/.
TagMe website: https://tagme.di.unipi.it/tagmehelp.html.
WAT website: https://sobigdata.d4science.org/web/tagme/wat-api.
AIDA website: https://www.mpi-inf.mpg.de/yago-naga/aida/.
DBpedia Spotlight website: https://www.dbpedia-spotlight.org/.

References

Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Chu H (2011) MDB: a memory-mapped database and backend for OpenLDAP. In: Proceedings of the 3rd international conference on LDAP. Heidelberg, pp 35–47
Cornolti M, Ferragina P, Ciaramita M (2013) A framework for benchmarking entity-annotation systems. In: Proceedings of the 22nd international conference on world wide web. Rio de Janeiro, pp 249–259
Daiber J, Jakob M, Hokamp C, Mendes PN (2013) Improving efficiency and accuracy in multilingual entity extraction. In: Proceedings of the 9th international conference on semantic systems. Graz, pp 121–124
Derczynski L, Maynard D, Rizzo G, Van Erp M, Gorrell G, Troncy R, Petrak J, Bontcheva K (2015) Analysis of named entity recognition and linking for tweets. Inf Process Manage 51(2):32–49
Serban O, Thapen N, Maginnis B, Hankin C, Foot V (2019) Real-time processing of social media with SENTINEL: a syndromic surveillance system incorporating deep learning for health classification. Inf Process Manage 56(3):1166–1184
Feng Y, Zarrinkalam F, Bagheri E, Fani H, Al-Obeidat F (2018) Entity linking of tweets based on dominant entity candidates. Soc Netw Anal Min 8(46):1–16
Ferragina P, Scaiella U (2010) TAGME: on-the-fly annotation of short text fragments (by Wikipedia entities). In: Proceedings of the 19th ACM international conference on Information and knowledge management, Toronto, Canada, pp 1625–1628
Finkel JR, Grenager T, Manning C (2005) Incorporating non-local information into information extraction systems by Gibbs sampling. In: Proceedings of the 43rd annual meeting of the association for computational linguistics. Ann Arbor, Michigan, USA, pp 363–370
Grishman R, Sundheim B (1996) Message understanding conference-6. In: Proceedings of the 16th conference on Computational linguistics, USA, pp 466–471
Guo S, Chang MW, Kiciman E (2013) To link or not to link? a study on end-to-end tweet entity linking. In: Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: human language technologies, Atlanta, Georgia, USA, pp 1020–1030
Habib MB, Van Keulen M (2012) Unsupervised improvement of named entity extraction in short informal context using disambiguation clues. In: Proceedings of the workshop of semantic web and information extraction, Galway City, Ireland, pp 1–9
Habib MB, Van Keulen M (2016) TwitterNEED: a hybrid approach for named entity extraction and disambiguation for tweet. Nat Lang Eng 22(3):423–456
Han H, Viriyothai P, Lim SJ, Lameter D, Mussell B (2019) Yet another framework for tweet entity linking (YAFTEL). In: Proceedings of the IEEE conference on multimedia information processing and retrieval, San Jose, CA, USA, pp 258–263
Hasan M, Orgun MA, Schwitter R (2019) Real-time event detection from the Twitter data stream using the TwitterNews+ Framework. Inf Process Manage 56(3):1146–1165
Hasibi F, Balog K, Bratsberg SE (2016) On the reproducibility of the TAGME entity linking system. In: Ferro N et al (eds) Advances in information retrieval, ECIR 2016, LNCS, vol 9626. Springer, Cham, pp 436–449
Jha K, Röder M, Ngomo ACN (2017) All that glitters is not gold: rule-based curation of reference datasets for named entity recognition and entity linking. In: Blomqvist E, Maynard D, Gangemi A, Hoekstra R, Hitzler P, Hartig O (eds) The semantic web ESWC 2017, LNCS, vol 10249. Springer, Cham, pp 305–320
Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: Proceedings of the 19th international conference on world wide web, Raleigh North Carolina, USA, pp 591–600
Ling X, Singh S, Weld DS (2015) Design challenges for entity linking. Trans Assoc Comput Linguist 3:315–328
Locke B, Martin J (2009) Named entity recognition: adapting to microblogging. University of Colorado UG Thesis, pp 1–12
Meij E, Weerkamp W, De Rijke M (2012) Adding semantics to microblog posts. In: Proceedings of the Fifth ACM international conference on web search and data mining. Seattle, Washington, USA, pp 563–572
Milne D, Witten IH (2008) Learning to link with wikipedia. In: Proceedings of the 17th ACM conference on information and knowledge management. Napa Valley, California, USA, pp 509–518
Piccinno F, Ferragina P (2014) From Tagme to WAT: a new entity annotator. In: Proceedings of the 1st ACM international workshop on entity recognition and disambiguation, co-located with SIGIR 2014. USA, pp 55–61
Ran C, Shen W, Wang J (2018) An attention factor graph model for tweet entity linking. In: Proceedings of the 2018 world wide web conference, Lyon, France, pp 1135–1144
Rizzo G, Cano AE, Pereira B, Varga A (2015) Making sense of microposts (#Microposts2015) named entity recognition and linking challenge. In: Proceedings of the 5th workshop on making sense of microposts, Florence, Italy, pp 44–53
Rizzo G, Van Erp M, Plu J, Troncy R (2016) Making sense of microposts (#Microposts2016) named entity recognition and linking (NEEL) challenge. In: Proceedings of the 6th workshop on making sense of microposts, Montreal, Canada, pp 50–59
Rosales-Méndez H, Hogan A, Poblete B (2018a) VoxEL: a benchmark dataset for multilingual entity linking. In: Vrandečić D et al (eds) The semantic web: ISWC 2018, LNCS, vol 11137. Springer, Cham, pp 170–186
Rosales-Méndez H, Hogan A, Poblete B (2019) Nifify: towards better quality entity linking datasets. In: Companion proceedings of the 2019 world wide web conference, San Francisco, USA, pp 815–818
Rosales-Méndez H, Hogan A, Poblete B (2020) Fine-grained evaluation for entity linking. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing, Hong Kong, China, pp 718–727
Rosales-Méndez H, Poblete B, Hogan A (2018b) What should entity linking link? CEUR Workshop Proc 2100:1–5
Shen W, Wang J, Luo P, Wang M (2013) Linking named entities in tweets with knowledge base via user interest modeling. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, Chicago, Illinois, USA, pp 68–76
Usbeck R, Röder M, Ngomo ACN, Baron C, Both A, Brümmer M, Ceccarelli D, Cornolti M, Cherix D, Eickmann B, Ferragina P, Lemke C, Moro A, Navigli R, Piccinno F, Rizzo G, Sack H, Speck R, Troncy R, Waitelonis J, Wesemann L (2015) GERBIL: General entity annotator benchmarking framework. In: Proceedings of the 24th international conference on world wide web, Florence, Italy, pp 1133–1143
Van Erp M, Mendes PN, Paulheim H, Ilievski F, Plu J, Rizzo G, Waitelonis J (2016) Evaluating entity linking: an analysis of current benchmark datasets and a roadmap for doing a better job. In: Proceedings of the tenth international conference on language resources and evaluation, Portorož, Slovenia, pp 4373–4379
Speck R, Ngonga ACN (2014) Ensemble learning for named entity recognition. In: Mika P et al (eds) The semantic web: ISWC 2014, LNCS, vol 8796. Springer, Cham, pp 293–308
Weichselbraun A, Braşoveanu AM, Kuntschik P, Nixon LJ (2019) Improving named entity linking corpora quality. In: Proceedings of the international conference on recent advances in natural language processing, Varna, Bulgaria, pp 1328–1337
Wu G, He Y, Hu X (2018) Entity linking: an issue to extract corresponding entity with knowledge base. IEEE Access 6:6220–6231
Yosef MA, Hoffart J, Bordino I, Spaniol M, Weikum G (2011) AIDA: an online tool for accurate disambiguation of named entities in text and tables. Proc VLDB Endow 4(12):1450–1457
Zarrinkalam F, Kahani M, Bagheri E (2018) Mining user interests over active topics on social networks. Inf Process Manage 54(2):339–357

Author information

Authors and Affiliations

Department of Science Computing, University of Abou Bekr Belkaid, BP 13000, Tlemcen, Algeria
Mohamed Cherif Nait-Hamoud & Fedoua Lahfa
Department of Mathematics and Science Computing, University of Larbi Tebessi, BP 12000, Tebessa, Algeria
Mohamed Cherif Nait-Hamoud
LITIS Laboratory EA-4108, University of Rouen-Normandie, Rouen, France
Abdellatif Ennaji

Corresponding author

Correspondence to Mohamed Cherif Nait-Hamoud.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Nait-Hamoud, M.C., Lahfa, F. & Ennaji, A. A step further towards a consensus on linking tweets to Wikipedia. Evol. Intel. 16, 1825–1840 (2023). https://doi.org/10.1007/s12065-020-00549-8

Received: 09 May 2020
Revised: 02 December 2020
Accepted: 09 December 2020
Published: 01 February 2021
Issue Date: December 2023
DOI: https://doi.org/10.1007/s12065-020-00549-8
