A step further towards a consensus on linking tweets to Wikipedia

  • Special Issue
  • Journal: Evolutionary Intelligence

Abstract

A study of contemporary tweet-based Entity Linking (EL) systems reveals that the task lacks a standard definition and a consensus. In particular, what should be annotated in a text remains a recurring question, which prevents the proper design and fair evaluation of EL systems. To tackle this issue, the present paper introduces a set of rules intended to define the EL task for tweets. We evaluated the effectiveness of the proposed rules by developing TELS, an end-to-end supervised system that links tweets to Wikipedia. Experiments conducted on five publicly available datasets show that our system outperforms the baselines, with improvements in overall macro F1-score (micro F1-score) ranging from 25.04% (7.32%) up to 35.36% (42.03%). Moreover, feature analysis reveals that when annotation is not limited to very few entity types, the proposed rules capture annotators’ tacit agreements in the datasets more effectively. Consequently, the proposed rules constitute a step further towards a consensus on the EL task.
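
The difference between the macro and micro F1-scores quoted above can be illustrated with a minimal sketch. It assumes the convention common in entity-linking benchmarks, where macro F1 averages per-tweet scores and micro F1 pools the counts over all tweets; the exact protocol used in the paper is not shown in this preview. The names `Annotation`, `macro_f1`, and `micro_f1`, the exact-match criterion, and the toy annotations are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical annotation format: (start offset, end offset, Wikipedia title).
Annotation = Tuple[int, int, str]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 1.0 when there is nothing to find and
    nothing was predicted (an assumed convention, not taken from the paper)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def counts(gold: Set[Annotation], pred: Set[Annotation]) -> Tuple[int, int, int]:
    """Exact-match true positives, false positives, false negatives."""
    tp = len(gold & pred)
    return tp, len(pred) - tp, len(gold) - tp


def macro_f1(gold: List[Set[Annotation]], pred: List[Set[Annotation]]) -> float:
    """Average of the per-tweet F1 scores: every tweet weighs the same."""
    scores = [f1(*counts(g, p)) for g, p in zip(gold, pred)]
    return sum(scores) / len(scores)


def micro_f1(gold: List[Set[Annotation]], pred: List[Set[Annotation]]) -> float:
    """F1 over counts pooled across tweets: every annotation weighs the same."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        t, f_pos, f_neg = counts(g, p)
        tp, fp, fn = tp + t, fp + f_pos, fn + f_neg
    return f1(tp, fp, fn)


# Toy data: two tweets, the second with one wrong and one missed link.
gold = [{(0, 5, "Paris")}, {(0, 3, "NBA"), (10, 15, "Miami_Heat")}]
pred = [{(0, 5, "Paris")}, {(0, 3, "NBA"), (4, 8, "Heat")}]

print(macro_f1(gold, pred))  # per-tweet scores 1.0 and 0.5 are averaged
print(micro_f1(gold, pred))  # pooled counts: tp=2, fp=1, fn=1
```

Because the two measures weight tweets and annotations differently, a system can improve much more on one than on the other, which is consistent with the two ranges being reported separately.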

Notes

  1. The revised version of Meij is shared at: https://drive.google.com/file/d/1CiGNjyK350Lyn3h5yLreOKNKq7Eb4EZo/view?usp=sharing.

  2. A Twitter user mention is a user-defined nickname; it occurs in tweets preceded by the symbol @.

  3. https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews.

  4. http://www.lmdb.tech/bench/inmem/.

  5. https://developers.google.com/protocol-buffers/docs/overview.

  6. https://labs.criteo.com/2017/05/serialization/.

  7. TagMe website: https://tagme.di.unipi.it/tagmehelp.html.

  8. WAT website: https://sobigdata.d4science.org/web/tagme/wat-api.

  9. AIDA website: https://www.mpi-inf.mpg.de/yago-naga/aida/.

  10. DBpedia Spotlight website: https://www.dbpedia-spotlight.org/.

References

  1. Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  2. Chu H (2011) MDB: a memory-mapped database and backend for OpenLDAP. In: Proceedings of the 3rd international conference on LDAP. Heidelberg, pp 35–47

  3. Cornolti M, Ferragina P, Ciaramita M (2013) A framework for benchmarking entity-annotation systems. In: Proceedings of the 22nd international conference on world wide web. Rio de Janeiro, pp 249–259

  4. Daiber J, Jakob M, Hokamp C, Mendes PN (2013) Improving efficiency and accuracy in multilingual entity extraction. In: Proceedings of the 9th international conference on semantic systems. Graz, pp 121–124

  5. Derczynski L, Maynard D, Rizzo G, Van Erp M, Gorrell G, Troncy R, Petrak J, Bontcheva K (2015) Analysis of named entity recognition and linking for tweets. Inf Process Manage 51(2):32–49

  6. Serban O, Thapen N, Maginnis B, Hankin C, Foot V (2019) Real-time processing of social media with SENTINEL: a syndromic surveillance system incorporating deep learning for health classification. Inf Process Manage 56(3):1166–1184

  7. Feng Y, Zarrinkalam F, Bagheri E, Fani H, Al-Obeidat F (2018) Entity linking of tweets based on dominant entity candidates. Soc Netw Anal Min 8(46):1–16

  8. Ferragina P, Scaiella U (2010) TAGME: on-the-fly annotation of short text fragments (by Wikipedia entities). In: Proceedings of the 19th ACM international conference on Information and knowledge management, Toronto, Canada, pp 1625–1628

  9. Finkel JR, Grenager T, Manning C (2005) Incorporating non-local information into information extraction systems by Gibbs sampling. In: Proceedings of the 43rd annual meeting of the association for computational linguistics. Ann Arbor, Michigan, USA, pp 363–370

  10. Grishman R, Sundheim B (1996) Message understanding conference-6. In: Proceedings of the 16th conference on Computational linguistics, USA, pp 466–471

  11. Guo S, Chang MW, Kiciman E (2013) To link or not to link? a study on end-to-end tweet entity linking. In: Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: human language technologies, Atlanta, Georgia, USA, pp 1020–1030

  12. Habib MB, Van Keulen M (2012) Unsupervised improvement of named entity extraction in short informal context using disambiguation clues. In: Proceedings of the workshop of semantic web and information extraction, Galway City, Ireland, pp 1–9

  13. Habib MB, Van Keulen M (2016) TwitterNEED: a hybrid approach for named entity extraction and disambiguation for tweet. Nat Lang Eng 22(3):423–456

  14. Han H, Viriyothai P, Lim SJ, Lameter D, Mussell B (2019) Yet another framework for tweet entity linking (YAFTEL). In: Proceedings of the IEEE conference on multimedia information processing and retrieval, San Jose, CA, USA, pp 258–263

  15. Hasan M, Orgun MA, Schwitter R (2019) Real-time event detection from the Twitter data stream using the TwitterNews+ Framework. Inf Process Manage 56(3):1146–1165

  16. Hasibi F, Balog K, Bratsberg SE (2016) On the reproducibility of the TAGME entity linking system. In: Ferro N et al (eds) Advances in information retrieval, ECIR 2016, LNCS, vol 9626. Springer, Cham, pp 436–449

  17. Jha K, Röder M, Ngomo ACN (2017) All that glitters is not gold: rule-based curation of reference datasets for named entity recognition and entity linking. In: Blomqvist E, Maynard D, Gangemi A, Hoekstra R, Hitzler P, Hartig O (eds) The semantic web ESWC 2017, LNCS, vol 10249. Springer, Cham, pp 305–320

  18. Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: Proceedings of the 19th international conference on world wide web, Raleigh North Carolina, USA, pp 591–600

  19. Ling X, Singh S, Weld DS (2015) Design challenges for entity linking. Trans Assoc Comput Linguist 3:315–328

  20. Locke B, Martin J (2009) Named entity recognition: adapting to microblogging. University of Colorado UG Thesis, pp 1–12

  21. Meij E, Weerkamp W, De Rijke M (2012) Adding semantics to microblog posts. In: Proceedings of the Fifth ACM international conference on web search and data mining. Seattle, Washington, USA, pp 563–572

  22. Milne D, Witten IH (2008) Learning to link with wikipedia. In: Proceedings of the 17th ACM conference on information and knowledge management. Napa Valley, California, USA, pp 509–518

  23. Piccinno F, Ferragina P (2014) From Tagme to WAT: a new entity annotator. In: Proceedings of the 1st ACM international workshop on entity recognition and disambiguation, co-located with SIGIR 2014. USA, pp 55–61

  24. Ran C, Shen W, Wang J (2018) An attention factor graph model for tweet entity linking. In: Proceedings of the 2018 world wide web conference, Lyon, France, pp 1135–1144

  25. Rizzo G, Cano AE, Pereira B, Varga A (2015) Making sense of microposts (#Microposts2015) named entity recognition and linking challenge. In: Proceedings of the 5th workshop on making sense of microposts, Florence, Italy, pp 44–53

  26. Rizzo G, Van Erp M, Plu J, Troncy R (2016) Making sense of microposts (#Microposts2016) named entity recognition and linking (NEEL) challenge. In: Proceedings of the 6th workshop on making sense of microposts, Montreal, Canada, pp 50–59

  27. Rosales-Méndez H, Hogan A, Poblete B (2018a) VoxEL: a benchmark dataset for multilingual entity linking. In: Vrandečić D et al (eds) The semantic web: ISWC 2018, LNCS, vol 11137. Springer, Cham, pp 170–186

  28. Rosales-Méndez H, Hogan A, Poblete B (2019) Nifify: towards better quality entity linking datasets. In: Companion proceedings of the 2019 world wide web conference, San Francisco, USA, pp 815–818

  29. Rosales-Méndez H, Hogan A, Poblete B (2020) Fine-grained evaluation for entity linking. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing, Hong Kong, China, pp 718–727

  30. Rosales-Méndez H, Poblete B, Hogan A (2018b) What should entity linking link? CEUR Workshop Proc 2100:1–5

  31. Shen W, Wang J, Luo P, Wang M (2013) Linking named entities in tweets with knowledge base via user interest modeling. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, Chicago, Illinois, USA, pp 68–76

  32. Usbeck R, Röder M, Ngomo ACN, Baron C, Both A, Brümmer M, Ceccarelli D, Cornolti M, Cherix D, Eickmann B, Ferragina P, Lemke C, Moro A, Navigli R, Piccinno F, Rizzo G, Sack H, Speck R, Troncy R, Waitelonis J, Wesemann L (2015) GERBIL: General entity annotator benchmarking framework. In: Proceedings of the 24th international conference on world wide web, Florence, Italy, pp 1133–1143

  33. Van Erp M, Mendes PN, Paulheim H, Ilievski F, Plu J, Rizzo G, Waitelonis J (2016) Evaluating entity linking: an analysis of current benchmark datasets and a roadmap for doing a better job. In: Proceedings of the tenth international conference on language resources and evaluation, Portorož, Slovenia, pp 4373–4379

  34. Speck R, Ngonga ACN (2014) Ensemble learning for named entity recognition. In: Mika P et al (eds) The semantic web: ISWC 2014, LNCS, vol 8796. Springer, Cham, pp 293–308

  35. Weichselbraun A, Braşoveanu AM, Kuntschik P, Nixon LJ (2019) Improving named entity linking corpora quality. In: Proceedings of the international conference on recent advances in natural language processing, Varna, Bulgaria, pp 1328–1337

  36. Wu G, He Y, Hu X (2018) Entity linking: an issue to extract corresponding entity with knowledge base. IEEE Access 6:6220–6231

  37. Yosef MA, Hoffart J, Bordino I, Spaniol M, Weikum G (2011) AIDA: an online tool for accurate disambiguation of named entities in text and tables. Proc VLDB Endow 4(12):1450–1457

  38. Zarrinkalam F, Kahani M, Bagheri E (2018) Mining user interests over active topics on social networks. Inf Process Manage 54(2):339–357

Author information

Corresponding author

Correspondence to Mohamed Cherif Nait-Hamoud.

About this article

Cite this article

Nait-Hamoud, M.C., Lahfa, F. & Ennaji, A. A step further towards a consensus on linking tweets to Wikipedia. Evol. Intel. 16, 1825–1840 (2023). https://doi.org/10.1007/s12065-020-00549-8

  • DOI: https://doi.org/10.1007/s12065-020-00549-8
