ABSTRACT
The Entity Linking (EL) task is concerned with linking entity mentions in a text collection with their corresponding knowledge-base entries. Despite progress in the evaluation of EL systems, much work remains to be done, and this Ph.D. research tackles open issues concerning EL evaluation. Among these issues, we stress (a) the lack of consensus about the definition of “entity” and the lack of evaluation metrics that allow for different notions of entities, (b) the lack of datasets that allow for cross-language comparison, and (c) the focus on evaluating high-level systems rather than low-level techniques. Our hypothesis is that, by addressing these challenges and better understanding the performance of EL systems, we can create a more general, more configurable EL framework that can be better adapted to the needs of a particular application. In the early stages of this Ph.D. work, we have identified these problems and begun to address (a) and (b), publishing initial results that constitute a significant step forward in our investigation. However, further challenges must be addressed before we reach our goal. Our next steps thus involve proposing a more fluid definition of “entity” that is adaptable to different applications, defining quality measures that allow for comparing EL approaches targeting different types of entities, and creating a customizable EL framework that allows for composing and evaluating individual techniques as appropriate to a particular task.