WeDGeM: A Domain-Specific Evaluation Dataset Generator for Multilingual Entity Linking Systems

  • Conference paper
Web Information Systems Engineering – WISE 2017 (WISE 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10570)

Abstract

Entity Linking is the task of annotating ambiguous mentions in unstructured text with their referent entities in a given knowledge base. A large number of general-purpose benchmark datasets exist for evaluating Entity Linking approaches. However, domain-specific Entity Linking approaches are difficult to evaluate because evaluation datasets for specific domains are lacking. This study presents WeDGeM, a multilingual evaluation set generator for specific domains that uses Wikipedia and DBpedia. Wikipedia category pages and the DBpedia taxonomy are used to adjust domain-specific annotated text generation. Wikipedia disambiguation pages are used to determine the ambiguity level of the generated texts. As a use case, well-known Entity Linking systems supporting English and Turkish texts are evaluated on the generated texts in the movie domain.
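
The abstract does not define how the ambiguity level is computed, so the following is only a minimal sketch under stated assumptions, not WeDGeM's actual method. It assumes that each mention maps to the candidate entities listed on its Wikipedia disambiguation page, and it takes the ambiguity level of a text to be the average number of candidates per mention. The candidate table, the function names `ambiguity_level` and `is_ambiguous`, and the averaging rule are all hypothetical and chosen for illustration.

```python
# Hypothetical toy data: surface form -> candidate entities, as one might
# collect them from a Wikipedia disambiguation page. Illustrative only;
# this table is not produced by WeDGeM.
CANDIDATES = {
    "Wicker Park": [
        "Wicker_Park_(film)",
        "Wicker_Park,_Chicago",
        "Wicker_Park_(soundtrack)",
    ],
    "Josh Hartnett": ["Josh_Hartnett"],
}


def is_ambiguous(mention, candidates):
    """A mention is treated as ambiguous if it has more than one candidate."""
    return len(candidates.get(mention, [])) > 1


def ambiguity_level(mentions, candidates):
    """Average number of candidate entities per mention (assumed metric).

    Mentions without an entry in the candidate table count as having a
    single candidate, i.e. they are treated as unambiguous.
    """
    if not mentions:
        return 0.0
    counts = [max(len(candidates.get(m, [])), 1) for m in mentions]
    return sum(counts) / len(counts)


if __name__ == "__main__":
    text_mentions = ["Wicker Park", "Josh Hartnett"]
    print(ambiguity_level(text_mentions, CANDIDATES))
    print([m for m in text_mentions if is_ambiguous(m, CANDIDATES)])
```

Under this assumed definition, a generator could keep only texts whose score falls within a requested range, which is one way to control how hard the generated evaluation set is.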

Notes

  1. http://dbpedia.org/resource/Wicker_Park_(film)

  2. http://dbpedia.org/resource/Josh_Hartnett

  3. http://lucene.apache.org/solr/

  4. https://dumps.wikimedia.org/

  5. https://github.com/einan/WeDGeM

  6. http://aksw.org/Projects/GERBIL.html

  7. https://dumps.wikimedia.org/trwiki/20170420/

  8. https://dumps.wikimedia.org/enwiki/20170420/

  9. https://en.wikipedia.org/wiki/Wicker_Park

Author information

Corresponding author

Correspondence to Emrah Inan.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Inan, E., Dikenelli, O. (2017). WeDGeM: A Domain-Specific Evaluation Dataset Generator for Multilingual Entity Linking Systems. In: Bouguettaya, A., et al. Web Information Systems Engineering – WISE 2017. WISE 2017. Lecture Notes in Computer Science, vol 10570. Springer, Cham. https://doi.org/10.1007/978-3-319-68786-5_18

  • DOI: https://doi.org/10.1007/978-3-319-68786-5_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68785-8

  • Online ISBN: 978-3-319-68786-5

  • eBook Packages: Computer Science, Computer Science (R0)
