Entity Typing Using Distributional Semantics and DBpedia

van Erp, Marieke; Vossen, Piek

doi:10.1007/978-3-319-68723-0_9

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10579)

Included in the following conference series:

International Semantic Web Conference

Abstract

Recognising entities in a text and linking them to an external resource is a vital step in creating a structured resource (e.g. a knowledge base) from text. This enables semantic querying over a dataset, for example selecting all politicians or all football players. However, traditional named entity recognition systems distinguish only a limited number of entity types (such as Person, Organisation and Location), and entity linking often cannot link all entities found in a text to a knowledge base. This creates a coverage gap between what is mentioned in the text and what can be annotated with fine-grained types.

This paper presents an approach to detecting entity types using DBpedia type information and distributional semantics. The distributional semantics paradigm assumes that similar words occur in similar contexts. We exploit this by comparing entities of unknown type to entities whose type is known, and assigning the type of the most similar set of entities to the entity of unknown type. We demonstrate our approach on seven different named entity linking datasets.
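The type-assignment step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes every entity already has an embedding vector (for example from a word2vec model) and that the known entities carry a DBpedia type label. It interprets "the most similar set of entities" as the type whose averaged (centroid) vector has the highest cosine similarity to the unknown entity. The centroid strategy, the function names and the toy vectors are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def type_centroids(known_types, vectors):
    """Average the vectors of all known entities that share a type.

    known_types: entity -> DBpedia type label (hypothetical input format)
    vectors:     entity -> embedding as a list of floats
    """
    sums = {}
    counts = defaultdict(int)
    for entity, etype in known_types.items():
        if entity not in vectors:
            continue  # entities without an embedding cannot contribute
        vec = vectors[entity]
        if etype not in sums:
            sums[etype] = [0.0] * len(vec)
        sums[etype] = [s + x for s, x in zip(sums[etype], vec)]
        counts[etype] += 1
    return {t: [s / counts[t] for s in sums[t]] for t in sums}


def predict_type(unknown_vec, centroids):
    """Assumed rule: pick the type whose centroid is most cosine-similar."""
    return max(centroids, key=lambda t: cosine(unknown_vec, centroids[t]))


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real word2vec vectors have hundreds of dimensions.
    vectors = {
        "Angela_Merkel": [0.9, 0.1, 0.0],
        "Barack_Obama": [0.8, 0.2, 0.1],
        "Lionel_Messi": [0.1, 0.9, 0.2],
        "Cristiano_Ronaldo": [0.2, 0.8, 0.1],
    }
    known_types = {
        "Angela_Merkel": "Politician",
        "Barack_Obama": "Politician",
        "Lionel_Messi": "SoccerPlayer",
        "Cristiano_Ronaldo": "SoccerPlayer",
    }
    centroids = type_centroids(known_types, vectors)
    print(predict_type([0.85, 0.15, 0.05], centroids))  # prints: Politician
```

A nearest-neighbour vote over individual entities would be an equally plausible reading of the abstract; the centroid variant is used here only because it reduces the comparison to one vector per type.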

To the best of our knowledge, our approach is the first to combine word embeddings with external type information for this task. Our results show that the task is challenging but not impossible, and that performance improves when the search space is narrowed by adding more context to the entities in the form of topic information.
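The abstract reports that narrowing the search space with topic information improves performance. A hedged sketch of one way to do this: assume each known entity carries a topic label (hypothetical here; a news category would be one option) and reuse the centroid comparison idea, but consider only known entities that share the unknown entity's topic, falling back to all known entities when none match. The topic labels, the fallback rule and the function name are assumptions, not the paper's procedure; the block repeats the cosine helper so that it runs on its own.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_type_in_topic(unknown_vec, topic, known_types, vectors, topics):
    """Type an entity using only known entities from the same (hypothetical) topic.

    known_types: entity -> DBpedia type label
    vectors:     entity -> embedding as a list of floats
    topics:      entity -> topic label (assumed annotation, not from DBpedia)
    Falls back to every known entity with a vector if no entity shares the topic.
    """
    candidates = [e for e in known_types if e in vectors]
    same_topic = [e for e in candidates if topics.get(e) == topic]
    pool = same_topic or candidates

    # Build one centroid per type from the narrowed pool.
    sums, counts = {}, defaultdict(int)
    for entity in pool:
        etype = known_types[entity]
        vec = vectors[entity]
        acc = sums.setdefault(etype, [0.0] * len(vec))
        sums[etype] = [s + x for s, x in zip(acc, vec)]
        counts[etype] += 1
    centroids = {t: [s / counts[t] for s in sums[t]] for t in sums}

    # max() raises ValueError if there are no known entities with vectors at all.
    return max(centroids, key=lambda t: cosine(unknown_vec, centroids[t]))


if __name__ == "__main__":
    known_types = {"Usain_Bolt": "Athlete", "Theresa_May": "Politician"}
    vectors = {"Usain_Bolt": [1.0, 0.0], "Theresa_May": [0.0, 1.0]}
    topics = {"Usain_Bolt": "sport", "Theresa_May": "politics"}
    mention = [0.2, 0.8]  # toy vector for an entity of unknown type
    print(predict_type_in_topic(mention, "sport", known_types, vectors, topics))     # Athlete
    print(predict_type_in_topic(mention, "politics", known_types, vectors, topics))  # Politician
    print(predict_type_in_topic(mention, "finance", known_types, vectors, topics))   # Politician (fallback)
```

In the demo the same vector is typed as Athlete under the sport topic and as Politician under the politics topic: the topic filter changes the candidate set, not the similarity measure.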

Notes

1. As entities are made up of words, we hypothesise that this assumption also extends to entities.
2. https://code.google.com/archive/p/word2vec/.
3. For this paper, we ran experiments on an Ubuntu machine with 2 CPUs and 16 GB of RAM; most experiments took no longer than 2 h.
4. AIDA-YAGO2 originally contained Wikipedia URLs, but these have been mapped to their corresponding DBpedia URIs.
5. https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/aida/downloads/.
6. Available from: http://www.jmlr.org/papers/volume5/lewis04a/lyrl2004_rcv1v2_README.htm. Last visited: 27 April 2016.
7. http://scc-research.lancaster.ac.uk/workshops/microposts2014/challenge/index.html.
8. http://scc-research.lancaster.ac.uk/workshops/microposts2015/challenge/index.html.
9. https://github.com/anuzzolese/oke-challenge.
10. http://stlab.istc.cnr.it/stlab/WikipediaOntology/.
11. https://github.com/AKSW/n3-collection.
12. http://yovisto.com/labs/wes2015/wes2015-dataset-nif.rdf.
13. http://blog.yovisto.com/.
14. http://www.newsreader-project.eu/results/data/wikinews.
15. https://en.wikinews.org/.
16. https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing.
17. Unfortunately, no further information about the Google News corpus is available, as it is not an open dataset.
18. https://github.com/idio/wiki2vec.
19. http://trec.nist.gov/data/reuters/reuters.html.
20. https://radimrehurek.com/gensim/models/word2vec.html.

Acknowledgements

The research for this paper was made possible by the CLARIAH-CORE project financed by NWO: http://www.clariah.nl.

Author information

Authors and Affiliations

Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Marieke van Erp & Piek Vossen

Corresponding author

Correspondence to Marieke van Erp.

Editor information

Editors and Affiliations

Digital Humanities Group, KNAW Humanities Cluster, Amsterdam, The Netherlands
Marieke van Erp
University of Leipzig, Leipzig, Germany
Sebastian Hellmann
National University of Ireland, Galway, Ireland
John P. McCrae
Institut für Informatik, Goethe University Frankfurt, Frankfurt, Hessen, Germany
Christian Chiarcos
Division of Web Science and Technology, Department of Computer Science, KAIST, Daejeon, Korea (Republic of)
Key-Sun Choi
Universidad Politécnica de Madrid, Madrid, Spain
Jorge Gracia
Waseda University, Tokyo, Japan
Yoshihiko Hayashi
Ontolonomy LLC, Yokohama, Japan
Seiji Koide
Apple San Francisco, San Francisco, California, USA
Pablo Mendes
Inst für Info & Wirtschaftsinfo, Universität Mannheim, Mannheim, Baden-Württemberg, Germany
Heiko Paulheim
National Institute of Informatics, Tokyo, Japan
Hideaki Takeda

About this paper

Cite this paper

van Erp, M., Vossen, P. (2017). Entity Typing Using Distributional Semantics and DBpedia. In: van Erp, M., et al. Knowledge Graphs and Language Technology. ISWC 2016. Lecture Notes in Computer Science, vol 10579. Springer, Cham. https://doi.org/10.1007/978-3-319-68723-0_9

DOI: https://doi.org/10.1007/978-3-319-68723-0_9
Published: 29 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68722-3
Online ISBN: 978-3-319-68723-0
eBook Packages: Computer Science, Computer Science (R0)
