Knowledge graph entity typing via learning connecting embeddings

https://doi.org/10.1016/j.knosys.2020.105808

Abstract

Knowledge graph (KG) entity typing aims at inferring possible missing entity type instances in KGs, which is a significant but still under-explored subtask of knowledge graph completion. In this paper, we propose a novel approach for KG entity typing that is trained by jointly utilizing local typing knowledge from existing entity type assertions and global triple knowledge in KGs. Specifically, we present two distinct, knowledge-driven, effective mechanisms of entity type inference, and build two novel embedding models to realize them. A joint model connecting the two is then used to infer missing entity type instances, favoring inferences that agree with both the entity type instances and the triple knowledge in KGs. Experimental results on two real-world datasets (Freebase and YAGO) demonstrate the effectiveness of our proposed mechanisms and models for improving KG entity typing.

Introduction

The past decade has witnessed great strides in building web-scale knowledge graphs (KGs), such as Freebase [1], Google Knowledge Graph [2], and YAGO [3], which usually consist of a huge number of triples of the form (head entity, relation, tail entity) (denoted (e,r,ẽ)). KGs usually suffer from incompleteness and miss important facts, which jeopardizes their usefulness in downstream tasks such as question answering [4], semantic parsing [5], and relation classification [6]. Hence, the task of knowledge graph completion (KGC, i.e. completing knowledge graph entries) is extremely significant and attracts wide attention.

This paper concentrates on KG entity typing, i.e. inferring missing entity type instances in KGs, which is an important sub-problem of KGC. Entity type instances, each of the form (entity, entity type) (denoted (e,t)), are essential entries of KGs and widely used in many NLP tasks such as relation extraction [7], [8], coreference resolution [9], and entity linking [10]. KGs usually suffer from entity type incompleteness. For instance, 10% of entities in FB15k [11] that have the /music/artist type miss the /people/person type [12]. KG entity type incompleteness renders some type-involved algorithms in KG-driven tasks grossly inefficient or even inapplicable. However, most previous works on KGC only focus on inferring missing entities and relationships [11], [13], [14], [15], [16], [17], [18], [19] (e.g. entity prediction (head entity, relationship, tail entity = ?)), paying less attention to entity type prediction (entity, entity type = ?), which limits the application of KGs in downstream tasks.

To solve KG entity type incompleteness issue, in this paper we propose a novel embedding methodology for inferring missing entity type instances in KGs that employs not only the existing local typing knowledge in existing entity type assertions, as most conventional approaches do, but also leverages the global triple knowledge in KGs. We build two distinct knowledge-driven entity type inference mechanisms with these two kinds of structural knowledge.

Mechanism 1

Missing entity types of an entity can be found from other entities that are close to the entity in the embedding space, using local typing knowledge as in Fig. 1 (Mech. 1).
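Mechanism 1 can be pictured as nearest-neighbor lookup in a pretrained entity embedding space. The sketch below uses toy 2-d vectors and a hypothetical `infer_types_by_neighbor` helper; the paper's actual embeddings would be learned by a KG embedding model:

```python
import numpy as np

# Toy pretrained entity embeddings (hypothetical values; in practice these
# come from a KG embedding method such as TransE).
entity_emb = {
    "Barack Obama": np.array([0.90, 0.10]),
    "Donald Trump": np.array([0.85, 0.15]),
    "Honolulu":     np.array([0.10, 0.90]),
}
known_types = {
    "Donald Trump": {"/people/person"},
    "Honolulu":     {"/location/location"},
}

def infer_types_by_neighbor(entity, k=1):
    """Collect the known types of the k entities closest to `entity`
    in the embedding space (Mechanism 1)."""
    e = entity_emb[entity]
    others = sorted(
        (np.linalg.norm(e - v), name)
        for name, v in entity_emb.items() if name != entity
    )
    inferred = set()
    for _, name in others[:k]:
        inferred |= known_types.get(name, set())
    return inferred

print(infer_types_by_neighbor("Barack Obama"))  # → {'/people/person'}
```

Here Barack Obama's nearest neighbor is Donald Trump, so the missing type /people/person is recovered from the local typing knowledge alone.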

Mechanism 2

Missing entity types of a (head or tail) entity can be inferred from the types of the corresponding (tail or head) entities through their relationships, using global triple knowledge as in Fig. 1 (Mech. 2).

The main idea behind Mechanism 1 is based on the observation that entity embeddings learned by conventional KG embedding methods [20], [21] cluster well according to their types in vector space. For instance, in Fig. 1 (Mech. 1), given the entity Barack Obama, its missing type /people/person can be induced from the given type of the similar entity Donald Trump. The key motivation behind Mechanism 2 is that the relationship should remain unchanged if the entities in a triple fact are replaced with their corresponding types. For instance, given a global triple fact (Barack Obama, born_in, Honolulu), under this assumption we can induce a new type triple (/people/person, born_in, /location/location).1 Formally, Honolulu − BarackObama ≈ /location/location − /people/person (≈ born_in), which can be used to infer missing entity types, e.g. (Barack Obama, type? = /people/person) via BarackObama − Honolulu + /location/location ≈ /people/person, as Mechanism 2 does. Fig. 1 gives a simple illustration of both mechanisms of entity type inference. Both mechanisms are utilized to build our final composite model.
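The vector arithmetic behind Mechanism 2 can be checked with a tiny numeric sketch. All embedding values below are hypothetical toy vectors constructed so the translation assumption holds exactly; the paper's embeddings are learned:

```python
import numpy as np

# Toy embeddings illustrating the type-translation assumption: if
# e + r ≈ ẽ holds for entities, then t_e + r ≈ t_ẽ holds for their types.
person   = np.array([0.2, 0.3])   # embedding of /people/person
location = np.array([0.7, 0.9])   # embedding of /location/location
born_in  = location - person      # relation chosen so the type triple holds

obama    = np.array([0.1, 0.2])
honolulu = obama + born_in        # constructed so (Obama, born_in, Honolulu) holds

# Infer the head entity's missing type from the tail's type and the relation:
# BarackObama - Honolulu + /location/location ≈ /people/person
inferred = obama - honolulu + location
assert np.allclose(inferred, person)
```

Since obama − honolulu equals −born_in, adding /location/location lands exactly on /people/person, mirroring the derivation above.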

Specifically, we first build two embedding models to realize the two mechanisms respectively. Considering that entities and entity types are completely distinct objects, we build two distinct embedding spaces for them, i.e., the entity space and the entity type space. Accordingly, we encode an (e,t) entity type instance by projecting the entity from the entity space to the entity type space with a mapping matrix M; hence we have

  f₁(e,t) = ‖M·e − t‖  (1)

called E2T. Moreover, we learn the plausibility of a (tₑ,r,t_ẽ) global type triple newly generalized from an (e,r,ẽ) global triple fact, even though this type triple is not present originally. Following the translation assumption [11], we have

  f₂(tₑ,r,t_ẽ) = ‖tₑ + r − t_ẽ‖  (2)

called TRT. E2T and TRT are the implementation models of the two mechanisms. Fig. 2 gives a brief illustration of our models. A ranking-based embedding framework is used to train them. Thereby, entities, entity types, and relationships are all embedded into low-dimensional vector spaces, where the composite energy score of both E2T and TRT is computed and utilized to determine the optimal types for (entity, entity type = ?) incomplete assertions.
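The composite scoring step can be sketched as follows. The E2T and TRT energies (‖M·e − t‖ and ‖tₑ + r − t_ẽ‖) follow the text above, but the `alpha`-weighted combination and the helper names are hypothetical stand-ins, not the paper's exact scheme:

```python
import numpy as np

def e2t_score(e, t, M):
    """E2T energy: distance between the projected entity M·e and the type t."""
    return np.linalg.norm(M @ e - t)

def trt_score(t_head, r, t_tail):
    """TRT energy: translation residual among head type, relation, tail type."""
    return np.linalg.norm(t_head + r - t_tail)

def composite_score(e, t, type_triples, M, alpha=0.5):
    """Combined energy for a candidate (entity, type) pair; lower is better.
    `type_triples` holds (r, t_tail) pairs generalized from triples whose
    head entity is e. The weighting `alpha` is an assumed illustration."""
    score = e2t_score(e, t, M)
    if type_triples:
        score += alpha * np.mean([trt_score(t, r, tt) for r, tt in type_triples])
    return score

# A candidate type that fits both the projection and the type triples gets energy 0.
M = np.eye(2)
e = np.array([1.0, 0.0])
t = M @ e                 # perfect E2T fit
r = np.array([0.0, 1.0])
t_tail = t + r            # perfect TRT fit
print(composite_score(e, t, [(r, t_tail)], M))  # → 0.0
```

At inference time, candidate types would be ranked by this energy and the lowest-scoring ones proposed as the missing types.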

The experimental results on real-world datasets show that our composite model achieves significant and consistent improvement over all baselines in entity type prediction and achieves comparable performance in entity type classification. The main contributions of this work are summarized as follows:

  • We propose a novel framework for inferring missing entity type instances in KGs by connecting entity type instances and global triple information and correspondingly present two effective mechanisms.

  • Under these mechanisms, we propose two novelembedding-based models: one for predicting entity types given entities and another one to encode the interactions among entity types and relationships from KGs. A combination of both models are utilized to conduct entity type inference.

  • We conduct empirical experiments on two real-world datasets for entity type prediction and classification, which demonstrate that our model can successfully take into account global triple information to improve KG entity typing.

Section snippets

Related works

Entity typing is valuable for many NLP tasks, such as knowledge base population [22] and question answering [4]. In recent years, researchers have attempted to mine fine-grained entity types [23], [24], [25], [26] with external text information, such as web search query logs [27], textual surface patterns [28], context representations [29], and Wikipedia [22]. Moniruzzaman et al. [30] improve performance in fine-grained type inference by leveraging domain knowledge and utilizing additional data

Embedding-based framework

We consider a KG containing entity type instances of the form (e,t) ∈ H, where H is the training set consisting of (entity, entity type) assertions, e ∈ E (the set of all entities) is an entity in the KG, and t ∈ T (the set of all types) is its type. For example, e could be Barack Obama and t could be /people/person. Since a single entity can have multiple types, entities in KGs often miss some of their types. The aim of this work is to infer missing entity type instances in KGs.

Our work

Experiments

We present experiments on entity type prediction and entity type classification to demonstrate the effectiveness of our proposed model.

Conclusion and future work

In this paper we described a framework for leveraging global triple knowledge to improve KG entity typing, training not only on (entity, entity type) assertions but also on all newly generated (head type, relationship, tail type) type triples. Specifically, we proposed two novel embedding-based models to encode entity type instances and type triples respectively. The connection of both models is utilized to infer missing entity type instances. The empirical experiments demonstrate the

CRediT authorship contribution statement

Yu Zhao: Conceptualization, Methodology, Writing - original draft, Writing - review & editing. Anxiang Zhang: Validation, Writing - original draft. Huali Feng: Writing - review & editing. Qing Li: Supervision. Patrick Gallinari: Supervision. Fuji Ren: Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant No. 61906159; the Sichuan Science and Technology Program, China under Grant No. 2018JY0607; the Fundamental Research Funds for the Central Universities, China under Grant No. JBK2003008; and the Financial Intelligence and Financial Engineering Key Laboratory of Sichuan Province, China.

References (47)

  • R. Zhang, F. Kong, C. Wang, Y. Mao, Embedding of hierarchically typed knowledge bases, in: Proceedings of AAAI,...
  • P. Jain, P. Kumar, S. Chakrabarti, Type-sensitive knowledge base inference without explicit type supervision, in:...
  • H. Hajishirzi, L. Zilles, D.S. Weld, L. Zettlemoyer, Joint coreference resolution and named-entity linking with...
  • N. Gupta, S. Singh, D. Roth, Entity linking via joint encoding of types, descriptions, and context, in: Proceedings of...
  • A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, O. Yakhnenko, Translating embeddings for modeling multi-relational...
  • C. Moon, P. Jones, N.F. Samatova, Learning entity type embeddings for knowledge graph completion, in: Proceedings of...
  • Z. Wang, J. Zhang, J. Feng, Z. Chen, Knowledge graph embedding by translating on hyperplanes, in: Proceedings of AAAI,...
  • Y. Lin, Z. Liu, M. Sun, Y. Liu, X. Zhu, Learning entity and relation embeddings for knowledge graph completion, in:...
  • T. Trouillon, J. Welbl, S. Riedel, E. Gaussier, G. Bouchard, Complex embeddings for simple link prediction, in:...
  • T. Dettmers, M. Pasquale, S. Pontus, S. Riedel, Convolutional 2D knowledge graph embeddings, in: Proceedings of AAAI,...
  • B. Ding, Q. Wang, B. Wang, L. Guo, Improving knowledge graph embedding using simple constraints, in: Proceedings of...
  • R. Xie, Z. Liu, F. Lin, L. Lin, Does william shakespeare really write hamlet? knowledge representation learning with...
  • D. Nathani, J. Chauhan, C. Sharma, M. Kaul, Learning attention-based embeddings for relation prediction in knowledge...