Knowledge graph entity typing via learning connecting embeddings
Introduction
The past decade has witnessed rapid growth in building web-scale knowledge graphs (KGs), such as Freebase [1], Google Knowledge Graph [2], and YAGO [3], which usually consist of a huge number of triples in the form of (head entity, relation, tail entity) (denoted (h, r, t)). KGs usually suffer from incompleteness and miss important facts, which jeopardizes their usefulness in downstream tasks such as question answering [4], semantic parsing [5], and relation classification [6]. Hence, the task of knowledge graph completion (KGC, i.e. completing knowledge graph entries) is highly significant and attracts wide attention.
This paper concentrates on KG entity typing, i.e. inferring missing entity type instances in KGs, an important sub-problem of KGC. Entity type instances, each of which is of the form (entity, entity type) (denoted (e, τ)), are essential entries of KGs and are widely used in many NLP tasks such as relation extraction [7], [8], coreference resolution [9], and entity linking [10]. KGs usually suffer from entity type incompleteness. For instance, 10% of the entities in FB15k [11] that have the /music/artist type miss the /people/person type [12]. KG entity type incompleteness renders some type-involved algorithms in KG-driven tasks grossly inefficient or even unusable. However, most previous works on KGC focus only on inferring missing entities and relationships [11], [13], [14], [15], [16], [17], [18], [19] (e.g. entity prediction (head entity, relationship, tail entity?)), paying less attention to entity type prediction (entity, entity type?), which limits the application of KGs in downstream tasks.
To address the KG entity type incompleteness issue, in this paper we propose a novel embedding methodology for inferring missing entity type instances in KGs that employs not only the local typing knowledge in existing entity type assertions, as most conventional approaches do, but also the global triple knowledge in KGs. We build two distinct knowledge-driven entity type inference mechanisms from these two kinds of structural knowledge.
Mechanism 1 Missing entity types of an entity can be found from other entities that are close to the entity in the embedding space, using local typing knowledge as in Fig. 1 (Mech. 1).
Mechanism 2 Missing entity types of a (head or tail) entity can be inferred from the types of the other (tail or head) entities through their relationships, using global triple knowledge as in Fig. 1 (Mech. 2).
The main idea behind Mechanism 1 is based on the observation that entity embeddings learned by conventional KG embedding methods [20], [21] cluster well according to their types in vector space. For instance, in Fig. 1 (Mech. 1), given the entity Barack Obama, its missing type /people/person can be induced from the given type of the similar entity Donald Trump. The key motivation behind Mechanism 2 is that the relationship should remain unchanged if the entities in a triple fact are replaced with their corresponding types. For instance, given the global triple fact (Barack Obama, born_in, Honolulu), under this assumption we can induce a new type triple (/people/person, born_in, /location/location). Formally, if (h, r, t) holds, then the type triple (τ_h, r, τ_t) should also hold, which can be used to infer missing entity types, e.g. (Barack Obama, type? /people/person) via the type triple (/people/person, born_in, /location/location) and the known type of Honolulu, as Mechanism 2 does. Fig. 1 gives a simple illustration of both entity type inference mechanisms. Both mechanisms are utilized to build our final composite model.
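The type-triple generalization step of Mechanism 2 can be sketched as follows (a minimal illustration in Python; the function name and data layout are our own, not the paper's code):

```python
# Mechanism 2 sketch: replace the entities of each KG triple with their known
# types, generalizing new (head type, relation, tail type) "type triples".
from itertools import product

def generalize_type_triples(triples, entity_types):
    """Yield every (head type, relation, tail type) combination."""
    type_triples = set()
    for h, r, t in triples:
        for th, tt in product(entity_types.get(h, ()), entity_types.get(t, ())):
            type_triples.add((th, r, tt))
    return type_triples

triples = [("Barack Obama", "born_in", "Honolulu")]
entity_types = {
    "Barack Obama": {"/people/person"},
    "Honolulu": {"/location/location"},
}
print(generalize_type_triples(triples, entity_types))
# {('/people/person', 'born_in', '/location/location')}
```

A type triple generated this way can then vote for a missing type: any entity born in a /location/location becomes a candidate /people/person.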
Specifically, we first build two embedding models to realize the two mechanisms respectively. Considering that entities and entity types are completely distinct objects, we build two distinct embedding spaces for them, i.e., the entity space and the entity type space. Accordingly, we encode an entity type instance by projecting the entity from the entity space to the entity type space with a mapping matrix M; hence we have (1): M·e ≈ τ, called E2T. Moreover, we learn the plausibility of a (τ_h, r, τ_t) global type triple by newly generalizing it from a (h, r, t) global triple fact, even though this type triple is not present originally. Following the translating assumption [11], we have (2): τ_h + r ≈ τ_t, called TRT. E2T and TRT are the implementation models of the two mechanisms. Fig. 2 gives a brief illustration of our models. A ranking-based embedding framework is used to train them. Thereby, entities, entity types, and relationships are all embedded into low-dimensional vector spaces, where the composite energy score of E2T and TRT is computed and utilized to determine the optimal types for (entity, entity type?) incomplete assertions.
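The two energy functions and their combination can be sketched in plain Python as follows (a minimal sketch: the squared L2 norm, the toy dimensions, and the weighting factor alpha are our assumptions for illustration, not specifics from the text):

```python
import random

random.seed(0)
d_e, d_t = 6, 4  # toy entity / type embedding dimensions (illustrative)

def randvec(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]

e = randvec(d_e)                        # entity embedding
tau = randvec(d_t)                      # entity-type embedding
M = [randvec(d_e) for _ in range(d_t)]  # entity -> type mapping matrix
tau_h, r, tau_t = randvec(d_t), randvec(d_t), randvec(d_t)

def e2t_energy(M, e, tau):
    """E2T (1): ||M.e - tau||^2; a small value means (e, tau) is plausible."""
    proj = [sum(m * x for m, x in zip(row, e)) for row in M]
    return sum((p - t) ** 2 for p, t in zip(proj, tau))

def trt_energy(tau_h, r, tau_t):
    """TRT (2): ||tau_h + r - tau_t||^2, the translating assumption on type triples."""
    return sum((a + b - c) ** 2 for a, b, c in zip(tau_h, r, tau_t))

def composite_energy(M, e, tau, tau_h, r, tau_t, alpha=0.5):
    """Weighted combination used to rank candidate types (alpha is assumed)."""
    return alpha * e2t_energy(M, e, tau) + (1 - alpha) * trt_energy(tau_h, r, tau_t)
```

Candidate types for a query (entity, ?) would then be ranked by ascending composite energy, the lowest-energy type being the prediction.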
The experimental results on real-world datasets show that our composite model achieves significant and consistent improvements over all baselines in entity type prediction and achieves comparable performance in entity type classification. The main contributions of this work are summarized as follows:
- We propose a novel framework for inferring missing entity type instances in KGs by connecting entity type instances and global triple information, and correspondingly present two effective mechanisms.
- Under these mechanisms, we propose two novel embedding-based models: one for predicting entity types given entities, and another to encode the interactions among entity types and relationships in KGs. A combination of both models is utilized to conduct entity type inference.
- We conduct empirical experiments on two real-world datasets for entity type prediction and classification, which demonstrate that our model can successfully take global triple information into account to improve KG entity typing.
Related works
Entity typing is valuable for many NLP tasks, such as knowledge base population [22] and question answering [4]. In recent years, researchers have attempted to mine fine-grained entity types [23], [24], [25], [26] with external text information, such as web search query logs [27], textual surface patterns [28], context representations [29], and Wikipedia [22]. Moniruzzaman et al. [30] improve performance in fine-grained type inference by leveraging domain knowledge and utilizing additional data
Embedding-based framework
We consider a KG containing entity type instances of the form (e, τ) ∈ Δ (Δ is the training set, consisting of (entity, entity type) assertions), where e ∈ E (E is the set of all entities) is an entity in the KG with the type τ ∈ T (T is the set of all types). For example, e could be Barack Obama and τ could be /people/person. As a single entity can have multiple types, entities in a KG often miss some of their types. The aim of this work is to infer missing entity type instances in KGs.
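In code, this notation might look like the following minimal sketch (the names delta, E, and T mirror the symbols above; the sample assertions are illustrative):

```python
# Training set Delta of (entity, entity type) assertions, with the derived
# entity set E and type set T. Inference ranks the types not yet observed.
delta = {
    ("Barack Obama", "/people/person"),
    ("Barack Obama", "/government/us_president"),
    ("Honolulu", "/location/location"),
}
E = {e for e, _ in delta}    # all entities
T = {tau for _, tau in delta}  # all types

def known_types(entity):
    """All observed types of an entity; inference scores the types in T minus these."""
    return {tau for e, tau in delta if e == entity}

# Candidate answers for the incomplete assertion (Honolulu, entity type?):
candidates = T - known_types("Honolulu")
```

Because an entity can hold several types at once, the query is a ranking over candidates rather than a single-label classification.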
Our work
Experiments
We present the experiments of entity type completion and entity type classification to demonstrate the effectiveness of our proposed model.
Conclusion and future work
In this paper we described a framework for leveraging global triple knowledge to improve KG entity typing by training not only on (entity, entity type) assertions but also on all newly generated (head type, relationship, tail type) type triples. Specifically, we proposed two novel embedding-based models to encode entity type instances and type triples respectively. The connection of both models is utilized to infer missing entity type instances. The empirical experiments demonstrate the
CRediT authorship contribution statement
Yu Zhao: Conceptualization, Methodology, Writing - original draft, Writing - review & editing. Anxiang Zhang: Validation, Writing - original draft. Huali Feng: Writing - review & editing. Qing Li: Supervision. Patrick Gallinari: Supervision. Fuji Ren: Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant No. 61906159; the Sichuan Science and Technology Program, China under Grant No. 2018JY0607; the Fundamental Research Funds for the Central Universities, China under Grant No. JBK2003008; and the Financial Intelligence and Financial Engineering Key Laboratory of Sichuan Province, China.
References (47)
- et al., A neural model for type classification of entities for text, Knowl.-Based Syst. (2019)
- et al., Low-rank local tangent space embedding for subspace clustering, Inform. Sci. (2020)
- et al., Hybrid query expansion using lexical resources and word embeddings for sentence retrieval in question answering, Inform. Sci. (2020)
- et al., Mining weighted subgraphs in a single large graph, Inform. Sci. (2020)
(2020) - K. Bollacker, C. Evans, P. Paritosh, T. Sturge, J. Taylor, Freebase: A collaboratively created graph database for...
- X. Dong, E. Gabrilovich, G. Heitz, W. Horn, N. Lao, K. Murphy, T. Strohmann, S. Sun, W. Zhang, Knowledge vault: A...
- F.M. Suchanek, G. Kasneci, G. Weikum, Yago: a core of semantic knowledge, in: Proceedings of WWW, 2007, pp....
- H. Elsahar, C. Gravier, F. Laforest, Zero-shot question generation from knowledge graphs for unseen predicates and...
- J. Berant, A. Chou, R. Frostig, P. Liang, Semantic parsing on freebase from question-answer pairs, in: Proceedings of...
- D. Zeng, K. Liu, S. Lai, G. Zhou, J. Zhao, Relation classification via convolutional deep neural network, in:...