Elsevier

Future Generation Computer Systems

Volume 91, February 2019, Pages 426-433
Knowledge graph embedding via reasoning over entities, relations, and text

https://doi.org/10.1016/j.future.2018.09.040

Highlights

  • Construction of a uniform graph for better knowledge inference.

  • Effectiveness of an enhanced kind of LSTM for modeling multiple-step paths.

  • Impacts of attention mechanism in multiple path learning.

Abstract

Knowledge graph embedding, which aims to embed both entities and relations into a low-dimensional space, has attracted significant research interest in the field of intelligent web. In particular, there are two fundamentally different kinds of models for inferring new predictions in the graph: latent feature models and graph feature models. Latent feature models excel at using latent features of entities to explain triples and infer these features automatically from the data, while graph feature models do well at extracting features from observable graph patterns. Combining the strengths of these two kinds of models is a promising approach to increasing the predictive performance of graph models. We therefore propose a new combined model, named Text-enhanced Knowledge Graph Embedding (TKGE), to perform inference over entities, relations, and text. The model is well-suited both for modeling interactions of latent features and for modeling paths between entities in the graph. Experimental results show that TKGE achieves significant improvements over baselines on two tasks: knowledge graph completion and triple classification.

Introduction

Knowledge graphs model factual information in the form of entities and relations between them to semantically represent facts about the world. Recently, a large number of knowledge graphs have been created, including YAGO [1], DBpedia [2], NELL [3], Freebase [4], and the Google Knowledge Graph [5]. This motivates us to study statistical models that predict new facts about the world given existing facts in the knowledge graph. Currently available models can roughly be categorized into two fundamentally different classes: latent feature models and graph feature models. Latent feature models derive the relationships between entities from interactions of their latent features, while graph feature models are computationally efficient if triples can be explained from the neighborhood of entities or from knowledge inference over paths between entities in the graph. The two classes focus on different aspects of knowledge graphs, and it has been observed experimentally that the strengths of latent and graph-based models are often complementary (see, e.g., [6]). Thus, many researchers combine them to obtain improved modeling power [7], [8], [9], [10]. The combined model proposed in [11] remains state-of-the-art.

However, previous combined models have not fully exploited the potential of the knowledge base, since they suffer from the following limitations: (1) They consider only relations, not the entities that form the nodes of the paths, when predicting new facts. Ignoring entities leads to frequent errors. (2) They take only a single path as the basis for modeling correlations given the entity pair. However, multiple paths can provide further supporting evidence for a prediction, as illustrated in Fig. 1. (3) They do not leverage the textual relations extracted from a text corpus to reduce the sparseness of knowledge graphs.

To address the above issues, we propose a novel combined model to learn knowledge graph embeddings from latent features and observable graph patterns. First, we take advantage of the capability of Long Short-Term Memory (LSTM) to handle long-term dependencies in order to efficiently model arbitrary-length paths, and propose an enhanced variant of LSTM to model the semantic meaning of a multiple-step path given the entity pair, which incorporates the merits of both the entities and the relations present in the multiple-step path. Then, we integrate an attention-based multiple-instance learning method to learn a comprehensive semantic representation of the correlations between the entity pair (the semantic relation), which incorporates the overall semantic meanings of all important paths given the entity pair. Moreover, to reduce the sparseness of knowledge graphs, we construct a uniform graph from a large text corpus and a knowledge base, so that inference can fill in disconnected gaps in the knowledge base, and we make full use of the rich textual information about relations to augment knowledge inference over the uniform graph. For example, if there is no path connecting two entities in the graph, it is impossible for inference to discover any semantic relation between them. Textual relations can solve this problem well. As illustrated in Fig. 2, there is no direct relation or multiple-step path between MarcusJordan and America, which makes it impossible for inference to discover any relation between MarcusJordan and America. However, by using the textual relation SecondKid, we can predict a new relation (MarcusJordan, Nationality, America). Finally, we propose a model, Text-enhanced Knowledge Graph Embedding (TKGE), to model both latent features and observable patterns in the unified graph by combining the strengths of the classical latent feature model TransE [12] and LSTM.
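The connectivity argument above can be illustrated with a small sketch. It is not the authors' construction procedure: the intermediate entity MichaelJordan, the direction of the SecondKid edge, and the helper names build_graph and find_paths are assumptions introduced here for illustration only. The sketch merges knowledge base triples and textual triples into one graph and shows that a path from MarcusJordan to America exists only once the textual relation is included.

```python
from collections import defaultdict

# Hypothetical toy data. The entity "MichaelJordan" and the edge directions
# are assumptions made for illustration; they are not taken from the paper.
kb_triples = [
    ("MichaelJordan", "Nationality", "America"),
]
text_triples = [
    # A textual relation extracted from a corpus, treated like a KB relation.
    ("MarcusJordan", "SecondKid", "MichaelJordan"),
]

def build_graph(*triple_sets):
    """Merge triple sets into one adjacency list: head -> [(relation, tail)]."""
    graph = defaultdict(list)
    for triples in triple_sets:
        for head, relation, tail in triples:
            graph[head].append((relation, tail))
    return graph

def find_paths(graph, source, target, max_hops=3):
    """Enumerate simple paths as lists alternating entity, relation, entity, ..."""
    paths = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        hops = len(path) // 2
        if node == target and hops > 0:
            paths.append(path)
            continue
        if hops == max_hops:
            continue
        for relation, neighbor in graph.get(node, []):
            if neighbor not in path[::2]:  # entities sit at even positions
                stack.append((neighbor, path + [relation, neighbor]))
    return paths

kb_only = build_graph(kb_triples)
uniform = build_graph(kb_triples, text_triples)

paths_kb_only = find_paths(kb_only, "MarcusJordan", "America")
paths_uniform = find_paths(uniform, "MarcusJordan", "America")
print(paths_kb_only)
print(paths_uniform)
```

In this toy setting the knowledge base alone yields no path, while the merged graph yields one two-step path whose entities and relations could then be fed to a path encoder.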

The key contributions of this paper are summarized as the following:

  • (1) We propose an enhanced variant of LSTM to model the semantic meaning of a path given the entity pair, which incorporates the merits of both the entities and the relations present in the path, and we leverage an attention-based multiple-instance learning method to model the semantic relation given the entity pair.

  • (2) By training TransE and the enhanced LSTM jointly, our model can learn superior knowledge graph embeddings from latent features and observable patterns in the unified knowledge graph.

  • (3) We evaluate the TKGE model on benchmark datasets built from Freebase and the ClueWeb text corpus on knowledge graph completion and triple classification. Experimental results show that TKGE achieves significant improvements over baselines on both tasks.
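To make the attention step in contribution (1) concrete, the following is a generic sketch of attention over several path representations. It is not the paper's formulation, which is not given in this excerpt: the dot-product scoring, the query vector, and the toy path vectors are assumptions. Each path vector stands in for the output of a path encoder, and the weighted sum plays the role of the semantic relation for the entity pair.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend_over_paths(path_vectors, query):
    """Weight each path vector by softmax(path . query); return weights and weighted sum."""
    weights = softmax([dot(p, query) for p in path_vectors])
    dim = len(query)
    combined = [
        sum(w * p[i] for w, p in zip(weights, path_vectors))
        for i in range(dim)
    ]
    return weights, combined

# Hypothetical path encodings; in the paper these would come from the LSTM.
path_vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
# Hypothetical query vector, e.g. the embedding of a candidate relation.
query = [2.0, 0.0]

weights, semantic_relation = attend_over_paths(path_vectors, query)
print(weights)
print(semantic_relation)
```

Paths that align with the query receive larger weights, so uninformative paths contribute little to the combined representation.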

The rest of this paper is organized as follows. Section 2 gives a brief review of related work. Section 3 presents our proposed model in detail. In Section 4, we introduce the datasets and experiment settings. The empirical results are discussed in Section 5, followed by a conclusion of this paper.

Section snippets

Latent feature models

Latent feature models explain triples via latent features of entities. RESCAL [13] is a relational latent feature model which explains triples via pairwise interactions of latent features and captures all interactions between entities via tensor factorization. However, RESCAL requires a large number of parameters to model all pairwise interactions when the number of latent features is large. Multi-layer Perceptrons (MLPs) [10] proposed alternative ways to create composite triple representations and

The proposed approach

In this section, we present the details of how we reason over entities, relations and texts to learn superior knowledge graph embeddings.

Datasets and experiment settings

Knowledge base completion

Knowledge base completion aims to complete a triple (h,r,t) when one of h, r, t is missing. We divide it into two tasks: entity prediction and relation prediction.
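As a rough illustration of entity prediction, the sketch below ranks candidate tail entities for a query (h, r, ?) using the TransE distance ||h + r - t|| with the L1 norm, where a lower distance means a more plausible triple. It covers only the TransE component and not the full TKGE model; the two-dimensional embeddings, the entity France, and the helper names are hypothetical.

```python
def transe_distance(h, r, t):
    """L1 distance ||h + r - t||_1; lower means more plausible under TransE."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))

def rank_tails(head, relation, entity_emb, relation_emb):
    """Return candidate tail entities sorted from most to least plausible."""
    h = entity_emb[head]
    r = relation_emb[relation]
    scored = [
        (transe_distance(h, r, t), name)
        for name, t in entity_emb.items()
        if name != head
    ]
    return [name for _, name in sorted(scored)]

# Hypothetical toy embeddings chosen by hand for illustration.
entity_emb = {
    "MarcusJordan": [0.0, 0.0],
    "America": [1.0, 1.0],
    "France": [-1.0, 0.5],
}
relation_emb = {"Nationality": [0.9, 1.1]}

ranking = rank_tails("MarcusJordan", "Nationality", entity_emb, relation_emb)
print(ranking)
```

Relation prediction can be sketched the same way by fixing h and t and ranking the candidate relations by the same distance.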

Conclusion

In this paper, we propose a novel combined model, the Text-enhanced Knowledge Graph Embedding (TKGE) model, to learn superior representations of entities and relations from latent features and observable patterns. We expand the structure of the knowledge graph by treating textual relations the same as KB relations, which increases the connectivity of the knowledge graph. Moreover, we propose an enhanced variant of LSTM to model the semantic meaning of an arbitrary-length path given the entity pair, which

Acknowledgment

This work is supported by the National Key Research and Development Program of China (No. 2017YFD0700102).

Binling Nie is a Ph.D. candidate in the College of Computer Science and Technology at Zhejiang University, Hangzhou, China. She received her B.S. degree in 2014 from the College of Computer Science and Technology at Beijing University of Chemical Technology. Her research interests include knowledge representation and inference.

References (35)

  • F.M. Suchanek, G. Kasneci, G. Weikum, Yago: a core of semantic knowledge, in: International World Wide Web Conferences,...
  • S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z. Ives, Dbpedia: a nucleus for a web of open data, 2007, pp....
  • A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E.R. Hruschka, T.M. Mitchell, Toward an architecture for never-ending...
  • K.D. Bollacker, C. Evans, P. Paritosh, T. Sturge, J. Taylor, Freebase: a collaboratively created graph database for...
  • A. Singhal, Introducing the knowledge graph: things, not strings, Official google...
  • K. Toutanova, D. Chen, Observed versus latent features for knowledge base and text inference, in: Proceedings of the...
  • M. Nickel, X. Jiang, V. Tresp, Reducing the rank in relational factorization models by including observable patterns,...
  • Y. Koren, Factorization meets the neighborhood: a multifaceted collaborative filtering model
  • S. Rendle, Factorization machines with libfm, ACM Trans. Intell. Syst. Technol. (TIST) (2012)
  • X. Dong et al., Knowledge vault: A web-scale approach to probabilistic knowledge fusion
  • Y. Lin, Z. Liu, H. Luan, M. Sun, S. Rao, S. Liu, Modeling relation paths for representation learning of knowledge...
  • A. Bordes, N. Usunier, A. Garciaduran, J. Weston, O. Yakhnenko, Translating embeddings for modeling multi-relational...
  • M. Nickel, Tensor Factorization for Relational Learning (2013)
  • R. Socher, D. Chen, C.D. Manning, A. Ng, Reasoning with neural tensor networks for knowledge base completion, in:...
  • P.D. Hoff et al., Latent space approaches to social network analysis, J. Amer. Statist. Assoc. (2002)
  • A. Bordes, J. Weston, R. Collobert, Y. Bengio, Learning structured embeddings of knowledge bases, in: Conference on...
  • Y. Lin, Z. Liu, M. Sun, Y. Liu, X. Zhu, Learning entity and relation embeddings for knowledge graph completion. in:...

    Shouqian Sun received his B.S. degree and his Ph.D. degree in Computer Science in 1996 from Zhejiang University. Currently, he is a professor in the Department of Computer Science and Engineering, Zhejiang University, Hangzhou, China. His research interests include computer graphics and CAD, knowledge graph, and virtual reality.
