Path-based reasoning approach for knowledge graph completion using CNN-BiLSTM with attention mechanism

https://doi.org/10.1016/j.eswa.2019.112960

Highlights

  • A coupled CNN and BiLSTM model accurately encodes paths for knowledge graph completion.

  • Combining the embeddings of the paths between two entities captures the semantic relation between them.

  • Multistep reasoning efficiently predicts missing links between two entities.

  • Attention-based CNN-BiLSTM outperforms recent state-of-the-art path-reasoning methods.

Abstract

Knowledge graphs are valuable resources for building intelligent systems such as question answering or recommendation systems. However, most knowledge graphs are impaired by missing relationships between entities. Embedding methods that translate entities and relations into a low-dimensional space achieve great results, but they focus only on the direct relations between entities and neglect the presence of path relations in graphs. In contrast, path-based embedding methods consider only a single path when making inferences and rely on simple recurrent neural networks, even though more effective neural models for processing sequence data are available. We propose a new approach for knowledge graph completion that combines bidirectional long short-term memory (BiLSTM) and convolutional neural network modules with an attention mechanism. Given a candidate relation and two entities, we encode the paths that connect the entities into a low-dimensional space using a convolutional operation followed by BiLSTM. Then, an attention layer is applied to capture the semantic correlation between a candidate relation and each path between the two entities and to attentively extract reasoning evidence from the representation of multiple paths to predict whether the entities should be connected by the candidate relation. We extend our model to perform multistep reasoning over path representations in an embedding space. A recurrent neural network is designed to repeatedly interact with an attention module to derive logical inference from the representation of multiple paths. We perform link prediction tasks on several knowledge graphs and show that our method achieves better performance than recent state-of-the-art path-reasoning methods.

Introduction

Knowledge graphs (KGs), such as Freebase, WordNet, or NELL, are valuable resources for building intelligent systems such as question answering or recommendation systems. These KGs contain millions of facts about real-world entities and relations in the form of triples, e.g., (Bill Gates, founded, Microsoft). However, a large number of relations (triples) between entities are missing from these KGs. To effectively use KGs for other applications, one must perform a KG completion (KGC) task and infer the missing links or triples. The basic idea of a KGC task is to automatically infer missing triples by utilizing existing triples. In recent years, embedding methods that translate entities and relations into a low-dimensional space have achieved great results on KGC tasks. However, most embedding methods consider only the direct relations between entities and overlook the presence of paths. Previously, path ranking algorithms (PRAs), such as those proposed by Lao, Mitchell, and Cohen (2011) and Gardner and Mitchell (2015), have shown that relation paths, i.e., the sequences of relation types connecting two entities, can be effectively used for KGC. Such methods perform random walks over a graph and construct a feature matrix by enumerating the paths between all entity pairs given a candidate relation. Then, a binary classifier, such as logistic regression or a decision tree, is trained on the feature matrix to infer missing links. In recent years, path-based reasoning methods (Das, Dhuliawala, Zaheer, Vilnis, Durugkar, Krishnamurthy, et al., 2018, Das, Neelakantan, Belanger, McCallum, 2017, Nickel, Tresp, Kriegel, 2011, Xiaotian, Quan, Baoyuan, Yongqin, Peng, Bin, 2017) have successfully applied recurrent neural networks (RNNs) to KGC tasks by embedding reasoning paths into a low-dimensional space and have shown significant improvements over PRA methods.
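As a small illustration of the path enumeration that PRA-style methods rely on, the sketch below performs a breadth-first search over a toy graph and collects the relation-type sequences that connect two entities. All entity and relation names here are illustrative and not drawn from any benchmark KG:

```python
from collections import deque

# Toy knowledge graph: adjacency list mapping an entity to (relation, target) edges.
graph = {
    "BillGates": [("founded", "Microsoft"), ("bornIn", "Seattle")],
    "Microsoft": [("headquarteredIn", "Redmond")],
    "Redmond":   [("locatedIn", "Washington")],
    "Seattle":   [("locatedIn", "Washington")],
}

def relation_paths(graph, source, target, max_len=3):
    """Enumerate relation-type sequences connecting source to target (BFS)."""
    paths, queue = [], deque([(source, [])])
    while queue:
        node, rels = queue.popleft()
        if node == target and rels:
            paths.append(tuple(rels))
            continue
        if len(rels) < max_len:
            for rel, nxt in graph.get(node, []):
                queue.append((nxt, rels + [rel]))
    return paths

print(relation_paths(graph, "BillGates", "Washington"))
# → [('bornIn', 'locatedIn'), ('founded', 'headquarteredIn', 'locatedIn')]
```

Each returned sequence is one PRA-style path feature for the entity pair; a classifier would then be trained over counts or probabilities of such paths.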
The idea behind these path-reasoning approaches is that the semantics of a relation between two entities can be represented by the semantics of the multiple paths that connect them. Therefore, a missing relation between two entities can be inferred by learning from the paths that connect them. However, these reasoning methods train a simple RNN, whereas more capable models for sequence data processing are available and exhibit better performance than plain RNNs. Moreover, most of these methods use max-pooling or mean operations to combine multiple paths and neglect the fact that each path provides different reasoning evidence. In fact, an individual path, such as (s, spouse, e), (e, bornIn, t), frequently does not provide any indication of a semantic relationship between entities s and t.

In this paper, we propose a new attention-based approach for KGC that couples a convolutional neural network (CNN) with a bidirectional long short-term memory (BiLSTM) module. First, given a candidate relation and two entities, our method encodes the multiple reasoning paths between the entities into low-dimensional embeddings using the CNN followed by the BiLSTM module. Second, we assume that not all paths between two entities contribute equally to inferring the missing relation between them. To this end, an attention mechanism (Bahdanau, Cho, Bengio, 2015, Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, et al., 2017) is applied to capture the semantic correlation between a candidate relation and each path between the two entities and to generate a single vector representation for all paths between them. Paths of varying lengths that connect the entities are thereby encoded into a fixed-length real-valued vector. Finally, the summation of the relation embedding and the vector representation of the paths is passed through a fully connected layer to predict whether the two entities should be connected by the candidate relation. The principle behind our method is that the CNN extracts local features in a path, and the BiLSTM network uses the ordering of these local features to learn the entity and relation orderings of each path. Finally, the attention layer extracts reasoning evidence from the paths that are correlated with the candidate relation. The attention mechanism in our model is similar to that of Xiaotian et al. (2017). The main difference is that instead of computing the dot product of the target relation and the path vectors, we apply an additive attention function implemented as a feedforward network, which remains stable without extra scaling. Dot-product attention is faster and more space-efficient but in some cases requires an additional scaling factor to compute well-behaved attention weights, which was not implemented in the previous study.
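To make the additive attention step concrete, the following minimal sketch (plain Python with illustrative dimensions; `w_q`, `w_p`, and `v` stand in for the learned projection matrices and scoring vector) scores each path vector against the query relation with a small feedforward computation, normalizes the scores with a softmax, and returns the weighted combination of path vectors:

```python
import math

def additive_attention(query, paths, w_q, w_p, v):
    """Additive (Bahdanau-style) attention: score_i = v · tanh(Wq·q + Wp·p_i).
    Vectors/matrices are plain Python lists; dimensions are illustrative."""
    def matvec(mat, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in mat]

    def score(p):
        hidden = [math.tanh(a + b) for a, b in zip(matvec(w_q, query), matvec(w_p, p))]
        return sum(vi * hi for vi, hi in zip(v, hidden))

    scores = [score(p) for p in paths]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]      # numerically stable softmax
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum of path vectors -> one fixed-length representation of all paths.
    combined = [sum(w * p[i] for w, p in zip(weights, paths))
                for i in range(len(paths[0]))]
    return weights, combined
```

For example, with identity projections, a query aligned with the second path receives the larger attention weight, so the combined vector leans toward that path's embedding.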

Furthermore, when the search space in the path representation is very large, combining all paths in a single pass does not provide sufficient evidence to make an inference about the relationship between entities. Therefore, to narrow down the search in a continuous space, we suggest using multiple steps of reasoning. To address this issue, we extend our model to perform multistep reasoning over the path distribution. We adopt an RNN-like multihop (Sukhbaatar, Szlam, Weston, & Fergus, 2015) reasoning network that enables the model to read the embeddings of the same paths multiple times and update the encoding vector at each step before producing the final output. Through experiments, we demonstrate that multistep reasoning over the path distribution can significantly improve the reasoning performance on KGC tasks. Moreover, our model is trained collectively, end-to-end, with gradient descent for all candidate relations.
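A minimal sketch of such an RNN-like multihop read might look as follows; the dot-product attention and the fixed blending factor `alpha` are simplifications of the learned update described above, not the authors' exact formulation:

```python
import math

def multistep_read(query, paths, steps=3, alpha=0.5):
    """Hypothetical multihop read (cf. memory networks): repeatedly attend over
    the same path vectors and blend the attended summary back into the query."""
    def attend(q, ps):
        scores = [sum(qi * pi for qi, pi in zip(q, p)) for p in ps]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        w = [e / z for e in exps]
        # Attention-weighted summary of the path vectors.
        return [sum(wi * p[i] for wi, p in zip(w, ps)) for i in range(len(q))]

    for _ in range(steps):
        summary = attend(query, paths)
        # Update the query with the evidence gathered at this step.
        query = [alpha * qi + (1 - alpha) * si for qi, si in zip(query, summary)]
    return query
```

Each hop re-reads the same path memory with an updated query, so evidence accumulated at one step can sharpen the attention at the next.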

In the experiments, we perform link prediction tasks on four different KGs, i.e., NELL, Freebase, Kinship, and Countries. We compare our method with the most recent state-of-the-art path-based reasoning methods using various measures. For link prediction, given a test triple, we replace its source or target entity with random entities and rank the original triple against this corrupted candidate set under each method. We further visualize multiple reasoning paths and observe that the paths that connect similar entity pairs are closely clustered together. Empirically, we show that our approach achieves results comparable with previous methods and exhibits better performance in several cases.
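The corruption-based ranking protocol can be sketched as follows (function names are illustrative): the true triple's score is ranked against the scores of its corrupted variants, and MRR averages the reciprocal ranks over all test triples:

```python
def rank_of_true(score_true, scores_corrupted):
    """1-based rank of the true triple's score among the corrupted candidates
    (higher score = better)."""
    return 1 + sum(1 for s in scores_corrupted if s > score_true)

def mean_reciprocal_rank(ranks):
    """MRR over test triples, given each true triple's 1-based rank."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Example: one corrupted candidate outscores the true triple -> rank 2.
print(rank_of_true(0.9, [0.5, 0.95, 0.2]))      # → 2
print(mean_reciprocal_rank([1, 2, 4]))          # → 0.5833...
```

An MRR of 1.0 would mean the true triple is ranked first for every test case; lower values indicate that corrupted candidates often outscore it.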


Related work

This section reviews previous studies on KGC tasks. Previous works are broadly divided into two categories, i.e., path-based reasoning and KG embedding. KG embedding predicts missing links by applying low-dimensional embedding approaches to KGs (Bordes, Usunier, Garcia-Duran, Weston, Yakhnenko, 2013, Nickel, Murphy, Tresp, Gabrilovich, 2015, Nickel, Tresp, Kriegel, 2011, Wang, Mao, Wang, Guo, 2017). The key idea of embedding-based KGC is to represent entities and relations as low-dimensional vectors.

Method

In this section, we present our approach for KGC via link prediction tasks, which aim to predict missing links in a graph. An overview of the approach is shown in Figs. 1 and 2. First, we briefly review the problem of KGC and the PRA and describe how we obtain paths. Then, we introduce the CNN and BiLSTM modules, which embed relational paths into a low-dimensional space and combine those paths using an attention module according to a query relation. Then, we describe the RNN that performs multistep reasoning over the path representations.

Experiments

We evaluate our model on link prediction tasks and report the results on four different KGs. The statistics of the graph datasets are presented in Table 1. The hyperparameters of our model that yield the best performance on the development set are selected via a small grid search. Several measures are adopted to quantitatively evaluate our model, including F1, mean average precision (MAP), and mean reciprocal rank (MRR). MAP is the mean, over all queries, of the average of the precision values at the ranks where relevant items are retrieved.
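As a concrete reference for the MAP measure, the sketch below computes average precision for a single ranked candidate list; MAP is then the mean of these values over all queries:

```python
def average_precision(relevance):
    """AP for one ranked list: mean of precision@k over each rank k at which
    a relevant item appears. `relevance` is 0/1 labels in ranked order."""
    hits, precisions = 0, []
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Relevant items at ranks 1 and 3: AP = (1/1 + 2/3) / 2 = 5/6 ≈ 0.833.
print(average_precision([1, 0, 1]))
```

Note that AP rewards placing relevant triples near the top of the ranking, which is why it complements rank-only measures such as MRR.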

Conclusion

In this paper, we propose a new approach for KGC that combines BiLSTM and CNN modules with an attention mechanism. Given a candidate relation and two entities, we encode the paths that connect the entities into a low-dimensional space using a convolutional operation followed by BiLSTM. Then, an attention layer is applied to combine multiple paths efficiently. We further extend our model to perform multistep reasoning over path representations in an embedding space. Compared to other models, our approach achieves comparable and, in several cases, better performance.

Acknowledgement

This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (2019000067, Semantic Analysis Reasoning Methods for Automatic Completion of Large Scale Knowledge Graph).

CRediT authorship contribution statement

Batselem Jagvaral: Conceptualization, Data curation, Writing - original draft, Writing - review & editing. Wan-Kon Lee: Writing - review & editing. Jae-Seung Roh: Data curation. Min-Sung Kim: Data curation. Young-Tack Park: Conceptualization, Writing - original draft, Writing - review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (30)

  • D. Bahdanau et al.

    Neural machine translation by jointly learning to align and translate

    (2015)
  • A. Bordes et al.

    Translating embeddings for modeling multi-relational data

    Advances in neural information processing systems 26

    (2013)
  • G. Bouchard et al.

    On approximate reasoning capabilities of low-rank vector spaces

    AAAI spring symposia

    (2015)
  • R. Das et al.

    Go for a walk and arrive at the answer: Reasoning over paths in knowledge bases using reinforcement learning

    International conference on learning representations (ICLR)

    (2018)
  • R. Das et al.

    Chains of reasoning over entities, relations, and text using recurrent neural networks

    Proceedings of the 15th conference of the European chapter of the Association for Computational Linguistics: Volume 1, Long papers

    (2017)
  • T. Dettmers et al.

    Convolutional 2D knowledge graph embeddings

    AAAI

    (2018)
  • Z. Gan et al.

    Multi-step reasoning via recurrent dual attention for visual dialog

    (2019)
  • M. Gardner et al.

    Efficient and expressive knowledge base completion using subgraph feature extraction

    Proceedings of the 2015 conference on empirical methods in natural language processing

    (2015)
  • M. Gardner et al.

    Improving learning and inference in a large knowledge-base using latent syntactic cues

    Proceedings of the 2013 conference on empirical methods in natural language processing

    (2013)
  • S. Hochreiter et al.

    Long short-term memory

    Neural Computation

    (1997)
  • S. Kok et al.

    Statistical predicate invention

    Proceedings of the 24th international conference on machine learning

    (2007)
  • N. Lao et al.

    Relational retrieval using a combination of path-constrained random walks

    Machine Learning

    (2010)
  • N. Lao et al.

    Random walk inference and learning in a large scale knowledge base

    Proceedings of the Conference on Empirical Methods in Natural Language Processing

    (2011)
  • Y. Lin et al.

    Modeling relation paths for representation learning of knowledge bases

    Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

    (2015)
  • L. van der Maaten et al.

    Visualizing data using t-SNE

    Journal of Machine Learning Research

    (2008)