Path-based reasoning approach for knowledge graph completion using CNN-BiLSTM with attention mechanism
Introduction
Knowledge graphs (KGs), such as Freebase, WordNet, and NELL, are valuable resources for building intelligent systems such as question answering and recommendation systems. These KGs contain millions of facts about real-world entities and relations in the form of triples, e.g., (Bill Gates, founded, Microsoft). However, many relations (triples) between entities are still missing from KGs. To use KGs effectively in such applications, one must perform a KG completion (KGC) task and infer the missing links or triples. The basic idea of KGC is to automatically infer missing triples from existing ones. In recent years, embedding methods that map entities and relations into a low-dimensional space have achieved strong results on KGC tasks. However, most embedding methods consider only the direct relations between entities and overlook the presence of paths. Previously, path ranking algorithms (PRAs), such as those proposed by Lao, Mitchell, and Cohen (2011) and Gardner and Mitchell (2015), have shown that relation paths, i.e., the sequences of relation types connecting two entities, can be used effectively for KGC. Such methods perform random walks over a graph and construct a feature matrix by enumerating the paths between all entity pairs given a candidate relation. A binary classifier, such as logistic regression or a decision tree, is then trained on the feature matrix to infer missing links. In recent years, path-based reasoning methods (Das, Dhuliawala, Zaheer, Vilnis, Durugkar, Krishnamurthy, et al., 2018, Das, Neelakantan, Belanger, McCallum, 2017, Nickel, Tresp, Kriegel, 2011, Xiaotian, Quan, Baoyuan, Yongqin, Peng, Bin, 2017) have successfully applied recurrent neural networks (RNNs) to KGC tasks by embedding reasoning paths into a low-dimensional space and have shown significant improvements over PRA methods.
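The PRA pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy graph, the entity and relation names, and the single path type are all invented for the example; in a real PRA system the resulting rows would feed a logistic-regression (or decision-tree) classifier per candidate relation.

```python
from collections import defaultdict

# Toy KG as adjacency lists keyed by relation type (illustrative only).
edges = {
    "bornIn":  [("bill_gates", "seattle")],
    "cityIn":  [("seattle", "usa")],
    "livesIn": [("bill_gates", "usa")],
}

graph = defaultdict(list)
for rel, pairs in edges.items():
    for s, t in pairs:
        graph[s].append((rel, t))

def path_feature(source, target, path, graph):
    """Random-walk probability of reaching `target` from `source`
    by following the relation types in `path` (a PRA-style feature)."""
    frontier = {source: 1.0}
    for rel in path:
        nxt = defaultdict(float)
        for node, prob in frontier.items():
            nbrs = [t for r, t in graph[node] if r == rel]
            for t in nbrs:
                nxt[t] += prob / len(nbrs)  # uniform random walk
        frontier = nxt
    return frontier.get(target, 0.0)

# One row of the PRA feature matrix for the candidate relation livesIn;
# each enumerated path type contributes one feature column.
paths = [("bornIn", "cityIn")]
row = [path_feature("bill_gates", "usa", p, graph) for p in paths]
print(row)  # the bornIn -> cityIn walk reaches usa with probability 1.0
```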
The idea behind these path-reasoning approaches is that the semantics of a relation between two entities can be represented by the semantics of the multiple paths that connect them. Therefore, a missing relation between two entities can be inferred by learning from the paths connecting the entities. However, these reasoning methods train a simple RNN, even though more effective sequence-processing architectures are available that outperform plain RNNs. Moreover, most of these methods use max-pooling or mean operations to combine multiple paths, neglecting the fact that each path provides different reasoning evidence. In fact, an individual path frequently does not provide any indication of a semantic relationship between entities s and t.
In this paper, we propose a new attention-based approach for KGC that couples a convolutional neural network (CNN) with a bidirectional long short-term memory (BiLSTM) module. First, given a candidate relation and two entities, our method encodes the multiple reasoning paths between the entities into low-dimensional embeddings using the CNN followed by the BiLSTM module. Second, we assume that not all paths between two entities contribute equally to inferring the missing relation. To this end, an attention mechanism (Bahdanau, Cho, Bengio, 2015, Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, et al., 2017) is applied to capture the semantic correlation between the candidate relation and each path and to generate a single vector representation for all paths between the entities. In this way, paths of varying lengths are encoded into a fixed-length real-valued vector. Finally, the summation of the relation embedding and the vector representation of the paths is passed through a fully connected layer to predict whether the two entities should be connected by the candidate relation. The principle behind our method is that the CNN extracts local features in each path, and the BiLSTM network uses the ordering of these local features to learn the entity and relation orderings of the path. Finally, the attention layer extracts reasoning evidence from the paths that is correlated with the candidate relation. The attention mechanism in our model is similar to that of Xiaotian et al. (2017). The only difference is that instead of computing the dot product of the target relation and the path vectors, we apply an additive attention function implemented with a feedforward network, which remains well behaved without additional scaling. Dot-product attention is faster and more space-efficient but, in some cases, requires an additional scaling factor to compute correct attention weights, which was not implemented in the previous study.
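The additive attention step described above can be sketched in a few lines of numpy. This is a simplified illustration under assumptions: the path vectors P stand in for the CNN+BiLSTM encodings (here random placeholders), and the dimension d, the number of paths, and the weight matrices are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8           # embedding size (assumed for illustration)
num_paths = 5   # paths between the entity pair

P = rng.normal(size=(num_paths, d))  # path encodings from CNN+BiLSTM (placeholders)
r = rng.normal(size=d)               # candidate-relation embedding

# Additive (Bahdanau-style) attention: score_i = v^T tanh(W_p p_i + W_r r)
W_p = rng.normal(size=(d, d))
W_r = rng.normal(size=(d, d))
v = rng.normal(size=d)

scores = np.tanh(P @ W_p.T + r @ W_r.T) @ v
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                 # softmax over paths: attention weights

context = alpha @ P                  # fixed-length representation of all paths
prediction_input = context + r       # summed with the relation embedding,
                                     # then fed to a fully connected layer
print(alpha.round(3), prediction_input.shape)
```

Paths that correlate strongly with the candidate relation receive larger weights, so uninformative paths contribute little to the combined vector.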
Furthermore, when the search space of path representations is very large, combining all paths in a single pass does not provide sufficient evidence to infer the relationship between entities. Therefore, to narrow down the search in a continuous space, we use multiple steps of reasoning. To this end, we extend our model to perform multistep reasoning over the path distribution. We adopt an RNN-like multihop (Sukhbaatar, Szlam, Weston, & Fergus, 2015) reasoning network that enables the model to read the embeddings of the same paths multiple times and update the encoding vector at each step before producing the final output. Through experiments, we demonstrate that multistep reasoning over the path distribution can significantly improve reasoning performance on KGC tasks. Moreover, our model is trained end-to-end with gradient descent jointly over all candidate relations.
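The multihop update can be sketched as a loop that repeatedly reads the same path memory and refines the query vector, in the style of memory networks (Sukhbaatar et al., 2015). This is a schematic, not the paper's implementation: the dimensions and number of hops are assumptions, and dot-product scoring is used here for brevity in place of the paper's additive attention.

```python
import numpy as np

rng = np.random.default_rng(1)
d, num_paths, hops = 8, 5, 3   # sizes assumed for illustration

P = rng.normal(size=(num_paths, d))  # fixed path memory, re-read at every hop
q = rng.normal(size=d)               # initial query: the relation embedding

def read(memory, query):
    """One attention read over the path memory (dot-product scoring
    for brevity; additive scoring plugs in the same way)."""
    s = memory @ query
    a = np.exp(s - s.max())
    a /= a.sum()
    return a @ memory

# Multihop reasoning: update the encoding vector at each step
# before producing the final output.
for _ in range(hops):
    q = q + read(P, q)

print(q.shape)  # final encoding after multistep reasoning
```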
In experiments, we perform link prediction tasks on four different KGs, i.e., NELL, Freebase, Kinship, and Countries. We compare our method with recent state-of-the-art path-based reasoning methods using various measures. For link prediction, given a test triple, we replace its source or target entity with random entities and measure the rank of the original triple against the corrupted set under each method. We further visualize multiple reasoning paths and observe that paths connecting similar entity pairs are closely clustered together. Empirically, we show that our approach achieves results comparable to previous methods and exhibits better performance in several cases.
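The ranking protocol described above can be sketched as follows. The scores below are hypothetical illustrations, not outputs of the paper's model; the sketch shows how the rank of the original triple among its corruptions is obtained and how those ranks yield the mean reciprocal rank (MRR).

```python
import numpy as np

def rank_of_true(score_true, scores_corrupted):
    """1-based rank of the original triple among its corruptions
    (higher score = more plausible; ties resolved optimistically)."""
    return 1 + int(np.sum(np.asarray(scores_corrupted) > score_true))

def mrr(ranks):
    """Mean reciprocal rank over a list of 1-based ranks."""
    return float(np.mean([1.0 / r for r in ranks]))

# Hypothetical scores for two test triples and their corrupted variants:
r1 = rank_of_true(0.9, [0.4, 0.95, 0.1])  # one corruption outranks it -> rank 2
r2 = rank_of_true(0.8, [0.2, 0.3])        # rank 1
print(mrr([r1, r2]))                       # (1/2 + 1/1) / 2 = 0.75
```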
Section snippets
Related work
This section reviews previous studies on KGC tasks. Previous works are broadly divided into two categories, i.e., path-based reasoning and KG embedding. KG embedding predicts missing links by applying low-dimensional embedding approaches to KGs (Bordes, Usunier, Garcia-Duran, Weston, Yakhnenko, 2013, Nickel, Murphy, Tresp, Gabrilovich, 2015, Nickel, Tresp, Kriegel, 2011, Wang, Mao, Wang, Guo, 2017). The key idea of embedding-based KGC is to represent entities and relations as low-dimensional
Method
In this section, we present our approach for KGC via link prediction tasks, which aim to predict missing links in a graph. An overview of the approach is shown in Figs. 1 and 2. First, we briefly review the problem of KGC and the PRA and describe how we obtain paths. Then, we introduce the CNN and BiLSTM modules, which embed relational paths into a low-dimensional space and combine those paths using an attention module according to a query relation. Then, we describe the RNN, which performs
Experiments
We evaluate our model on link prediction tasks and report the results on four different KGs. The statistics of the graph datasets are presented in Table 1. The hyperparameters of our model that result in the best performance on the development set are selected via a small grid search. Several measures are adopted to quantitatively evaluate our model, including F1, mean average precision (MAP), and mean reciprocal rank (MRR). MAP is the average of precision values at the ranks where relevant
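The MAP measure named above can be sketched as follows; the ranked relevance lists are hypothetical examples, not results from the paper.

```python
def average_precision(relevance):
    """Precision averaged at the ranks where relevant items occur.
    `relevance` is a 1/0 list in ranked order."""
    hits, total, ap = 0, sum(relevance), 0.0
    for i, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            ap += hits / i   # precision at this relevant rank
    return ap / total if total else 0.0

# Two ranked candidate lists (1 = a correct triple at that rank):
queries = [[1, 0, 1, 0], [0, 1]]
aps = [average_precision(q) for q in queries]
print(sum(aps) / len(aps))  # MAP: mean of per-query average precision
```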
Conclusion
In this paper, we propose a new approach for KGC that combines BiLSTM and CNN modules with an attention mechanism. Given a candidate relation and two entities, we encode the paths that connect the entities into a low-dimensional space using a convolutional operation followed by BiLSTM. Then, an attention layer is applied to combine multiple paths efficiently. We further extend our model to perform multistep reasoning over path representations in an embedding space. Compared to other models, our
Acknowledgement
This work was supported by Institute of Information & communications Technology Planning & evaluation (IITP) grant funded by the Korea government (MSIT) (2019000067, Semantic Analysis Reasoning Methods for Automatic Completion of Large Scale Knowledge Graph).
CRediT authorship contribution statement
Batselem Jagvaral: Conceptualization, Data curation, Writing - original draft, Writing - review & editing. Wan-Kon Lee: Writing - review & editing. Jae-Seung Roh: Data curation. Min-Sung Kim: Data curation. Young-Tack Park: Conceptualization, Writing - original draft, Writing - review & editing.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (30)
- Neural machine translation by jointly learning to align and translate (2015)
- Translating embeddings for modeling multi-relational data. Advances in Neural Information Processing Systems 26 (2013)
- On approximate reasoning capabilities of low-rank vector spaces. AAAI Spring Symposia (2015)
- Go for a walk and arrive at the answer: Reasoning over paths in knowledge bases using reinforcement learning. International Conference on Learning Representations (ICLR) (2018)
- Chains of reasoning over entities, relations, and text using recurrent neural networks. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers (2017)
- Convolutional 2D knowledge graph embeddings. AAAI (2018)
- Multi-step reasoning via recurrent dual attention for visual dialog (2019)
- Efficient and expressive knowledge base completion using subgraph feature extraction. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (2015)
- Improving learning and inference in a large knowledge-base using latent syntactic cues. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (2013)
- Long short-term memory. Neural Computation (1997)
- Statistical predicate invention. Proceedings of the 24th International Conference on Machine Learning
- Relational retrieval using a combination of path-constrained random walks. Machine Learning
- Random walk inference and learning in a large scale knowledge base. Proceedings of the Conference on Empirical Methods in Natural Language Processing
- Modeling relation paths for representation learning of knowledge bases. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
- Visualizing data using t-SNE. Journal of Machine Learning Research