ADRL: An attention-based deep reinforcement learning framework for knowledge graph reasoning
Introduction
Relational reasoning [1] is a fundamental problem in statistical relational learning and a hot topic in the field of knowledge graphs. A knowledge graph is a knowledge base [2], [3] with a graph-like structure in which knowledge is expressed and stored as "entity-relation-entity" triples [4]. The focus of this paper is the relational reasoning problem on knowledge graphs; the task of relational reasoning is to use the existing knowledge in the knowledge graph to infer possible relations between entities via statistical machine learning or deep learning methods [5], [6], [7].
Relational reasoning is also known as entity prediction, relation prediction or knowledge graph completion [8]; it aims to infer the missing element of a given triple from its existing elements so as to form a complete, valid triple [9], [10]. The reasoning models used in current mainstream knowledge bases include the latent factor model [11], [12] and the random walk model [13]. The former maps entities and relations into a low-dimensional real vector space and performs reasoning through vector similarity calculations [14]. The latter is built on first-order predicate logic [15] for relational reasoning between entities, with algorithmic complexity reduced by randomization. In comparison, the former has higher computational complexity due to the need for large-scale matrix operations [16], while the latter relies on random sampling, which makes it difficult to fully exploit the structural information in the knowledge base and results in a lower recall rate [17].
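The latent-factor idea can be illustrated with a TransE-style translation score, where a triple (h, r, t) is plausible when the tail embedding lies close to h + r. The dimensions and random embeddings below are purely illustrative; they are not the models or data used in this paper.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE-style plausibility: a smaller ||h + r - t|| means a more likely triple."""
    return -np.linalg.norm(h + r - t)

rng = np.random.default_rng(0)
dim = 8
# Toy embeddings for one (head, relation) pair; in practice these are learned.
h = rng.normal(size=dim)
r = rng.normal(size=dim)
good_t = h + r + 0.01 * rng.normal(size=dim)  # a tail lying near h + r
bad_t = rng.normal(size=dim)                  # an unrelated tail

assert transe_score(h, r, good_t) > transe_score(h, r, bad_t)
```

Ranking all candidate tails by this score is exactly the large-scale similarity computation whose cost motivates the path-based search used later in the paper.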
In recent years, deep learning (DL) [18], [19] and reinforcement learning (RL) [20], as principal research hotspots in machine learning, have achieved remarkable success in many areas of artificial intelligence [21], [22]. DL focuses on the perception and representation of things: its basic idea is to combine low-level features through a multi-layered network structure and nonlinear transformations into abstract, easily distinguishable high-level representations, thereby discovering distributed feature representations of the data [23], [24]. RL, in contrast, focuses on learning problem-solving strategies: its basic idea [25] is to learn the optimal policy for accomplishing a goal by maximizing the cumulative reward the agent obtains from the environment. In increasingly complex real-world tasks, DL is needed to automatically learn abstract representations of large-scale input data, on top of which RL can optimize the problem-solving strategy [26]. Combining the advantages of DL and RL has formed a new research hotspot in artificial intelligence: deep reinforcement learning (DRL) [27], [28], [29]. Inference and completion amount to querying and finding answers, which requires visiting many nodes and edges in the knowledge graph; such a process of finding answers can therefore be modeled as a serialized decision problem, which can naturally be solved with deep reinforcement learning [30], [31].
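Framing query answering as a serialized decision problem can be sketched as a walk over the graph: the state is the current entity, the actions are its outgoing edges, and an episode ends when the agent reaches a candidate answer or runs out of hops. The tiny graph, entity names and relation names below are invented for illustration only.

```python
# A toy knowledge graph as adjacency: entity -> list of (relation, next_entity).
GRAPH = {
    "Paris": [("capital_of", "France"), ("located_in", "Europe")],
    "France": [("part_of", "Europe")],
    "Europe": [],
}

def rollout(start, policy, max_hops=3):
    """Walk the graph by repeatedly choosing an outgoing edge; returns the path taken."""
    entity, path = start, []
    for _ in range(max_hops):
        actions = GRAPH[entity]  # the action space available in this state
        if not actions:          # dead end: episode terminates
            break
        relation, entity = policy(entity, actions)
        path.append((relation, entity))
    return path

# A trivial deterministic policy: always take the first edge.
path = rollout("Paris", lambda entity, actions: actions[0])
print(path)  # [('capital_of', 'France'), ('part_of', 'Europe')]
```

In a DRL agent the hand-written policy above is replaced by a learned network that scores the available edges from the current state representation.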
In this work, we propose a novel relational reinforcement learning framework that maps relational states, actions, and policies to a low-dimensional space using deep learning architectures [32], so that the generalization ability of deep learning complements the inferential ability of reinforcement learning to solve search and Q&A problems [33]. Our approach advocates a learnable, reusable, entity- and relation-centric function to implicitly reason about relations, and we employ a deep RL agent with architectural inductive biases that may be better suited to learning (and computing) relations.
Our framework has several desirable features. First, it is efficient for path-based computation, because the costly operation of ranking all entities in the knowledge graph is avoided by searching small neighborhoods around each entity. Second, we employ self-attention [34], [35] to infer the weights of entities and relations, shared with the RL component; ADRL requires no pre-training, supervision, or fine-tuning, but instead learns over the knowledge graph through search via reinforcement learning. Third, we use the actor–critic algorithm [36], which effectively increases the efficiency of the model: it does not require complete trajectories and is model-free. The actor generates trajectories, while the critic learns policies and value functions to evaluate the actor's actions, so that the model can learn better policies. Finally, the paths and trajectories our agents explore automatically generate resources for the model to continue exploring.
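As a rough illustration of how self-attention can weight neighboring entities and relations, the scaled dot-product form below scores each input vector against every other and mixes them by the resulting softmax weights. The identity projections, dimensions and random inputs are simplifying assumptions for the sketch and do not reproduce ADRL's actual module.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention with identity Q/K/V projections (sketch)."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                      # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ X                                 # each row: weighted mix of all inputs

X = np.random.default_rng(1).normal(size=(4, 8))       # e.g. 4 neighbor embeddings, dim 8
out = self_attention(X)
assert out.shape == (4, 8)
```

The softmax rows act as the learned importance weights over neighbors; in the full model these weights are shared with the RL policy to guide which edge the agent follows.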
Our contributions in this paper are as follows:
- We propose a new network architecture based on deep reinforcement learning for knowledge graph reasoning, which improves the efficiency and interpretability of traditional methods through structured perception and relational reasoning.
- We design a relational module that can be viewed as a universal plug-in for an inference framework, in which the introduced self-attention mechanism iteratively infers the relations between entities to guide a model-free policy; compared with previous work, this is more conducive to agents inferring relational paths.
- We employ the actor–critic architecture to effectively address reward sparsity: the reward depends on the value function, which is trained and optimized together with the policy. In addition, we adopt a distributed actor–critic set-up based on the asynchronous advantage actor–critic (A3C) algorithm, so multi-threaded agents improve efficiency and make the model more suitable for large-scale knowledge graphs.
Related work
In the early stage of the development of statistical relational learning methods, relational reasoning was mainly based on first-order predicate logic rules [37]. Representative work is the Markov Logic Network proposed in [32], which combines Markov random fields with first-order predicate logic rules and builds a logical network to model and reason about the relations between entities. Its principal advantage is the high accuracy of relational reasoning. The disadvantage is the fact that it
Methodology
In this section, we describe in detail our attention-based deep reinforcement learning framework (ADRL) for multi-hop relational reasoning. The core idea of relational reinforcement learning [58], [59], [60] is to integrate reinforcement learning with inductive logic programming or relational learning by representing actions, states and policies in a first-order language. The relational language facilitates the use of background knowledge, which can be provided by rules and logical
Experiments
In this section, we first describe the data sets used in our experiments and the parameter settings for model training. We then design a series of experiments to demonstrate the validity and efficiency of our model, which outperforms traditional embedding-based methods, the PRA method and recent reinforcement-learning-based methods.
Conclusion and future work
In this paper, we propose a novel attention-based deep reinforcement learning framework that incorporates a message-passing mechanism and inductive bias for relational reasoning on knowledge graphs. We combine a CNN and an LSTM to encode the knowledge graph and remember histories, and introduce a relational module containing self-attention to infer the weights of the entities the agent walks on. This facilitates better decision making and can be shared with the actor–critic algorithm, which is a
CRediT authorship contribution statement
Yongsheng Hao: Writing - review & editing. Jie Cao: Funding acquisition. All other work was done by Qi Wang.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The work was partly supported by the National Social Science Foundation of China (No. 16ZDA054), and the National Science Foundation of China (No. U1736105).
Qi Wang received the B.S. and M.Eng. degrees in software engineering from Jilin University (Changchun) and Central South University (Changsha), China, in 2012 and 2016, respectively. He is currently pursuing the Ph.D. degree in software engineering at the School of Computer Science, Fudan University, Shanghai, China.
His current research interests include knowledge graph, deep learning, and reinforcement learning.
References (71)
- A study of graph-based system for multi-view clustering, Knowl.-Based Syst. (2019)
- Adaptive and large-scale service composition based on deep reinforcement learning, Knowl.-Based Syst. (2019)
- A proactive decision support method based on deep reinforcement learning and state partition, Knowl.-Based Syst. (2018)
- A novel multi-step Q-learning method to improve data efficiency for deep reinforcement learning, Knowl.-Based Syst. (2019)
- Inductive logic programming: Theory and methods, J. Log. Program. (1994)
- A review of relational machine learning for knowledge graphs, Proc. IEEE (2015)
- Knowledge Graphs: Representation and Structuring of Scientific Knowledge (1987)
- Freebase: a collaboratively created graph database for structuring human knowledge
- Yago: a core of semantic knowledge
- Andrew Carlson, et al., Toward an architecture for never-ending language learning, in: Twenty-Fourth AAAI Conference on...
- A three-way model for collective learning on multi-relational data
- A simple neural network module for relational reasoning
- A semantic matching energy function for learning with multi-relational data
- Reasoning with neural tensor networks for knowledge base completion
- Translating embeddings for modeling multi-relational data
- Random walk inference and learning in a large scale knowledge base
- Translating embeddings for modeling multi-relational data
- Differentiable learning of logical rules for knowledge base reasoning
- Learning symmetric collaborative dialogue agents with dynamic knowledge graph embeddings
- Deep neural networks for acoustic modeling in speech recognition, IEEE Signal Process. Mag.
- Deep learning, Nature
- A study on overfitting in deep reinforcement learning
- Mastering the game of Go with deep neural networks and tree search, Nature
- Mastering the game of Go without human knowledge, Nature
- A simple neural network module for relational reasoning
- Relational reinforcement learning, Mach. Learn.
- Reinforcement Learning: An Introduction
- Human-level control through deep reinforcement learning, Nature
- Relational representations in reinforcement learning: Review and open problems
- Hybrid query expansion using lexical resources and word embeddings for sentence retrieval in question answering, Inform. Sci.
- Graph attention networks
Yongsheng Hao received his M.S. degree in engineering from Qingdao University in 2008. He is currently a senior engineer at the Network Center, Nanjing University of Information Science & Technology. His research interests include distributed and parallel computing, mobile computing, grid computing, Web services, particle swarm optimization and genetic algorithms. He has published more than 30 papers in international conferences and journals.
Jie Cao received the Ph.D. degree from Southeast University, Nanjing, China, in 2005. He was an Associate Professor, from 1999 to 2006. From 2006 to 2009, he was a Postdoctoral Fellow of the Academy of Mathematics and Systems Science, Chinese Academy of Science. From 2009 to 2019, he was a Professor with the School of Management and Economics, Nanjing University of Information Science and Technology. Since May 2019, he has been the Vice President of the Xuzhou University of Technology. His research interests include system engineering, and management science and technology.