
Knowledge-Based Systems

Volume 242, 22 April 2022, 108221

Transfer reinforcement learning via meta-knowledge extraction using auto-pruned decision trees

https://doi.org/10.1016/j.knosys.2022.108221

Abstract

Transfer reinforcement learning (RL) has recently received increasing attention as a way to improve the learning performance of RL agents in target Markov decision processes (MDPs) by reusing knowledge learned in source MDPs. However, improving the transfer capability and interpretability of RL algorithms remains an open and challenging problem. In this paper, we propose a novel transfer reinforcement learning approach via meta-knowledge extraction using auto-pruned decision trees. In the source MDPs, pre-trained policies are first learned by RL algorithms using general function approximators. Then, a meta-knowledge extraction algorithm is designed with an auto-pruned decision tree model, where the meta-knowledge is learned by re-training the auto-pruned decision tree on data samples generated from the pre-trained policies. The state space of the meta-knowledge is determined by estimating the uncertainty of state–action pairs in the pre-trained policies based on the entropy values of the leaf nodes. In the target MDPs, a hybrid policy is generated by integrating the meta-knowledge with the policies learned on the target MDPs, according to whether the current state lies in the state set of the meta-knowledge. Based on the proposed transfer RL approach, two meta-knowledge-based transfer reinforcement learning (MKRL) algorithms are developed for MDPs with discrete and continuous action spaces, respectively. Experimental results on several benchmark tasks show that the MKRL algorithms outperform other baselines in terms of learning efficiency and interpretability in target MDPs across generic cases of task similarity.

Introduction

Reinforcement learning (RL) aims to learn a policy that optimizes a long-term performance index in sequential decision tasks, which are usually modeled as Markov decision processes (MDPs) [1]. Earlier research on RL algorithms focused on MDPs with discrete state and action spaces, but many real-world problems involve continuous or high-dimensional state and action spaces. Therefore, value function approximation (VFA) and policy function approximation (PFA) have become major research topics in RL, aimed at enhancing the learning efficiency and generalization ability of traditional tabular RL algorithms [2]. Various feature representation methods have been proposed for VFA and PFA, such as neural networks, kernel methods [3], and manifold methods [2]. In the past decade, by using deep neural networks as value or policy function approximators, deep reinforcement learning (DRL) algorithms have been widely studied. DRL algorithms have achieved state-of-the-art performance in computer games (e.g., Atari [4] and Go [5]), and promising results have been obtained in more complex control tasks [6]. However, the lack of transferability and interpretability remains a key obstacle to applying RL algorithms widely in real-world tasks.

The notion of transfer capability in learning algorithms originates from research on transfer learning in pattern recognition, where a classifier or regression model is trained in a source domain and the learned features and model are expected to transfer to a target domain. In RL, by contrast, transfer capability requires that a control policy learned in a source MDP can be used to accelerate the learning control process in target MDPs with different dynamics. This kind of transfer RL is usually referred to as inter-task transfer RL [7]. Transfer capability is important not only for applying RL across learning control tasks that share some similarities, but also for transferring policies learned in simulation to real-world problems. In practice, a policy trained by RL in a simulator often fails to work well in a real-world control problem whose dynamics differ only slightly. Knowledge transfer from policies pre-trained in source MDPs has only recently begun to receive substantial attention [8], [9], [10]. However, many transfer RL algorithms require pre-training a policy on a group of source MDPs, or even on a whole distribution of MDPs, which is impractical for some real problems. Moreover, previous algorithms usually treat all knowledge obtained in a source MDP equally rather than distinguishing its importance.

Beyond the above issues concerning transfer capability, policies learned by RL with generic nonlinear function approximators such as neural networks are in fact black-box models. In many safety-critical applications such as autonomous vehicles, it is necessary to know how decisions are generated by the learned control policies of RL algorithms [11]. Some recent works have taken steps toward making policies interpretable and transparent [12], [13], but they did not improve, and in some cases even harmed, the performance of the resulting interpretable policies.

By comparison, humans are good at summarizing the experience they have gained. When facing a similar task again, they can make decisions with intuitive explanations. Such transferable explanations represent general rules for a certain type of task and can be regarded as meta-knowledge. In fact, humans can make decisions through a few simple judgments without relying on many features or their precise values, whereas an RL-trained policy requires complex numerical computation. It is therefore promising to develop human-like meta-knowledge extraction methods for RL by re-training a transparent model that learns meta-knowledge by mimicking a trained policy.

Motivated by the above, in this paper we propose a novel transfer reinforcement learning approach via meta-knowledge extraction using auto-pruned decision trees. Based on data samples generated from pre-trained policies learned via RL algorithms in the source MDPs, a meta-knowledge extraction algorithm is developed by re-training an auto-pruned decision tree. The state space of the meta-knowledge is determined by estimating the uncertainty of state–action pairs in the pre-trained policies, based on the entropy values of the leaf nodes and the similarity between the source and target MDPs. Then, a hybrid policy is generated by integrating the extracted meta-knowledge with the policies learned on the target MDPs, by judging whether the observed state lies in the state space of the meta-knowledge. Based on this approach, two meta-knowledge-based transfer reinforcement learning (MKRL) algorithms are developed for MDPs with discrete and continuous action spaces, respectively. We demonstrate the effectiveness of the approach on seven learning tasks and their variants in Gym and MuJoCo.
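
To make these steps concrete, the following is a minimal sketch in Python of the extraction and hybrid-policy steps. It assumes a scikit-learn decision tree stands in for the auto-pruned tree, uses cost-complexity pruning (ccp_alpha) as a stand-in for the auto-pruning procedure, and treats the entropy threshold and policy interfaces as hypothetical placeholders rather than the authors' exact implementation.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def extract_meta_knowledge(states, actions, ccp_alpha=1e-3, entropy_thresh=0.3):
        # Re-train a pruned tree on (state, action) samples drawn from the
        # pre-trained source policy; ccp_alpha plays the role of auto-pruning here.
        tree = DecisionTreeClassifier(ccp_alpha=ccp_alpha).fit(states, actions)
        leaf_ids = tree.apply(states)                  # leaf index of every sample
        meta_leaves = set()
        for leaf in np.unique(leaf_ids):
            counts = tree.tree_.value[leaf].ravel()    # per-class statistics in this leaf
            probs = counts / counts.sum()
            probs = probs[probs > 0]
            entropy = -np.sum(probs * np.log(probs))
            if entropy < entropy_thresh:               # low uncertainty -> transferable rule
                meta_leaves.add(int(leaf))
        return tree, meta_leaves

    def hybrid_action(state, tree, meta_leaves, target_policy):
        # Follow the meta-knowledge where it is confident, otherwise defer to
        # the policy currently being learned on the target MDP.
        leaf = int(tree.apply(state.reshape(1, -1))[0])
        if leaf in meta_leaves:
            return tree.predict(state.reshape(1, -1))[0]
        return target_policy(state)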

The main contributions of this paper can be summarized in two aspects. First, a novel transfer RL approach via meta-knowledge extraction is proposed, which can be easily integrated with different RL algorithms. It is verified that the meta-knowledge extracted using auto-pruned decision trees represents transferable and interpretable policies suitable for both the source and target MDPs, and that the learning process in target MDPs with different dynamics can be significantly accelerated. Second, two novel transfer RL algorithms, MKA3C and MKPPO, are proposed for MDPs with discrete action spaces and continuous action spaces, respectively. Comprehensive experiments on several learning tasks and their variants in Gym and MuJoCo show that the proposed MKA3C and MKPPO outperform previous baselines in both transfer learning efficiency and interpretability.

The remainder of the paper is organized as follows. Related works are discussed in Section 2. The RL problem and the transfer RL setting are formulated in Section 3. The proposed algorithms for learning the meta-knowledge and improving both the transfer capability and interpretability of RL are presented in Section 4. Comprehensive performance evaluations and comparisons on seven typical tasks and their variants are reported in Section 5. Section 6 concludes the paper and suggests future work.


Related work

In this paper, we focus on the inter-task transfer RL setting with changing dynamics. Solutions for such transfer RL can be summarized into three categories: representation-based, instance-based and multi-task-based.

The representation-based transfer RL adapts the knowledge representation of a target MDP by leveraging high-level representations from the source MDPs. Policy distillation (PD) [14], [15] is a popular method for learning the representation of the source MDPs. A typical

RL and transfer RL

Before introducing the transfer RL problem, some basic definitions of RL are given first. RL algorithms learn a near-optimal policy through interaction with the environment. The learning process can be modeled as a Markov decision process (MDP), which comprises a state space $S \subseteq \mathbb{R}^{l}$, an action space $A$ (either continuous or discrete), and a transition dynamics distribution with conditional probability density $p(s_{t+1} \mid s_t, a_t)$ satisfying the Markov property.
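
For reference, the standard discounted MDP formulation consistent with these definitions can be written as follows; this is a reconstruction of the usual setup rather than the paper's exact notation.

    % Standard discounted MDP formulation (reconstructed from the definitions above).
    \begin{align*}
      \mathcal{M} &= (S, A, p, r, \gamma), \qquad S \subseteq \mathbb{R}^{l}, \\
      s_{t+1} &\sim p(\cdot \mid s_t, a_t), \qquad a_t \sim \pi(\cdot \mid s_t), \\
      J(\pi) &= \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big],
      \qquad \pi^{*} = \operatorname*{arg\,max}_{\pi} J(\pi).
    \end{align*}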

Meta-knowledge extraction for transfer reinforcement learning

In this section, a novel meta-knowledge extraction method with auto-pruned decision trees is first proposed for MDPs. Then, two meta-knowledge-based transfer reinforcement learning (MKRL) algorithms are designed for MDPs with discrete action spaces and continuous action spaces, which take advantage of both the meta-knowledge and RL algorithms to generate an integrated policy. Fig. 2 illustrates the overall setup. Finally, some discussion and performance analysis of MKRL are given.
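
For continuous action spaces, one plausible reading of the same idea replaces the classification tree with a regression tree that mimics the pre-trained policy's actions, using per-leaf action variance as the uncertainty proxy in place of classification entropy. The sketch below is an illustrative assumption, not the exact MKPPO construction; the names and thresholds are hypothetical.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def extract_meta_knowledge_continuous(states, actions, ccp_alpha=1e-3, var_thresh=0.05):
        # A regression tree mimics the (possibly multi-dimensional) actions of the
        # pre-trained policy; leaves whose training actions have low variance are
        # kept as the meta-knowledge region.
        tree = DecisionTreeRegressor(ccp_alpha=ccp_alpha).fit(states, actions)
        leaf_ids = tree.apply(states)
        meta_leaves = {
            int(leaf) for leaf in np.unique(leaf_ids)
            if actions[leaf_ids == leaf].var(axis=0).mean() < var_thresh
        }
        return tree, meta_leaves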

Performance evaluation and comparisons

The aim of transfer RL is to improve the learning performance on the target MDPs in terms of three metrics: jump-start improvement, learning speed improvement, and asymptotic improvement [41]. According to these metrics, we conduct experiments to answer the following questions. (1) When the dynamics of the target MDPs are similar to those of the source MDPs, can MKRL with auto-pruned decision trees perform well? (2) When the similarity between the target MDPs and the source MDPs is low, can
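
As a rough illustration, the three metrics above can be computed from episodic-return curves of a transfer agent and a learning-from-scratch baseline as sketched below. The window sizes and the area-under-curve choice for learning speed are illustrative assumptions, not the paper's evaluation protocol.

    import numpy as np

    def transfer_metrics(transfer_returns, scratch_returns, head=10, tail=10):
        # Per-episode return curves of the transfer agent and a from-scratch
        # baseline, aligned over the same number of episodes.
        t = np.asarray(transfer_returns, dtype=float)
        s = np.asarray(scratch_returns, dtype=float)
        jump_start = t[:head].mean() - s[:head].mean()     # initial performance gap
        asymptotic = t[-tail:].mean() - s[-tail:].mean()   # final performance gap
        learning_speed = np.trapz(t) - np.trapz(s)         # area-under-curve gap
        return {"jump_start": jump_start,
                "learning_speed": learning_speed,
                "asymptotic": asymptotic}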

Conclusion

How to improve the transfer capability and interpretability of RL algorithms is an important and challenging problem, especially for MDPs with changing state transition dynamics. In this paper, we propose a novel transfer RL approach using meta-knowledge extraction with auto-pruned decision trees, which can extract transferable and interpretable policies. Then, the meta-knowledge-based transfer reinforcement learning (MKRL) algorithm is proposed for target MDPs in generic cases of

CRediT authorship contribution statement

Yixing Lan: Conceptualization, Methodology, Software. Xin Xu: Conceptualization, Funding acquisition, Writing – review & editing. Qiang Fang: Writing – original draft, Methodology. Yujun Zeng: Writing – original draft, Methodology. Xinwang Liu: Writing – review & editing. Xianjian Zhang: Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (46)

  • Alexandre Heuillet et al., Explainability in deep reinforcement learning, Knowl. Based Syst. (2021)
  • Daniel Hein et al., Interpretable policies for reinforcement learning by genetic programming, Eng. Appl. Artif. Intell. (2018)
  • Richard S. Sutton et al., Reinforcement Learning: An Introduction (2018)
  • X. Xu et al., Manifold-based reinforcement learning via locally linear reconstruction, IEEE Trans. Neural Netw. Learn. Syst. (2017)
  • Xin Xu et al., Online learning control using adaptive critic designs with sparse kernel machines, IEEE Trans. Neural Netw. Learn. Syst. (2013)
  • Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, Koray...
  • David Silver et al., Mastering the game of Go without human knowledge, Nature (2017)
  • John Schulman et al., Proximal policy optimization algorithms (2017)
  • Qiang Yang et al., Transfer Learning (2020)
  • Diana Borsa et al., Universal successor features approximators
  • Andrea Tirinzoni et al., Sequential transfer in reinforcement learning with a generative model
  • Yunzhe Tao et al., REPAINT: knowledge transfer in deep actor-critic reinforcement learning (2020)
  • Osbert Bastani et al., Verifiable reinforcement learning via policy extraction
  • Nicolas Bougie et al., Towards interpretable reinforcement learning with state abstraction driven by external knowledge, IEICE Trans. Inf. Syst. (2020)
  • Andrei A. Rusu, Sergio Gomez Colmenarejo, Çaglar Gülçehre, Guillaume Desjardins, James Kirkpatrick, Razvan Pascanu,...
  • Haiyan Yin et al., Knowledge transfer for deep reinforcement learning with hierarchical experience replay
  • Mel Vecerik et al., Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards (2017)
  • Ofir Marom, Benjamin S. Rosman, Belief reward shaping in reinforcement learning, in: Thirty-Second AAAI Conference on...
  • André Barreto et al., Transfer in deep reinforcement learning using successor features and generalised policy improvement (2019)
  • Tamas Madarasz, Tim E. J. Behrens, Better transfer learning with inferred successor maps, in: Advances in Neural...
  • Aude Genevay et al., Transfer learning for user adaptation in spoken dialogue systems
  • Zhiyuan Xu, Kun Wu, Zhengping Che, Jian Tang, Jieping Ye, Knowledge Transfer in multi-task deep reinforcement learning...
  • Chelsea Finn et al., Model-agnostic meta-learning for fast adaptation of deep networks

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grants 61825305, 61751311, and 61906205, the Joint Funds of the National Natural Science Foundation of China under Grant U21A20518, and the National Key R&D Program of China under Grant 2018YFB1305105.
