Lyapunov theory based stable Markov game fuzzy control for non-linear systems

https://doi.org/10.1016/j.engappai.2016.06.008

Abstract

In this paper we propose a Lyapunov theory based Markov game fuzzy controller which is both safe and stable. We optimize a reinforcement learning (RL) based controller using Markov games, while simultaneously hybridizing it with a Lyapunov theory based control for stability. The proposed technique results in an RL based, game theoretic, adaptive, self-learning, optimal fuzzy controller which is both robust and has guaranteed stability. The proposed controller is an "annealed" hybrid of fuzzy Markov games and Lyapunov theory based control. Fuzzy systems are employed as generic function approximators to scale the proposed approach to continuous state-action domains. We test the proposed controller on three benchmark non-linear control problems: (i) the inverted pendulum, (ii) trajectory tracking of a standard two-link robotic manipulator, and (iii) tracking control of a two-link selective compliance assembly robot arm (SCARA). Simulation results and a comparative evaluation against baseline fuzzy Markov game based control showcase the superiority and effectiveness of the proposed approach.

Introduction

The reinforcement learning (RL) paradigm centers on Markov decision processes (MDPs) as the underlying model for adaptive optimal control of non-linear systems (Busoniu et al., 2010, Wiering and van Otterlo, 2012). A critical assumption in MDP based RL is that the environment is stationary. However, imposing such a restrictive assumption on the environment may not be feasible, especially when the controller has to deal with disturbances and parametric variations.
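For concreteness, the following is a minimal sketch of the kind of tabular Q-learning controller this MDP based view gives rise to; the state/action discretization, learning rate, discount factor and exploration rate are illustrative assumptions, not the settings used in this paper.

```python
import numpy as np

# Minimal tabular Q-learning sketch for an MDP based controller.
# n_states, n_actions, alpha, gamma, epsilon are illustrative values only.
n_states, n_actions = 100, 3
alpha, gamma, epsilon = 0.1, 0.95, 0.1
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng()

def select_action(state):
    """Epsilon-greedy action selection over the current Q estimates."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def q_update(state, action, reward, next_state):
    """Standard one-step Q-learning update; valid only if the
    environment behaves (approximately) as a stationary MDP."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
```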

Notwithstanding this limitation, RL has been used successfully for controlling a wide variety of non-linear systems. For example, Kobayashi et al. (2009) employ a meta-learning method based on temporal difference error for inverted pendulum control (IPC); Kumar et al. (2012) present a self-tuning fuzzy Q controller for IPC; Ju et al. (2014) propose a kernel based approximate dynamic programming approach for inverted pendulum control; and Liu et al. (2014) propose an experience replay least squares policy iteration procedure for efficient utilization of experiential information. The literature contains quite a few variants of the inverted pendulum problem; in this work we use the standard version, in which the pivot point is mounted on a cart that can move horizontally.

Another domain where RL has been applied is robotic manipulator control, which is a highly coupled, non-linear and time varying task. The task becomes even more challenging when the controller has to cope with varying payload mass and external disturbances. Both neural network based RL and fuzzy systems based RL have been employed for robotic manipulator control. In Lin (2009) the authors use an H∞ reinforcement learning based controller built on a fuzzy wavelet network (FWN); they implement an actor-critic RL formulation that avoids solving complex Riccati equations for controlling a SCARA. An adaptive neural RL control has been proposed in Tang et al. (2014) to counter unknown functions and dead zone inputs in an actor-critic RL configuration, wherein Lyapunov theory is employed to show boundedness of all closed loop signals. For a comprehensive and in-depth look at controllers employing soft computing techniques (e.g., neural networks, fuzzy systems and evolutionary computation) on robotic manipulators, we refer the reader to Katic and Vukobratovic (2013).

As stated earlier, all RL based controller design approaches share a basic lacuna: they assume an MDP framework. To make the RL controller design process more general and robust, we introduced a Markov game formulation wherein noise and disturbance are viewed as an "opponent" to the "controller" in a game theoretic setup (Sharma and Gopal, 2008). This formulation helped us design RL controllers that are robust in handling disturbances and noise, as the controller always tries to optimize against the "worst case" opponent or noise. The Markov game formulation (Sharma and Gopal, 2008) broadens MDP based RL control to encompass multiple adaptive agents with competitive goals. The Markov game controller was able to deal with disturbances and parameter variations of the controlled plant. However, both MDP and Markov game based RL approaches fail to address one key concern, namely, stability of the designed controller.
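As a rough illustration of this game theoretic view, the sketch below replaces the MDP value backup with a worst-case (max-min) backup over a finite set of disturbance ("opponent") actions; the discrete action and disturbance sets, and the use of a pure-strategy max-min rather than a mixed-strategy (linear programming) solution, are simplifying assumptions for exposition only.

```python
import numpy as np

# Sketch of a Markov game (minimax) Q backup: the controller picks the
# action that maximizes return against the worst-case disturbance.
# Table sizes and learning parameters are illustrative assumptions.
n_states, n_ctrl_actions, n_dist_actions = 100, 3, 3
alpha, gamma = 0.1, 0.95
Q = np.zeros((n_states, n_ctrl_actions, n_dist_actions))

def worst_case_value(state):
    """State value under the max-min (pure-strategy) criterion."""
    return np.max(np.min(Q[state], axis=1))

def minimax_q_update(state, u, w, reward, next_state):
    """One-step minimax-Q style update; u is the controller action,
    w is the observed disturbance ('opponent') action."""
    td_target = reward + gamma * worst_case_value(next_state)
    Q[state, u, w] += alpha * (td_target - Q[state, u, w])

def safe_action(state):
    """'Safe' control action: best response to the worst-case disturbance."""
    return int(np.argmax(np.min(Q[state], axis=1)))
```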

To be specific, there is no guarantee that the controller will remain stable in the presence of disturbances and/or parameter variations. Our attempt herein is to design self-learning, model free controllers with guaranteed stability. This is achieved by incorporating a Lyapunov theory based action generation mechanism into the game theoretic RL setup. The resulting controller has all the advantages of game based RL (Markov game control) and has guaranteed stability due to the inclusion of a Lyapunov theory based action.

This work is motivated by the need to address the stability issue in RL based control by proposing a 'safe and stable' game theoretic controller. The controller is safe because it uses a Markov game framework for optimization; the controller always optimizes against the worst opponent, or plays 'safe' as referred to in the game theory literature (Vrabie and Vamvoudakis, 2013). In the proposed approach the Markov game based 'safe' policy is hybridized with a Lyapunov theory based 'stable' policy to generate a 'safe and stable' policy. This hybridization is carried out in an 'annealed' or gradual manner to arrive at a safe and stable game theoretic control.
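One plausible way to read this 'annealed' hybridization is as a time-decaying switch that initially favours the Lyapunov theory based 'stable' action and gradually hands control over to the learned Markov game based 'safe' action; the exponential schedule and switching rule below are illustrative assumptions, not the authors' exact mechanism.

```python
import numpy as np

rng = np.random.default_rng()

def annealed_hybrid_action(t, game_action, lyapunov_action,
                           p0=1.0, decay=1e-3):
    """Sketch of an annealed hybrid policy (assumed form).

    With probability p(t) the Lyapunov theory based 'stable' action is
    applied; otherwise the Markov game based 'safe' action is applied.
    p(t) decays with trial step t, so early learning leans on the
    stability guarantee while later learning exploits the game-optimal
    policy. The exponential schedule is an assumption for illustration.
    """
    p_t = p0 * np.exp(-decay * t)
    return lyapunov_action if rng.random() < p_t else game_action
```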

Robotic manipulators (Katic and Vukobratovic, 2013) are highly coupled, non-linear, time varying and uncertain systems. Furthermore, industrial robotic manipulators are employed for picking up and releasing objects, so they have to deal with a varying payload mass. This presents a highly challenging and complex task for testing our proposed approach. We test our approach on two degrees of freedom (DOF) robotic manipulators as they capture all the intricacies of a six DOF manipulator while being computationally less expensive. We employ the approach on two robotic arms, i.e., a standard two link robot arm and a SCARA.

The proposed controller belongs to the class of self-learning/adaptive systems with roots in machine learning (Wiering and van Otterlo, 2012). In contrast to other artificial intelligence based and conventional controllers, RL based controllers do not assume access to the desired response or trajectory. The proposed controller assumes knowledge of neither the desired response nor the system model. The controller discovers optimal actions by repeated trial and error interactions with the system/plant it intends to control. It has access only to a heuristic reinforcement signal emitted by the plant, telling the controller whether the action taken is "good" or "bad". This makes the control task a very challenging one. The advantage is that the designed controller is self-learning, adaptive and suitable for controlling an unknown system.
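For the inverted pendulum task, for instance, such a heuristic reinforcement signal is commonly a penalty issued only on failure; the sketch below assumes this standard form, which may differ from the exact signal used in the paper.

```python
import numpy as np

def heuristic_reinforcement(phi, phi_dot, phi_limit=np.deg2rad(12)):
    """Heuristic 'good/bad' reinforcement signal (assumed form):
    -1 when the pole leaves the allowed angular band (failure),
     0 otherwise. The controller never sees the desired trajectory."""
    return -1.0 if abs(phi) > phi_limit else 0.0
```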

The paper is structured as follows: a systematic presentation of the RL approaches that lead to the formulation of the proposed methodology is given in Section 2. The formulation of the Lyapunov theory based stable Markov game fuzzy controller for the three tasks: (a) inverted pendulum, (b) two link robotic manipulator, and (c) SCARA, along with the simulation models and parameters thereof, is described in Section 3. Section 4 presents simulation results and a comparative evaluation of Lyapunov Markov game fuzzy control against baseline fuzzy Markov game control for the three problems. Section 5 summarizes the paper and outlines the scope for future work.

Section snippets

Lyapunov theory based Markov game fuzzy approach

To facilitate the reader's understanding of the proposed approach, we briefly describe some relevant RL approaches.

Inverted pendulum control

We employ the proposed approach for controlling the benchmark inverted pendulum. The state space of the system consists of two real-valued variables ϕ (pole angle, in rad) and ϕ̇ (pole velocity, in rad/s). Fig. 3 shows the inverted pendulum.
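To make the Lyapunov theory based action generation concrete for this state space, the sketch below uses a simple quadratic Lyapunov candidate in (ϕ, ϕ̇) and selects, from a finite force set, the action whose one-step prediction most decreases the candidate; the candidate function, force set and one-step prediction model are illustrative assumptions, not the construction used in the paper.

```python
import numpy as np

def lyapunov_candidate(phi, phi_dot, k1=1.0, k2=1.0):
    """Simple quadratic Lyapunov candidate V(phi, phi_dot) >= 0 (assumed form)."""
    return 0.5 * k1 * phi**2 + 0.5 * k2 * phi_dot**2

def lyapunov_action(phi, phi_dot, predict_next, forces=(-10.0, 0.0, 10.0)):
    """Pick, from a finite force set, the action that most decreases the
    Lyapunov candidate over one predicted step. `predict_next(phi, phi_dot, f)`
    is a hypothetical one-step model of the pendulum, assumed available here
    purely for illustration."""
    v_now = lyapunov_candidate(phi, phi_dot)
    best_f, best_dv = forces[0], np.inf
    for f in forces:
        phi_n, phi_dot_n = predict_next(phi, phi_dot, f)
        dv = lyapunov_candidate(phi_n, phi_dot_n) - v_now
        if dv < best_dv:
            best_f, best_dv = f, dv
    return best_f
```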

We apply the Lyapunov Markov game fuzzy control action to the cart to balance the pendulum (Kumar et al., 2012). The pendulum is initialized from a position close to (ϕ,ϕ̇)=(0,0) (a standard practice). A trial is terminated when either the pendulum remains

Simulation results

We compare the proposed controller against the baseline game theory based RL control, namely Markov game fuzzy RL control. The proposed Lyapunov theory based Markov game fuzzy control shares all shortcomings and advantages with the corresponding Markov game fuzzy control, both being RL based approaches. This provides a fair comparison ground for the proposed control approach. This is more so because our primary claim is that the inclusion of a Lyapunov theory based action generation mechanism

Conclusions and future scope

In this work we have presented a novel Lyapunov theory based Markov game fuzzy controller which offers stable and safe control. The proposed controller demonstrated its superiority over the baseline Markov game fuzzy controller in terms of robustness and stability, as evidenced by simulations on the benchmark problems: (a) inverted pendulum, (b) two link robot arm, and (c) SCARA. In future work we intend to apply the proposed scheme to partially observable Markov domains using the POMDP framework.

References (14)

  • Aguilar-Ibanez, C., 2008. A constructive Lyapunov function for controlling the inverted pendulum. In: American Control...
  • Busoniu, L., et al., 2010. Reinforcement Learning and Dynamic Programming Using Function Approximators.
  • Ju, X., et al., 2014. Kernel based approximate dynamic programming for real time online learning control: an experimental study. IEEE Trans. Control Syst. Technol.
  • Katic, D., et al., 2013. Intelligent Control of Robotic Systems.
  • Kobayashi, K., Mizoue, H., Kuremoto, T., Obayashi, M., 2009. A meta-learning method based on temporal difference error...
  • Kumar, R., et al., 2012. Temporal difference based tuning of fuzzy logic controller through reinforcement learning to control an inverted pendulum. Int. J. Intell. Syst. Appl.
  • Levine, J., 2009. Analysis and Control of Nonlinear Systems.
There are more references available in the full text version of this article.

Cited by (15)

  • Linguistic Lyapunov reinforcement learning control for robotic manipulators

    2018, Neurocomputing
    Citation Excerpt:

    However, most of the RL approaches proposed so far do not guarantee any stability of the designed controller [2]. Lyapunov theory [3,4] offers a powerful platform for designing controllers that are guaranteed to be stable and can handle model uncertainties and parameter variations. Recently, in [3] the authors have introduced stability in the RL paradigm by hybridizing Lyapunov theory into the RL action generation mechanism.

  • Fuzzy Lyapunov Reinforcement Learning for Non Linear Systems

    2017, ISA Transactions
    Citation Excerpt:

    Lyapunov's stability theorem is the most general and widely used one for establishing stability of controllers for nonlinear systems. In a recent work [5], Lyapunov theory has been used for designing a stable RL controller. The authors have proposed a Lyapunov theory based Markov game fuzzy controller.
