Adaptive dynamic programming-based design of integrated neural network structure for cooperative control of multiple MIMO nonlinear systems

doi:10.1016/j.neucom.2016.05.044

Neurocomputing

Volume 237, 10 May 2017, Pages 12-24

https://doi.org/10.1016/j.neucom.2016.05.044

Abstract

Solving cooperative problems for multi-agent systems, in which the agents' artificial behaviors resemble the natural behaviors of biological agents, is a major challenge. The problems become more complex if the controlled agents are multi-input and multi-output (MIMO) nonlinear systems with unknown internal dynamics that are affected by external disturbances. In this paper, firstly, based on adaptive dynamic programming, the three neural networks (NNs) (actor, disturber, and critic) of control schemes for two-player games are integrated into a structure with only one NN, called the integrated NN (INN), with the aim of reducing computational complexity and resource waste. Secondly, an INN weight update law and an online control algorithm, which updates the parameters in one iterative step, are designed to find $H_{\infty}$ optimal cooperative control solutions. With the aid of Lyapunov theory, we prove that the INN weight approximation errors and the cooperative tracking errors are uniformly ultimately bounded (UUB), and that the system parameters converge to approximately optimal values. Finally, two simulation studies, one of which compares the proposed INN with three-NN structures from the existing literature, are carried out to show the effectiveness of the proposed INN structure.

Introduction

Motivated by the behavior of bird flocks, ant colonies, and other natural biological systems, cooperative problems for agents have received much attention in recent years [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]. In a cooperative system, each agent exchanges information with others to decide its actions, which not only serve its own objective but also help the swarm reach common goals. In practice, the agents are so complex that they must be modeled as multi-input and multi-output (MIMO) nonlinear systems with unknown internal dynamics. They are also affected by external disturbances. Therefore, solving the cooperative problem, in which the agents' artificial behaviors resemble the natural behaviors of practical agents, remains a major challenge.

In cooperative systems, graph theory is used to establish a communication graph among the agents [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]. Depending on whether a leader node is present in the graph, the cooperative problem can be divided into two types: the cooperative regulator problem and the cooperative tracking problem. This paper focuses on the latter, in which the state trajectories of the follower nodes are synchronized with that of the leader node [1], [2], [5], [6], [7], [8], [9].
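A minimal Python sketch of how a cooperative tracking error is typically formed on such a graph is given below. The paper's own error definition is not reproduced in this excerpt, so the sketch assumes the local neighborhood tracking error common in the cooperative tracking literature, $e_i = \sum_j a_{ij}(x_i - x_j) + b_i(x_i - x_0)$, where $a_{ij} > 0$ means follower $i$ uses the state of follower $j$, and the pinning gain $b_i \ge 0$ (an assumption, not defined in this excerpt) is positive only for followers that receive the leader state $x_0$. It also checks the stacked form $e = ((L + B) \otimes I_n)(x - \bar{1}_N \otimes x_0)$, with $L$ the graph Laplacian and $B = \mathrm{diag}(b_i)$. The function names and the chain graph are hypothetical.

```python
import numpy as np

def local_errors(A, b, X, x0):
    """Local neighborhood tracking errors, one row per follower.

    e_i = sum_j a_ij (x_i - x_j) + b_i (x_i - x0)
    A: (N, N) follower adjacency with zero diagonal, b: (N,) pinning gains,
    X: (N, n) follower states, x0: (n,) leader state.
    """
    N = A.shape[0]
    E = np.zeros_like(X, dtype=float)
    for i in range(N):
        for j in range(N):
            E[i] += A[i, j] * (X[i] - X[j])
        E[i] += b[i] * (X[i] - x0)
    return E

def global_error(A, b, X, x0):
    """Stacked form e = ((L + B) kron I_n)(x - 1_N kron x0)."""
    N, n = X.shape
    L = np.diag(A.sum(axis=1)) - A   # graph Laplacian D - A
    B = np.diag(b)                   # pinning-gain matrix
    x = X.reshape(-1)                # stack x_1, ..., x_N
    return np.kron(L + B, np.eye(n)) @ (x - np.kron(np.ones(N), x0))

# Hypothetical chain: leader -> follower 1 -> follower 2 -> follower 3
A = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
b = np.array([1.0, 0.0, 0.0])
X = np.array([[1.0, 0.5], [0.2, -0.3], [2.0, 1.0]])
x0 = np.array([0.0, 0.0])
E_local = local_errors(A, b, X, x0)
```

The two forms agree. Note that $e = 0$ implies synchronization with the leader only when $L + B$ is nonsingular, which holds, for example, when every follower is reachable from the leader through a directed path.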

Owing to their nonlinear approximation abilities, NN structures have been continually developed for cooperative adaptive control systems [6], [7], [8], [9], [10]. In [6], distributed NNs with linearly parametrized structures are proposed to identify the uncertain dynamics directly. Because it depends on the analytic solution of a Riccati inequality formed by given matrices, this method applies only to a specific class of nonlinear systems. In [9], NN structures with a suitable set of basis (activation) functions are selected to design a cooperative robust adaptive control law for linearized dynamic agents tracking the trajectory of the leader node. In [7], NNs with radial basis functions (RBFs) are used to approximate the uncertain dynamics. The RBF NN weight update laws rely on upper bounds of the NN weights, which are difficult to determine in practical cooperative applications. In [8], linear-in-parameter NNs with basis functions such as sigmoids or Gaussians are chosen to identify the unknown nonlinearities, and the NN weight update laws are based on the solution of Lyapunov equations derived from the sliding mode control method. In [10], self-structuring NNs are designed to identify the agents' unknown dynamics directly; the basis functions in the hidden layer are sigmoids, but the number of hidden nodes is computed dynamically and the signs of the input dynamic functions of the system must be known.
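As an illustration of the linear-in-parameter NNs mentioned above, the sketch below approximates a scalar nonlinearity by $\hat{f}(x) = W^{T}\phi(x)$ with Gaussian RBF features. The target $\sin x$, the 13 centers, the width, and the helper names are arbitrary choices for this example, and the weights are fitted offline by least squares purely for illustration; the cited works instead tune the weights online with Lyapunov-based update laws.

```python
import numpy as np

def gaussian_basis(x, centers, width):
    """Gaussian RBF features phi(x): one row per sample, one column per center."""
    x = np.atleast_1d(x)[:, None]                      # shape (m, 1)
    return np.exp(-((x - centers[None, :]) ** 2) / (2.0 * width ** 2))

def fit_weights(x, y, centers, width):
    """Least-squares W for y ~= phi(x) @ W; the output is linear in W."""
    Phi = gaussian_basis(x, centers, width)
    W, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return W

# Approximate an "unknown" nonlinearity f(x) = sin(x) on [-3, 3]
centers = np.linspace(-3.0, 3.0, 13)
x_train = np.linspace(-3.0, 3.0, 200)
W = fit_weights(x_train, np.sin(x_train), centers, width=0.5)

# Maximum approximation error on interior test points
x_test = np.linspace(-2.5, 2.5, 50)
err = np.max(np.abs(gaussian_basis(x_test, centers, 0.5) @ W - np.sin(x_test)))
```

Because the output is linear in $W$, any update law only has to adjust a weight vector, which is what makes Lyapunov-based tuning tractable in these designs.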

Most of the NN-based methods mentioned above do not minimize any long-term performance function, so they are not optimal control methods. Recently, an actor-critic NN (ACNN) structure from adaptive dynamic programming (ADP) [11] has been developed for cooperative systems [4], [12], where the optimal value functions and control policies are approximated by critic and actor NNs, respectively; however, all node dynamics in the communication graphs are limited to double-integrator systems.

ACNNs have also been developed to solve cooperative problems of nonlinear systems affected by external disturbances. In [13], [14], [15], ACNNs are used in online algorithms, which synchronously update the NN parameters, to find Hamilton–Jacobi–Isaacs (HJI) solutions for the non-zero-sum games of two-player systems [14], [15] or multi-player systems [13]. However, ACNNs still have disadvantages, especially when a structure consisting of three NNs, referred to as AC3NN, is used [16]. In AC3NN, one NN (the critic NN) approximates the optimal value function, and the other two (actor NNs) approximate the controller (first player) and the disturber (second player), respectively. Therefore, the total number of NNs is three times the number of agents in the cooperative system. If AC3NN is used for cooperative multiple MIMO systems with a large number of feedback states, the number of NN weights and the dimension of the NN basis functions, which represent combinations of the states, increase significantly (see [4], [13] for more details). As a result, besides wasting resources, AC3NN can significantly increase computational complexity. Another disadvantage of AC3NN is that its algorithms involve two iterative loops: while the parameters of the disturbance law are updated in one loop, the parameters of the control law must wait to be updated in the other. Consequently, these algorithms are inefficient even when applied to a single agent [17].
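The growth described above can be made concrete with a rough count. The sketch assumes, for illustration only, that every network is linear in its weights and uses the same basis of all quadratic monomials $x_p x_q$ ($p \le q$) of an agent's $n$-dimensional error state, so each network has $n(n+1)/2$ weights; actual basis choices differ between designs, and the function names are hypothetical.

```python
def quadratic_basis_size(n):
    """Number of distinct monomials x_p * x_q (p <= q) of an n-dim state."""
    return n * (n + 1) // 2

def total_weights(num_agents, state_dim, nets_per_agent):
    """Total NN weights if each agent runs `nets_per_agent` networks that all
    share the quadratic basis above (an illustrative assumption)."""
    return num_agents * nets_per_agent * quadratic_basis_size(state_dim)

for N, n in [(5, 4), (10, 6), (20, 8)]:
    ac3nn = total_weights(N, n, nets_per_agent=3)  # critic + actor + disturber
    inn = total_weights(N, n, nets_per_agent=1)    # one integrated NN
    print(f"N={N}, n={n}: AC3NN {3 * N} nets / {ac3nn} weights, "
          f"INN {N} nets / {inn} weights")
```

Under this assumption the ratio is always 3; it is the quadratic growth in $n$ that makes the absolute difference large for MIMO agents with many states (for example, 2160 versus 720 weights for 20 agents with 8 states each).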

To overcome the disadvantages of using more than one NN, structures with a single NN, called single online approximators (SOLA), have been proposed to learn the HJI solution of affine systems [3] or the HJB solution of MIMO nonlinear systems [18]. However, since the SOLA weight update laws require knowledge of the system dynamics [3] or ignore disturbances [18], SOLA techniques remain restricted. In addition, they have so far only been applied to non-cooperative problems.

In contrast, the control structures in [17], [19] not only use a single NN but also remove the assumption of known system dynamics [20], [21], [22]. In [17], at every loop step, the parameters of the control law and the disturbance law must be held constant over a period of time to collect the sample set for training the NN. In [19], the external disturbance of each player is ignored. Furthermore, the NN structures in [17], [19] are designed for non-cooperative problems.

Contributions: Compared with existing works, the main contributions of this paper are the following four aspects:

1. Design an integrated neural network (INN) structure, in contrast to the existing work in [4]. To the best of our knowledge, this paper may be the first work in which the three NNs of the ADP method are integrated into only one for the cooperative problem of multiple MIMO nonlinear systems. This integration aims to reduce computational complexity and resource usage.
2. Design an INN weight update law, which does not require knowledge of the internal system dynamics, and an online $H_{\infty}$ optimal cooperative control algorithm, in which the INN weight parameters and the parameters of the control and disturbance laws are updated simultaneously and continuously in one iterative step.
3. Prove, with the aid of Lyapunov theory, that the cooperative tracking errors of the closed-loop system and the INN weight approximation errors are UUB, and that the value function, the $H_{\infty}$ optimal cooperative control law, and the worst-case disturbance law converge to approximately optimal values.
4. Demonstrate the effectiveness of the method by comparing the simulation results of the INN with those of the existing three-NN structures in [14], [21].

The rest of the paper is organized as follows. Section 2 introduces a distributed communication graph and derives the overall cooperative tracking error dynamics of the MIMO nonlinear systems. Section 3 designs the INN structure, the weight update law, and the control algorithm. Simulation studies are presented in Section 4. Finally, Section 5 draws brief conclusions.

Notations: $R$, $R^{n}$, and $R^{n \times m}$ are the set of real numbers, the $n$-dimensional Euclidean space, and the set of all real $n \times m$ matrices, respectively. $\| \cdot \|$ denotes the vector norm in $R^{n}$ or the matrix norm in $R^{n \times m}$. $V_{x} \triangleq \partial V / \partial x$ denotes the gradient of $V$ with respect to $x$. The superscript $T$ denotes the transpose. $\otimes$ denotes the Kronecker product, with the properties $(X \otimes Y)^{T} = X^{T} \otimes Y^{T}$ and $\beta (X \otimes Y) = (\beta X) \otimes Y = X \otimes (\beta Y)$, where $X$ and $Y$ are matrices and $\beta$ is a scalar. $I_{n}$ denotes the $n \times n$ identity matrix. $\bar{1}_{n} = [1, \dots, 1]^{T} \in R^{n}$. $\mathrm{diag}(\alpha_{i})$ is the diagonal matrix whose $i$th diagonal element is $\alpha_{i}$. $L_{2}[0, \infty)$ is the Banach space of functions $d(t)$ satisfying $\int_{0}^{\infty} \| d(\tau) \|^{2} d\tau < \infty$.
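The two Kronecker-product properties listed above can be spot-checked numerically with NumPy's `np.kron`; the matrix sizes, random seed, and scalar below are arbitrary. This is a sanity check on random instances, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 3))   # arbitrary 2 x 3 matrix
Y = rng.standard_normal((4, 2))   # arbitrary 4 x 2 matrix
beta = 2.5                        # arbitrary scalar

# (X kron Y)^T == X^T kron Y^T
assert np.allclose(np.kron(X, Y).T, np.kron(X.T, Y.T))

# beta (X kron Y) == (beta X) kron Y == X kron (beta Y)
assert np.allclose(beta * np.kron(X, Y), np.kron(beta * X, Y))
assert np.allclose(beta * np.kron(X, Y), np.kron(X, beta * Y))
```

A $2 \times 3$ matrix Kronecker a $4 \times 2$ matrix gives an $8 \times 6$ matrix, which is the kind of block structure that arises when per-agent quantities are stacked with $\otimes I_{n}$.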

Section snippets

Distributed communication graph theory

Consider a cooperative system of $N$ follower agents and one leader. The distributed communication among the agents can be represented by a directed graph $G(V, E, A)$, where the agents are characterized by the set of nodes $V = \{s_{0}, \dots, s_{N}\}$ and $s_{0}$ is the leader node. Relationships among the agents are determined by the set of edges $E \subseteq V \times V$ with a connectivity weight matrix $A = [a_{ij}]$, where $a_{ii} = 0$, $a_{ij} > 0$ if $(s_{j}, s_{i}) \in E$, and $a_{ij} = 0$ otherwise. If the state of agent $s_{i}$ is available to $s_{j}$, then $s_{j}$ is a neighbor of $s_{i}$. All neighbors

Design of INN structure for multiple nonlinear systems

As mentioned in Lemma 1, this section is devoted to designing the INN structure for the control laws $u_{i}^{*}$, $i = 1, \dots, N$, that force the tracking error $E$ in (13) to converge to zero in an $H_{\infty}$ optimal manner.
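Since the INN update law itself is not included in this excerpt, the sketch below only illustrates the general idea of deriving both players from one set of weights. It assumes the standard $H_{\infty}$ zero-sum setting for input-affine error dynamics $\dot{e} = f(e) + g(e)u + k(e)d$ with cost rate $e^{T}Qe + u^{T}Ru - \gamma^{2}d^{T}d$ and value approximation $V(e) \approx W^{T}\phi(e)$, from which $u = -\frac{1}{2}R^{-1}g^{T}\nabla\phi^{T}W$ and $d = \frac{1}{2\gamma^{2}}k^{T}\nabla\phi^{T}W$. The quadratic basis, the hypothetical function names, the step size `alpha`, and the use of a measured $\dot{e}$ (so that $f$ is never evaluated) are assumptions for illustration. The weight step is a normalized gradient descent on the Hamiltonian residual, as in common critic tuning laws, not the paper's INN law.

```python
import numpy as np

# Hypothetical quadratic basis for a 2-D error e = [e1, e2]:
# phi(e) = [e1^2, e1*e2, e2^2], so V(e) ~= W^T phi(e).
def phi(e):
    return np.array([e[0] ** 2, e[0] * e[1], e[1] ** 2])

def grad_phi(e):
    """Jacobian d(phi)/de, shape (3, 2)."""
    return np.array([[2.0 * e[0], 0.0],
                     [e[1], e[0]],
                     [0.0, 2.0 * e[1]]])

def policies(W, e, g, k, R, gamma):
    """Control u and worst-case disturbance d from ONE weight vector W."""
    Vx = grad_phi(e).T @ W                       # V_x = grad_phi^T W
    u = -0.5 * np.linalg.solve(R, g.T @ Vx)      # u = -1/2 R^{-1} g^T V_x
    d = (1.0 / (2.0 * gamma ** 2)) * (k.T @ Vx)  # d = 1/(2 gamma^2) k^T V_x
    return u, d

def critic_step(W, e, e_dot, u, d, Q, R, gamma, alpha=0.5):
    """One normalized-gradient step on the Hamiltonian residual.

    sigma = grad_phi(e) @ e_dot is dphi/dt along the measured trajectory,
    so the drift f(e) is not needed explicitly.
    """
    sigma = grad_phi(e) @ e_dot
    r = e @ Q @ e + u @ R @ u - gamma ** 2 * (d @ d)  # cost rate
    delta = W @ sigma + r                              # residual dV/dt + r
    return W - alpha * sigma / (1.0 + sigma @ sigma) ** 2 * delta

# Usage with hypothetical data
W = np.array([1.0, 0.0, 1.0])        # V(e) = e1^2 + e2^2
e = np.array([1.0, 2.0])
I2 = np.eye(2)
u, d = policies(W, e, I2, I2, I2, gamma=1.0)
e_dot = np.array([0.3, -0.1])        # assumed measured error derivative
W_next = critic_step(W, e, e_dot, u, d, I2, I2, gamma=1.0)
```

Because `policies` and `critic_step` read the same `W`, the value, control, and disturbance approximations change together in one step instead of in two nested loops, which is the property the INN design targets.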

Simulation studies

This section presents two simulation studies. The first compares the proposed INN with existing works; the second applies it to the cooperative problem of multiple wheeled mobile robots.

Conclusion

In this paper, based on adaptive dynamic programming, a novel integrated neural network structure is designed for the cooperative problem of MIMO nonlinear systems with disturbances and without knowledge of the internal system dynamics. In the proposed cooperative control scheme, only one INN is used instead of three. The INN weight update law and the control algorithm are designed to guarantee that the cooperative tracking errors of the closed-loop system and the INN weight approximation errors are UUB.

Nguyen Tan Luy received the B.S. and the M.Sc. degrees in the Department of Computer Science and Electrical and Electronics Engineering from HCM City University of Technology, Vietnam, in 1996 and in 2006, respectively. He received the Ph.D. degree from HCM City University of Technology, VNU-HCM, HCM City, Vietnam, in 2015. He is currently a Lecturer in the Faculty of Electronics Technology, Industrial University of HCM City, Vietnam. His current research interests include reinforcement

References (34)

K.G. Vamvoudakis et al.
Multi-agent differential graphical games: online adaptive learning solution for synchronization with optimality
Automatica
(2012)
H.W. Zhang et al.
Adaptive cooperative tracking control of higher-order nonlinear systems with unknown dynamics
Automatica
(2012)
L. Cui et al.
Reinforcement learning-based asymptotic cooperative tracking of a class multi-agent dynamic systems using neural networks
Neurocomputing
(2016)
K.G. Vamvoudakis et al.
Multi-player non-zero-sum games: online adaptive learning solution of coupled Hamilton-Jacobi equations
Automatica
(2011)
D. Liu et al.
Neural-network-based zero-sum game for discrete-time nonlinear systems via iterative adaptive dynamic programming algorithm
Neurocomputing
(2013)
H. He et al.
A three-network architecture for on-line learning and optimization based on adaptive dynamic programming
Neurocomputing
(2012)
X. Cui et al.
Online finite-horizon optimal learning algorithm for nonzero-sum games with partially unknown dynamics and constrained inputs
Neurocomputing
(2016)
K. Hornik et al.
Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks
Neural Netw.
(1990)
K.G. Vamvoudakis et al.
Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem
Automatica
(2010)
T.F. Liu et al.
Distributed output-feedback control of nonlinear multi-agent systems
IEEE Trans. Autom. Control
(2013)
K. Movric et al.
Cooperative optimal control for multi-agent systems on directed graph topologies
IEEE Trans. Autom. Control
(2014)
T. Dierks, S. Jagannathan, Optimal control of affine nonlinear continuous-time systems using an online...
Y. Cao et al.
Optimal linear-consensus algorithms: an LQR perspective
IEEE Trans. Syst. Man Cybern. Cybern.
(2010)
Z. Peng et al.
Distributed neural network control for adaptive synchronization of uncertain dynamical multi-agent systems
IEEE Trans. Neural Netw. Learn. Syst.
(2014)
Z.G. Hou et al.
Decentralized robust adaptive control for the multiagent system consensus problem using neural networks
IEEE Trans. Syst., Man Cybern. Part B: Cybern.
(2009)
A. Das et al.
Cooperative adaptive control for synchronization of second-order systems with unknown nonlinearities
Int. J. Robust Nonlinear Control
(2011)
G. Chen et al.
Cooperative tracking control of nonlinear multiagent systems using self-structuring neural networks
IEEE Trans. Neural Netw. Learn. Syst.
(2014)


