Elsevier

Neurocomputing

Volume 237, 10 May 2017, Pages 12-24

Adaptive dynamic programming-based design of integrated neural network structure for cooperative control of multiple MIMO nonlinear systems

https://doi.org/10.1016/j.neucom.2016.05.044

Abstract

Solving cooperative problems for multi-agent systems, in which the agents' artificial behaviors resemble the natural biological behaviors of agents in practice, is a major challenge. The problems become more complex if the controlled agents are multi-input and multi-output (MIMO) nonlinear systems that lack knowledge of internal system dynamics and are affected by external disturbances. In this paper, firstly, based on adaptive dynamic programming, the three neural networks (NNs) (actor/disturber/critic) of control schemes for two-player games are integrated into a structure with only one NN, known as the integrated NN (INN), with the aim of reducing computational complexity and waste of resources. Secondly, an INN weight update law and an online control algorithm, which updates parameters in one iterative step, are designed to find H∞ optimal cooperative control solutions. With the aid of Lyapunov theory, we prove that the INN weight approximation errors and the cooperative tracking errors are uniformly ultimately bounded (UUB), and that the system parameters converge to approximately optimal values. Finally, two simulation studies, one of which is compared with three-NN structures in the existing literature, are carried out to show the effectiveness of the proposed INN structure.

Introduction

Motivated by the behavior of flocks of birds, colonies of ants, and other natural biological systems, cooperative problems for agents have received much attention in recent years [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]. In cooperative systems, each agent exchanges information with the others to decide its actions, which not only serve its own goals but also help its swarm reach the common goals. In practice, the agents are so complex that their models must be represented as multi-input and multi-output (MIMO) nonlinear systems lacking knowledge of internal system dynamics. They are also affected by external disturbances. Therefore, solving the cooperative problem, in which the agents' artificial behaviors resemble the natural biological behaviors of practical agents, remains a major challenge.

In cooperative systems, graph theory is used to establish a communication graph among the agents [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]. Depending on the presence of leader nodes in the graph, the cooperative problem can be divided into two types: the cooperative regulator problem and the cooperative tracking problem. This paper focuses on the latter, where the state trajectories of the follower and leader nodes are synchronized [1], [2], [5], [6], [7], [8], [9].

Owing to their nonlinear approximation abilities, NN structures have been continually developed for cooperative adaptive control systems [6], [7], [8], [9], [10]. In [6], a method using distributed NNs with linearly parametrized structures is proposed to identify uncertain dynamics directly. Because it depends on the analytic solution of the Riccati inequality formed by the given matrices, this method applies only to a specific class of nonlinear systems. In [9], NN structures with a suitable basis set of activation functions are selected to design a cooperative robust adaptive control law for linearized dynamic agents tracking the trajectory of the leader node. In [7], NN structures with radial basis functions (RBFs) are used to approximate the uncertain dynamics. The RBF NN weight update laws are based on the upper bounds of the NN weights, which are difficult to determine in practical cooperative applications. In [8], linear-in-parameter NNs with basis functions such as sigmoids or Gaussians are chosen to identify the unknown nonlinearity. The NN weight update laws are based on the solution of Lyapunov equations derived from the sliding mode control method. In [10], self-structuring NNs are designed to identify the agents' unknown dynamics directly, where the basis functions in the hidden layer are sigmoids; however, the number of hidden nodes is computed dynamically, and the sign of the input dynamic functions in the system must be known.

Most of the NN-based methods mentioned in the paragraph above do not minimize any long-term performance function; thus, they are not considered optimal control methods. Recently, an actor-critic NN (ACNN) structure from adaptive dynamic programming (ADP) [11] has been developed for cooperative systems [4], [12], where the optimal value functions and control policies are approximated by critic and actor NNs, respectively. However, all node dynamics in the communication graphs are limited to double-integrator systems.

To solve cooperative problems of nonlinear systems affected by external disturbances, ACNNs have been developed. In [13], [14], [15], ACNNs are used in online algorithms, which synchronously update the NN parameters, to find the Hamilton–Jacobi–Isaacs (HJI) solutions for the non-zero-sum games of the two-player system [14], [15] or the multi-player system [13]. However, applying ACNNs still has disadvantages, especially when a structure consisting of three NNs, namely AC3NN, is exploited [16]. In AC3NN, one NN (the critic NN) approximates the optimal value function, and the other two (actor NNs) approximate the controller (first player) and the disturber (second player), respectively. Therefore, the total number of NNs is three times the number of agents in the cooperative system. Assume that AC3NN is used for cooperative multiple MIMO systems that contain a large number of feedback states. Then the number of NN weights and the dimension of the NN basis functions representing combinations of the states will increase significantly (see [4], [13] for further discussion). As a result, besides wasting resources, AC3NN can lead to a significant increase in computational complexity. Another disadvantage of AC3NN is that the algorithms involve two iterative loops, i.e., while the parameters of the disturbance law are updated in one loop, the update of the control law parameters must wait in the other. Consequently, they are inefficient even when applied to a single agent [17].

To overcome the disadvantages of using more than one NN, structures with a single NN (SOLA, Single Online Approximator) have been proposed to learn the HJI solution of the affine system [3] or the HJB solution of the MIMO nonlinear system [18]. However, since the SOLA weight update laws require knowledge of the system dynamics [3] or ignore disturbances [18], SOLA techniques are still restricted. In addition, until now, they have been applied only to non-cooperative problems.

In contrast, the control structures in [17], [19] not only use a single NN but also remove the assumption of known system dynamics [20], [21], [22]. In [17], at every loop step, the parameters of the control law and the disturbance law must be held constant for a period of time to collect the sample set for training the NN. In [19], the external disturbance of each player is ignored. Furthermore, the NN structures in [17], [19] are designed for the non-cooperative problem.

Contributions: Compared with the existing works, the main contributions of this paper are as follows:

  1. Design an integrated neural network (INN) structure, in contrast to the existing work in [4]. To the best of our knowledge, this may be the first work in which the three NNs of the ADP method are integrated into a single NN for the cooperative problem of multiple MIMO nonlinear systems. This integration aims to reduce computational complexity and resource usage.

  2. Design an INN weight update law, in which knowledge of the internal system dynamics is not required, and an online H∞ optimal cooperative control algorithm, in which the INN weight parameters and the parameters of the control and disturbance laws are simultaneously and continuously updated in one iterative step.

  3. Prove, with the aid of Lyapunov theory, that the cooperative tracking errors of the closed-loop system and the INN weight approximation errors are UUB, and that the value function, the H∞ optimal cooperative control law, and the worst disturbance law converge to approximately optimal values.

  4. Show the effectiveness of the method by comparing the simulation results of the INN with those of the existing three-NN structures in [14], [21].

The rest of the paper is organized as follows. In Section 2, we introduce a distributed communication graph and derive overall cooperative tracking error dynamics of the MIMO nonlinear systems. In Section 3, we design the INN structure, the weight update law and the control algorithm. Simulation studies are implemented in Section 4. Finally, a brief conclusion is drawn in Section 5.

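Notations: $\mathbb{R}$, $\mathbb{R}^n$, and $\mathbb{R}^{n \times m}$ are the set of real numbers, the $n$-dimensional Euclidean space, and the set of all real $n \times m$ matrices, respectively. $\|\cdot\|$ denotes the vector or matrix norm in $\mathbb{R}^n$ or $\mathbb{R}^{n \times m}$, respectively. $V_x \triangleq \partial V/\partial x$ denotes the gradient of $V$ with respect to $x$. The superscript $T$ is used for the transpose. $\otimes$ denotes the Kronecker product with the properties $(X \otimes Y)^T = X^T \otimes Y^T$ and $\beta(X \otimes Y) = (\beta X) \otimes Y = X \otimes (\beta Y)$, where $X$ and $Y$ are matrices and $\beta$ is a scalar. $I_n$ denotes the $n$-dimensional identity matrix. $\bar{1}_n = [1, \ldots, 1]^T \in \mathbb{R}^n$. $\mathrm{diag}(\alpha_i)$ is a diagonal matrix whose $i$-th diagonal element is $\alpha_i$. $L_2[0, \infty)$ is the Banach space such that, for $d(t) \in L_2[0, \infty)$, $\int_0^\infty \|d(\tau)\|^2 \, d\tau < \infty$.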
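The two Kronecker product properties listed in the notation can be checked numerically. The following is a minimal sketch in plain Python and is not part of the original paper; the helper names `kron`, `transpose`, and `scale`, as well as the example matrices, are illustrative assumptions.

```python
def kron(X, Y):
    """Kronecker product of matrices stored as lists of rows."""
    return [
        [x * y for x in row_x for y in row_y]
        for row_x in X
        for row_y in Y
    ]


def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def scale(beta, M):
    """Multiply every entry of M by the scalar beta."""
    return [[beta * m for m in row] for row in M]


# Hypothetical integer test data, chosen so that comparisons are exact.
X = [[1, 2, 3], [4, 5, 6]]   # 2 x 3
Y = [[0, 7], [8, 9]]         # 2 x 2
beta = 3

# Property 1: (X kron Y)^T equals X^T kron Y^T.
transpose_ok = transpose(kron(X, Y)) == kron(transpose(X), transpose(Y))

# Property 2: beta (X kron Y) equals (beta X) kron Y and X kron (beta Y).
left = scale(beta, kron(X, Y))
scaling_ok = left == kron(scale(beta, X), Y) == kron(X, scale(beta, Y))

print(transpose_ok, scaling_ok)
```

Integer entries are used so that the equality checks are exact; with floating-point data a tolerance-based comparison would be the safer choice.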

Distributed communication graph theory

Consider $N$ agents in a cooperative system. The distributed communication among the agents can be represented by a directed graph $G(V, E, A)$, where the agents are characterized by the set of nodes $V = \{s_0, \ldots, s_N\}$, in which $s_0$ is a leader node. Relationships among the agents are determined by the set of edges $E \subseteq V \times V$ with a connectivity weight matrix $A = [a_{ij}]$, where $a_{ii} = 0$, $a_{ij} > 0$ if the corresponding edge belongs to $E$, and $a_{ij} = 0$ otherwise. If the states of agent $s_i$ are available to $s_j$, then $s_j$ is a neighbor of $s_i$. All neighbors
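As an illustration of the connectivity weight matrix described in this section, the sketch below builds $A = [a_{ij}]$ from a set of weighted edges. It is not taken from the paper: the function name `build_adjacency`, the convention that the edge key `(i, j)` sets the entry $a_{ij}$, and the four-node example are all assumptions made for this example.

```python
def build_adjacency(num_nodes, weighted_edges):
    """Return A = [a_ij] as a list of rows from a dict {(i, j): weight}.

    Assumed convention (not from the paper): key (i, j) sets entry a_ij.
    """
    A = [[0.0] * num_nodes for _ in range(num_nodes)]
    for (i, j), weight in weighted_edges.items():
        if i == j:
            raise ValueError("self-loops are excluded: a_ii must be 0")
        if weight <= 0:
            raise ValueError("edge weights must be positive")
        A[i][j] = float(weight)
    return A


# Hypothetical graph: node 0 plays the role of the leader s0,
# nodes 1 to 3 are followers connected in a chain.
edges = {(1, 0): 1.0, (2, 1): 0.5, (3, 2): 2.0}
A = build_adjacency(4, edges)

for row in A:
    print(row)
```

Entries without a matching edge stay at zero, and the diagonal is zero by construction, which mirrors the conditions $a_{ii} = 0$ and $a_{ij} = 0$ otherwise.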

Design of INN structure for multiple nonlinear systems

As mentioned in Lemma 1, this section is devoted to designing the INN structure for the control law $u_i$, $i = 1, \ldots, N$, to force the tracking error $E$ in (13) to converge to zero in an H∞ optimal manner.

Simulation studies

In this section, we study two simulations. First, a simulation is conducted and compared with existing works; then, a simulation of the cooperative problem of multiple wheeled mobile robots is carried out.

Conclusion

In this paper, based on adaptive dynamic programming, a novel integrated neural network structure is designed for the cooperative problem of MIMO nonlinear systems with disturbances and without knowledge of the internal system dynamics. In the proposed cooperative control scheme, only one INN is used instead of three NNs. The INN weight update law and the control algorithm are designed so that the cooperative tracking errors of the closed-loop system and the INN weight approximation errors are UUB.

References (34)

  • K. Movric et al., Cooperative optimal control for multi-agent systems on directed graph topologies, IEEE Trans. Autom. Control (2014)
  • T. Dierks, S. Jagannathan, Optimal control of affine nonlinear continuous-time systems using an online...
  • Y. Cao et al., Optimal linear-consensus algorithms: an LQR perspective, IEEE Trans. Syst. Man Cybern. Part B: Cybern. (2010)
  • Z. Peng et al., Distributed neural network control for adaptive synchronization of uncertain dynamical multi-agent systems, IEEE Trans. Neural Netw. Learn. Syst. (2014)
  • Z.G. Hou et al., Decentralized robust adaptive control for the multiagent system consensus problem using neural networks, IEEE Trans. Syst. Man Cybern. Part B: Cybern. (2009)
  • A. Das et al., Cooperative adaptive control for synchronization of second-order systems with unknown nonlinearities, Int. J. Robust Nonlinear Control (2011)
  • G. Chen et al., Cooperative tracking control of nonlinear multiagent systems using self-structuring neural networks, IEEE Trans. Neural Netw. Learn. Syst. (2014)

Nguyen Tan Luy received the B.S. and M.Sc. degrees from the Department of Computer Science and Electrical and Electronics Engineering, HCM City University of Technology, Vietnam, in 1996 and 2006, respectively. He received the Ph.D. degree from HCM City University of Technology, VNU-HCM, HCM City, Vietnam, in 2015. He is currently a Lecturer in the Faculty of Electronics Technology, Industrial University of HCM City, Vietnam. His current research interests include reinforcement learning and adaptive dynamic programming, intelligent control, robotics, and neural computing.
