Completely model-free RL-based consensus of continuous-time multi-agent systems

https://doi.org/10.1016/j.amc.2020.125312

Abstract

In this paper, we study the consensus of continuous-time general linear multi-agent systems in the absence of model information by using the adaptive dynamic programming (ADP) based reinforcement learning (RL) approach. The RL approach is introduced to learn the feedback gain matrix used in the control algorithm, so that consensus is guaranteed on the basis of the available information alone. For state feedback control, the RL algorithm relies only on the state and the input of an arbitrary agent, while for output feedback control, the RL algorithm depends only on the input and output information of an arbitrary agent; neither requires any model information. Finally, numerical simulations are given to verify the main results.

Introduction

Since the start of this century, the cooperative control of multi-agent systems has attracted considerable attention. Investigations in this field have been carried out from the perspectives of communication topology [1], [2], [3], [4], agent dynamics [5], [6], [7], [8], [9], and the effects of time delays [10], disturbances [11], [12], and external uncertainty [13], among others. Consensus, the most typical cooperative behavior, means that all agents in the multi-agent system reach an agreement, and it has naturally received a great deal of attention [1], [3], [6], [7], [14]. Existing investigations on consensus include leaderless consensus [6], [13], [15], leader-following consensus [12], [16], [17], [18], [19], and containment of multi-agent systems with multiple leaders [20], [21], [22], [23]. In these investigations, the construction of the feedback gain matrix in the control algorithm plays a crucial role, since it can influence or even determine whether consensus is achieved. In [5] and [6], the algebraic Riccati equation (ARE) is used to construct the feedback gain matrix, after which the cooperative behavior can be achieved by tuning only the coupling strength constants. In [13], [15], the authors used the parameterized-ARE-based low-gain technique to solve the consensus of multi-agent systems subject to input saturation. Notice that the construction of the feedback gain matrix in the literature mentioned above relies on the model information, i.e., the model matrices [5], [6], [13], [14], [15]. In practical applications, however, the model information of the multi-agent system may not be available. How can we solve the consensus problem for model-free multi-agent systems? In this paper, we solve this problem by using the RL approach.

Recently, the RL approach, which learns and makes decisions through interaction with the given environment, has been widely used in the cooperative control of multi-agent systems. Roughly speaking, RL-based investigations on the consensus of multi-agent systems can be divided into two categories: those on discrete-time multi-agent systems [21], [24], [25], [26] and those on continuous-time systems [19], [27]. Compared with the cooperative control of multi-agent systems without the RL approach mentioned above, the most remarkable point is that the RL results can handle the coordination control of model-free multi-agent systems. In other words, with the help of the RL approach, we can design a control protocol that drives the systems to consensus without model information. As stated in [28], model-free RL-based consensus for discrete-time individual systems [29] is much more difficult than that for continuous-time individual systems [29], [30], and this observation also holds for multi-agent systems. The optimal consensus tracking and the containment of discrete-time multi-agent systems were investigated in [21] and [24], respectively, by using the RL method. Also for discrete-time multi-agent systems, [25] solved the global consensus of multi-agent systems subject to input saturation. All the discrete-time investigations in [21], [24], [25], [26] were model-free, and all the information involved in the construction of the control algorithm was in discrete-time format. References [29], [30] turned to the RL-based consensus of continuous-time multi-agent systems without model information. However, the RL algorithms constructed in these two references were not essentially continuous-time, but were obtained by discretizing the model.

In this paper, we focus on the RL-based consensus of continuous-time multi-agent systems with completely unknown model information. The most notable point is that all the theoretical analysis is carried out with schemes that are entirely continuous-time. The investigation covers both state feedback and output feedback control. For state feedback control, the model-free ADP-based RL method is used to learn the feedback gain matrix instead of solving a parameterized algebraic Riccati equation (ARE), since the ARE depends closely on the model information. The learning of the feedback gain matrix involves only the state and control input of an arbitrary agent of the multi-agent system and avoids information exchange among the agents, without requiring any model information. Given that it may be hard or even impossible to measure the state of an agent, we also investigate the RL-based consensus of model-free multi-agent systems using output feedback control. The ADP-based RL algorithm is designed to learn the feedback gain matrix on the basis of only the output information and the control input of an arbitrary agent, independent of any model information. In analogy to the procedure in [28], we first express the estimated state of each agent in terms of its output information and control input by applying the Laplace transform and the inverse Laplace transform to each agent. Then, the ADP-based RL algorithm is given to learn the feedback gain matrix. It is this operation, motivated by [28], that allows consensus of continuous-time multi-agent systems without model information to be achieved, rather than the discretization used in [29], [30].

The rest of the paper is arranged as follows. In Section 2, we give preliminaries on graph theory and state the main problem of this paper. In Section 3, the RL-based consensus is solved via state feedback control, while in Section 4 it is solved via output feedback control. Section 5 provides numerical simulations to verify the main results obtained in Sections 3 and 4. Finally, Section 6 concludes the paper.

Nomenclature

$\mathbb{R}$ and $\mathbb{R}^{n \times m}$ denote the set of real numbers and the set of $n \times m$ real matrices, respectively. $\mathbb{Z}^+$ is the set of positive integers. $\mathbb{P}^{n \times n}$ denotes the normed space of all $n \times n$ real symmetric matrices, equipped with the induced matrix norm. $\mathbb{P}^{n \times n}_+ = \{P \in \mathbb{P}^{n \times n} : P \succ 0\}$. For any $A \in \mathbb{R}^{n \times n}$, $\mathrm{vec}(A) = (a_1^T\, a_2^T\, \cdots\, a_n^T)^T \in \mathbb{R}^{n^2 \times 1}$, with $a_i \in \mathbb{R}^{n \times 1}$ being the $i$th column vector of $A$. For any $A \in \mathbb{P}^{n \times n}_+$, $\mathrm{vecs}(A) = (a_{11}\, a_{12}\, \cdots\, a_{1n}\, a_{22}\, a_{23}\, \cdots\, a_{2n}\, \cdots\, a_{(n-1)n}\, a_{nn})^T \in \mathbb{R}^{\frac{n(n+1)}{2} \times 1}$. $\otimes$ represents the Kronecker product of two matrices (or vectors) of compatible dimensions.
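As an illustration (not part of the original paper), the following is a minimal NumPy sketch of the $\mathrm{vec}(\cdot)$ and $\mathrm{vecs}(\cdot)$ operators defined above; the function names are ours.

import numpy as np

def vec(A):
    # Stack the columns of A into an n^2 x 1 column vector.
    # Fortran ("F") order walks A column by column, matching
    # vec(A) = (a_1^T a_2^T ... a_n^T)^T.
    return np.asarray(A).reshape(-1, 1, order="F")

def vecs(A):
    # Half-vectorization of a symmetric matrix: the upper-triangular entries
    # a_11, a_12, ..., a_1n, a_22, ..., a_nn as an n(n+1)/2 x 1 column vector.
    A = np.asarray(A)
    rows, cols = np.triu_indices(A.shape[0])
    return A[rows, cols].reshape(-1, 1)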


Preliminaries

We first introduce the undirected network $\mathcal{G}$ over which the communication among the agents takes place. Symbolically, we use the triple $(\mathcal{V}, \mathcal{E}, W)$ to denote the elements of $\mathcal{G}$, where $\mathcal{V} = \{1, 2, \ldots, N\}$ and $\mathcal{E} = \{(i,j) \mid \text{there exists an edge between node } i \text{ and node } j\}$ are the node set and the edge set, respectively, and $W = (w_{ij}) \in \mathbb{R}^{N \times N}$ is the adjacency matrix with
$$w_{ij} = \begin{cases} 1, & \text{if } (i,j) \in \mathcal{E}, \\ 0, & \text{otherwise.} \end{cases}$$
Let $N(i)$ be the neighborhood set of agent $i$; then the Laplacian matrix of $\mathcal{G}$ is $L = D - W$, where $D = \mathrm{diag}(d_1, d_2, \ldots, d_N)$ with $d_i = \sum_{j \in N(i)} w_{ij}$.
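As a quick illustration of these definitions (our own sketch, with an arbitrary example graph), the Laplacian and its second-smallest eigenvalue $\lambda_2$, whose positivity certifies connectivity, can be computed as follows.

import numpy as np

def laplacian(W):
    # L = D - W, where D = diag(d_1, ..., d_N) collects the row sums of W.
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

# Example: the 3-node path graph 1 -- 2 -- 3.
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
L = laplacian(W)
lam2 = np.linalg.eigvalsh(L)[1]  # eigenvalues returned in ascending order
print(lam2 > 0)                  # True: the path graph is connected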

State feedback control algorithm

In the case of state feedback control, the control algorithm is designed as
$$u_i = BK \sum_{j \in N(i)} w_{ij}(x_j - x_i), \tag{3}$$
where the feedback gain matrix is
$$K = B^T P,$$
with $P \succ 0$ being the unique solution of the parameterized ARE
$$A^T P + PA - \mu_0 P B B^T P + Q = 0, \qquad \mu_0 \leq 2\lambda_2,$$
where $\lambda_2$ is the smallest nonzero eigenvalue of the Laplacian matrix $L$. Then, we have Lemma 2.

Lemma 2

For system (1) interacting on a connected graph $\mathcal{G}$, if the control algorithm $u_i$ is designed as in (3), then the system achieves consensus.

Proof

The proof is similar to that of Theorem 1 in [5] and is therefore omitted. □

Notice that the design of $u_i$ in (3) is not model-free: solving the parameterized ARE requires the model matrices $A$ and $B$.
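For concreteness, the following SciPy sketch shows the model-based construction of Lemma 2, i.e., the gain that the ADP-based RL algorithm must recover without access to $A$ and $B$. The example matrices are hypothetical and the snippet is ours, not the paper's algorithm.

import numpy as np
from scipy.linalg import solve_continuous_are

def state_feedback_gain(A, B, Q, mu0):
    # solve_continuous_are solves A^T P + P A - P B R^{-1} B^T P + Q = 0,
    # so choosing R = I/mu0 yields the parameterized ARE
    # A^T P + P A - mu0 * P B B^T P + Q = 0 used above.
    m = B.shape[1]
    P = solve_continuous_are(A, B, Q, np.eye(m) / mu0)
    return B.T @ P, P  # K = B^T P

# Hypothetical controllable pair (A, B), for illustration only.
A = np.array([[0.0, 1.0], [-1.0, 0.5]])
B = np.array([[0.0], [1.0]])
K, P = state_feedback_gain(A, B, Q=np.eye(2), mu0=1.5)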

Output feedback control algorithm

The previous results are based on the state information of the agents. In practical engineering, however, what is available is often the observer-based output information rather than the state information. Besides the output information, the control input is also known. In what follows, we devote ourselves to solving the consensus of system (1) on the basis of the input and output information by using the model-free RL technique.

If system (1) satisfies Assumption 1, associated with

Simulation

In this section, we provide numerical simulations to verify the main results.

We consider a multi-agent system consisting of $N = 6$ agents, which interact over a connected network $\mathcal{G}$ with Laplacian matrix
$$L = \begin{pmatrix} 8 & -5 & 0 & -2 & 0 & -1 \\ -5 & 6 & -1 & 0 & 0 & 0 \\ 0 & -1 & 2 & 0 & 0 & -1 \\ -2 & 0 & 0 & 7 & -5 & 0 \\ 0 & 0 & 0 & -5 & 6 & -1 \\ -1 & 0 & -1 & 0 & -1 & 3 \end{pmatrix}.$$
Here, we choose $\mu_0 = 1.5$. Moreover, we choose
$$\mathcal{B}_q = \{P \in \mathbb{P}^{3 \times 3}_+ : \|P\| \leq 10(q+1)\}, \quad q = 0, 1, \ldots,$$
and $\theta_k = (k+1)^{-1}$.
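A quick numerical sanity check of this setup (our sketch, assuming the eigenvalue condition $\mu_0 \leq 2\lambda_2$ as reconstructed in Section 3):

import numpy as np

# Laplacian of the 6-agent simulation network, as given above.
L = np.array([[ 8, -5,  0, -2,  0, -1],
              [-5,  6, -1,  0,  0,  0],
              [ 0, -1,  2,  0,  0, -1],
              [-2,  0,  0,  7, -5,  0],
              [ 0,  0,  0, -5,  6, -1],
              [-1,  0, -1,  0, -1,  3]], dtype=float)

lam2 = np.linalg.eigvalsh(L)[1]  # smallest nonzero eigenvalue (graph is connected)
mu0 = 1.5
print(f"lambda_2 = {lam2:.4f}")
print(f"mu0 <= 2*lambda_2: {mu0 <= 2 * lam2}")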

Conclusion

This paper has studied the consensus of general linear multi-agent systems without model information from the perspective of ADP-based RL. For both state feedback control and output feedback control, ADP-based RL algorithms are provided to obtain the feedback gain matrices that drive the system to consensus, without requiring any model information. The observer used in this paper is a local observer of the multi-agent system. However, the neighborhood

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant nos. 61803209, 61991412 and 61873318, in part by the Natural Science Foundation of Jiangsu Province under Grant no. BK20180752, in part by the University Science Research Project of Jiangsu Province under Grant nos. 18KJB120006 and 18KJB510033, in part by the Scientific Foundation of Nanjing University of Posts and Telecommunications (NUPTSF) under Grant no. NY218121, in part by the Frontier Research

References (33)

  • H.-X. Hu et al., Robust distributed stabilization of heterogeneous agents over cooperation-competition networks, IEEE Trans. Circuits Syst. II Express Briefs (2019)
  • H. Zhang et al., Optimal design for synchronization of cooperative systems: state feedback, observer and output feedback, IEEE Trans. Autom. Control (2011)
  • H. Su et al., Semi-global observer-based nonnegative edge consensus of networked systems with actuator saturation, IEEE Trans. Cybern. (2019)
  • Y. Liu et al., Containment control of second-order multi-agent systems via intermittent sampled position data communication, Appl. Math. Comput. (2019)
  • M. Long et al., Second-order controllability of two-time-scale multi-agent systems, Appl. Math. Comput. (2019)
  • H. Su et al., Semi-global output consensus for discrete-time switching networked systems subject to input saturation and external disturbances, IEEE Trans. Cybern. (2019)