Optimal couple-group tracking control for the heterogeneous multi-agent systems with cooperative-competitive interactions via reinforcement learning method
Introduction
Group consensus, one of the distributed coordination problems, has a wide range of applications, including autonomous vehicle technology [1] and group decision-making [2]. In group consensus, multi-agent systems (MASs) are partitioned into several subgroups, and each subgroup asymptotically reaches its own consensus state. As an extension of the standard consensus problem, group consensus has attracted considerable attention from researchers in biology, mathematics, physics, computer science, and other fields.
Over the past decade, the group consensus problem has been studied extensively. However, most existing works on group consensus assume homogeneous systems, in which all agents share the same dynamics, as in [3], [4], [5]. In practice, agents more often have different dynamics, owing to various limitations and to goals that are best achieved by combining diverse agents. As a result, an increasing number of researchers have turned their attention to heterogeneous systems. Relying on an estimated velocity for the first-order agents, the work [6] investigates the group consensus of heterogeneous MASs (HeMASs). Recently, many aspects of group consensus have been investigated in depth, including group consensus with pinning control [7], clustered event-triggered consensus [8], the consensus of HeMASs with fixed and switching topologies [9], clustered consensus with inter-cluster non-identical inputs [10], [11], and group consensus with input constraints [12].
The above studies of group consensus mainly design consensus control protocols for the input controllers and derive the corresponding parameter constraints that guarantee convergence of the states of all agents in the same subgroup. However, most classical works on group consensus cooperative control of MASs, such as [7], [12], give only a stability analysis under an exact system model and offer no technical means of optimizing the system cost. With the growing demands of engineering practice and the rapid development of computing and communication technologies, the consensus control and optimization of MASs have gained wide attention [13], [14], [15]. The work [13] proposes a novel distributed model reference adaptive control (D-MRAC) scheme to enhance robustness and improve transient performance for the distributed optimization problem of MASs. The work [14] investigates a distributed optimization problem of double-integrator multiagent systems with unmatched constant disturbances. In [15], an optimal consensus problem for a set of integrator systems with dynamic uncertainties is studied, and the optimal solution of the team objective function is reached. These works highlight the advances in optimal consensus control techniques for MASs in realistic application scenarios. Therefore, the study of data-driven distributed cooperative control and optimization can not only realize group consensus control of HeMASs when the system knowledge is unknown, but also take the optimal system performance into account. Motivated by these considerations, a growing number of studies address optimal consensus, which has become a research focus for MASs [16], [17]. One of the core problems of optimal control is how to solve the Hamilton–Jacobi–Bellman (HJB) equation or the algebraic Riccati equation (ARE), whose solution is difficult to obtain analytically [18].
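For intuition about the ARE mentioned above, consider the linear-quadratic special case with a fully known model. There the discrete-time ARE, P = Q + AᵀPA − AᵀPB(R + BᵀPB)⁻¹BᵀPA, can be approximated numerically by fixed-point (value) iteration. The following is only a minimal sketch under those assumptions: the matrices `A`, `B`, `Q`, `R` and the function name `solve_dare_by_iteration` are illustrative choices, not taken from the paper, whose approach is data-driven and does not assume a known model.

```python
import numpy as np

def solve_dare_by_iteration(A, B, Q, R, max_iters=5000, tol=1e-10):
    """Iterate P <- Q + A'PA - A'PB (R + B'PB)^{-1} B'PA, starting from P = 0."""
    P = np.zeros_like(Q, dtype=float)
    for _ in range(max_iters):
        # Feedback gain that is greedy with respect to the current P
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Algebraically equal to the Riccati right-hand side above
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K

# Hypothetical example: a sampled double integrator (sampling period 0.1)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

P, K = solve_dare_by_iteration(A, B, Q, R)
# Riccati residual: close to zero once the iteration has converged
residual = Q + A.T @ P @ (A - B @ K) - P
```

The control law u = −Kx then stabilizes this example system. For nonlinear or unknown dynamics no such simple iteration on a matrix is available, which is the gap that RL and ADP methods aim to fill.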
Reinforcement learning (RL) is considered an effective method for dealing with the curse of dimensionality that arises when solving the HJB equation for nonlinear systems [19], [20]. In recent years, optimal coordination control problems have been studied by RL methods [21], [22], [23], [24]. The work [21] investigates the infinite-horizon robust optimal control problem for a class of continuous-time uncertain nonlinear systems by using data-based adaptive critic designs. In [22], integral reinforcement learning techniques are used to deal with the optimal tracking consensus problem for a class of nonlinear continuous systems with time delays, mismatched external disturbances, and input constraints. The work [23] develops an optimized tracking control using fuzzy logic system-based RL for a class of unknown nonlinear dynamic systems in canonical form. The work [24] studies the finite-horizon optimal consensus control problem for unknown multiagent systems with state delays by off-policy RL. With the wide application of RL and optimal control in distributed control systems, adaptive dynamic programming (ADP) methods have been developed substantially. The advantages of ADP approaches in optimization and control are significant for practical engineering systems, such as flight control [25], reactive power control systems [26], and multi-agent systems for anomaly detection [27]. According to [19], ADP methods can be categorized into the following main branches: heuristic dynamic programming (HDP), dual HDP (DHP), globalized dual HDP (GDHP), action-dependent HDP (ADHDP), and action-dependent dual HDP (ADDHP). Lewis, Abu-Khalaf, H. Zhang, D. Liu et al. further developed these approaches, and their works have had a widespread impact on ADP-based optimal consensus control problems [28], [29], [30], [31].
Recently, the works [32], [33], [34], [35] have focused on solving optimal control problems with ADP algorithms. In [32], a novel ADP-based optimal tracking control (OTC) scheme is established for an unmanned surface vehicle (USV) in the presence of dead-zone input nonlinearities, system dynamics, and disturbances. The work [33] presents an ADP approach for OTC of switched systems with application to a grid-tied hybrid generation system. In [34], event-triggered OTC of discrete-time multi-agent systems is addressed by using the ADP method, which reduces the computational burden and transmission load. The work [35] investigates the design of OTC for a class of nonlinear continuous-time systems with time delay, mismatched external disturbances, and input constraints. These works have significantly advanced ADP-based OTC. However, research on ADP-based OTC of HeMASs has rarely addressed group consensus problems, which are generally studied by means of frequency-domain analysis [5], [6]. The technical difficulty lies in how ADP methods coordinate the information interactions of agents between different subgroups. Moreover, given that the relations among agents can in reality be cooperative, competitive, or even cooperative-competitive [36], we combine group consensus results with cooperative-competitive interactions, which can be applied more widely to group decision-making [2], target tracking [37], etc. This enriches and extends the practical scenarios of OTC.
Inspired by the related studies above, we address the optimal couple-group tracking control (OCGTC) problem for a class of leader–follower HeMASs by a policy-value-based actor-critic algorithm. We adopt the principle that agents cooperate within the same subgroup and compete across different subgroups. The main contributions are summarized as follows:
(1) We extend the works [31], [32], [33], [34], [35] on the distributed OTC problem and propose an ADP-based OCGTC technique. In particular, the cooperative-competitive relationships among agents in a weakly connected network are incorporated into OCGTC, achieving multiple consensus control goals. Moreover, complete knowledge of the system dynamics, which may not be available, is not required.
(2) We introduce an estimated velocity to eliminate the dimension-difference issue caused by the heterogeneous agents, which extends the works [5], [24] that treat same-order dynamics only. Then, to find an approximate solution to the ARE, we propose a policy-value-based distributed actor-critic algorithm. This adaptive learning technique can coordinate followers in different subgroups to achieve collaborative tracking control with the leader.
(3) A consensus analysis of the HeMASs in terms of the global tracking error is given based on the ARE. Furthermore, a Lyapunov-like theorem is used to derive a boundary criterion under which the local tracking error and the weight estimation errors of the actor-critic neural networks (NNs) are uniformly ultimately bounded (UUB).
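The cooperative-competitive principle stated above (cooperation inside a subgroup, competition across subgroups) can be illustrated with the classical signed-graph consensus model for first-order agents. This is only a sketch under strong simplifying assumptions and is not the OCGTC protocol of this paper: there is no leader, no heterogeneity and no optimization, and the four-agent signed adjacency matrix, the step size and the function name are hypothetical. Positive weights encode cooperation and negative weights encode competition.

```python
import numpy as np

def simulate_signed_consensus(adjacency, x0, step=0.2, iters=200):
    """Run x(k+1) = x(k) - step * L x(k) with the signed Laplacian L = D - A,
    where D = diag(sum_j |a_ij|)."""
    A = np.asarray(adjacency, dtype=float)
    D = np.diag(np.abs(A).sum(axis=1))
    L = D - A
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * (L @ x)
    return x

# Agents 0 and 1 form one subgroup, agents 2 and 3 the other.
# +1: cooperative link inside a subgroup, -1: competitive link across subgroups.
adjacency = np.array([
    [ 0,  1,  0, -1],
    [ 1,  0, -1,  0],
    [ 0, -1,  0,  1],
    [-1,  0,  1,  0],
])

x_final = simulate_signed_consensus(adjacency, [1.0, 2.0, 3.0, 4.0])
# For this structurally balanced graph the two subgroups settle at
# opposite values: x_final is approximately [-1, -1, 1, 1].
```

Each subgroup agrees internally while the two subgroups end up at different values, which is the qualitative behavior that couple-group consensus aims for.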
The rest of the paper is organized as follows. Section 2 introduces some algebraic graph theory and the group consensus problem. In Section 3, the optimal group consensus problem is presented for both homogeneous and heterogeneous subgroups, and the stability of the global tracking system is analyzed. Section 4 provides the actor-critic algorithm, which approximates the value functions and the optimal policies by the gradient descent method. In Section 5, three examples demonstrate the effectiveness of the proposed method, and Section 6 draws some conclusions.
Notations: Symbols are defined for the set of n-dimensional real vectors and the set of real matrices, the Kronecker product and the Hadamard product, the trace of a matrix, a piecewise function, the Euclidean norm, the number of elements in a set, and the real part of an eigenvalue.
Preliminaries and problem formulation
In this section, some basic concepts in algebraic graph theory for HeMASs are stated, and then the group consensus problem is formulated.
OCGTC
In this section, based on the HeMASs (1), (2), the stability of the global tracking error system is proved.
Distributed actor-critic adaptive algorithm
In this section, according to the iterative local tracking function (11) and the optimal control input (15), a learning algorithm (see Algorithm 1) based on the discrete-time HeMASs (5) with cooperative-competitive interaction is presented to deal with the optimal couple-group consensus problem. To this end, critic NNs approximate the performance indexes as estimates of the optimal value, and actor NNs generate the control inputs as estimates of the optimal policy.
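The division of labor between critic and actor described above can be sketched on a toy problem. The example below is not Algorithm 1 of this paper. It is a minimal HDP-style sketch for a single scalar linear system x(k+1) = a x(k) + b u(k) with stage cost q x² + ρ u², in which the critic is V(x) ≈ w_c x² and the actor is u(x) ≈ w_a x, each with one weight. Both weights are updated by gradient descent on a fixed batch of sample states. All parameter values and names are hypothetical, and the sketch uses the model (a, b) to generate next states and the actor target, which the data-driven method of the paper is designed to avoid.

```python
import numpy as np

def train_actor_critic(a=0.9, b=0.5, q=1.0, rho=1.0,
                       lr_critic=0.5, lr_actor=0.5, iters=5000):
    """Return (w_c, w_a) for critic V(x) = w_c * x^2 and actor u(x) = w_a * x."""
    x = np.linspace(-1.0, 1.0, 21)   # fixed batch of sample states
    w_c, w_a = 0.0, 0.0
    for _ in range(iters):
        u = w_a * x                        # actor output
        x_next = a * x + b * u             # next state (model assumed known here)
        cost = q * x**2 + rho * u**2       # stage cost

        # Critic: semi-gradient descent on the squared Bellman residual
        target = cost + w_c * x_next**2
        e_c = w_c * x**2 - target
        grad_c = np.mean(e_c * x**2)

        # Actor: move toward u* = -(1/(2*rho)) * b * dV/dx(x_next)
        u_target = -(b / rho) * w_c * x_next
        e_a = u - u_target
        grad_a = np.mean(e_a * x)

        w_c -= lr_critic * grad_c
        w_a -= lr_actor * grad_a
    return w_c, w_a

w_c, w_a = train_actor_critic()
```

For this scalar problem the fixed point of the two updates coincides with the solution of the scalar ARE and the corresponding optimal feedback gain, so `w_c` and `-w_a` can be checked against the closed-form values.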
Simulations
In this section, three numerical examples are presented, covering the case of homogeneous subgroups, the case of heterogeneous subgroups, and the effect of the cooperative-competitive strength. They illustrate the effectiveness of OCGTC for HeMASs with cooperative-competitive interaction.
Conclusion
We investigate the optimal couple-group consensus control problem for unknown HeMASs with cooperative-competitive interaction. To solve the dimension-difference problem among heterogeneous agents, the estimated velocity is employed for first-order agents. In addition, a novel cooperative-competitive strength matrix is proposed to construct cooperative-competitive relationships among agents. In order to learn the solution to the ARE, a policy-value-based actor-critic algorithm for the HeMASs is
CRediT authorship contribution statement
Jun Li: Conceptualization, Methodology, Software, Formal analysis, Validation, Investigation, Data curation, Writing – original draft, Writing – review & editing. Lianghao Ji: Conceptualization, Methodology, Formal analysis, Writing – review & editing, Supervision, Project administration, Funding acquisition. Cuijuan Zhang: Conceptualization, Methodology, Writing – review & editing. Huaqing Li: Investigation, Methodology, Data curation, Conceptualization, Resources.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 61876200, in part by the Major Scientific and Technological Research Program of Chongqing Municipal Education Commission under Grant KJZD-M202100602, in part by the Natural Science Foundation Project of Chongqing Science and Technology Commission under Grant cstc2018jcyjAX0112, in part by the Postgraduate Scientific Research and Innovation Project of Chongqing under Grant CYS20254, and in part by the
References (47)
et al. Group consensus in multi-agent systems with switching topologies and communication delays. Syst. Control Lett. (2010)
et al. On pinning group consensus for dynamical multiagent networks with general connected topology. Neurocomputing (2014)
et al. Group consensus via pinning control for a class of heterogeneous multi-agent systems with input constraints. Inf. Sci. (2021)
et al. Multi-player non-zero-sum games: online adaptive learning solution of coupled Hamilton-Jacobi equations. Automatica (2011)
et al. Optimal tracking control based on reinforcement learning value iteration algorithm for time-delayed nonlinear systems with external disturbances and input constraints. Inf. Sci. (2021)
et al. Optimized tracking control based on reinforcement learning for a class of high-order unknown nonlinear dynamic systems. Inf. Sci. (2022)
et al. Reinforcement learning multi-agent system for faults diagnosis of mircoservices in industrial settings. Comput. Commun. (2021)
et al. Multi-agent discrete-time graphical games and reinforcement learning solutions. Automatica (2014)
et al. Optimal control of unknown non-affine nonlinear discrete-time systems based on adaptive dynamic programming. Automatica (2012)
et al. Optimal tracking control based on reinforcement learning value iteration algorithm for time-delayed nonlinear systems with external disturbances and input constraints. Inf. Sci. (2021)
Couple-group consensus for discrete-time heterogeneous multiagent systems with cooperative-competitive interactions and time delays. Neurocomputing
Data-driven optimal tracking control of discrete-time multi-agent systems with two-stage policy iteration algorithm. Inf. Sci.
Dynamical group consensus of heterogeneous multi-agent systems with input time delays. Neurocomputing
A probabilistic framework for tracking the formation and evolution of multi-vehicle groups in public traffic in the presence of observation uncertainties. IEEE Trans. Intell. Transp. Syst.
A new consensus model for group decision making problems with nonhomogeneous experts. IEEE Trans. Syst., Man, Cybern. Syst.
Group consensus control for double-integrator dynamic multiagent systems with fixed communication topology. Int. J. Robust Nonlinear Control
Event-triggered communication for leader-following consensus of second-order multiagent systems. IEEE Trans. Cybern.
On group synchronization for interacting clusters of heterogeneous systems. IEEE Trans. Cybern.
Clustered event-triggered consensus analysis: an impulsive framework. IEEE Trans. Ind. Electron.
Group consensus control for heterogeneous multi-agent systems with fixed and switching topologies. Int. J. Control
Cluster consensus in discrete-time networks of multiagents with inter-cluster nonidentical inputs. IEEE Trans. Neural Netw. Learn. Syst.
Achieving cluster consensus in continuous time networks of multi-agents with inter-cluster non-identical inputs. IEEE Trans. Autom. Control
Distributed model reference adaptive optimization of disturbed multiagent systems with intermittent communications. IEEE Trans. Cybern.