Elsevier

Information Sciences

Volume 610, September 2022, Pages 401-424

Optimal couple-group tracking control for the heterogeneous multi-agent systems with cooperative-competitive interactions via reinforcement learning method

https://doi.org/10.1016/j.ins.2022.07.181

Abstract

In this paper, we study a class of optimal couple-group tracking control (OCGTC) problems for heterogeneous multi-agent systems (HeMASs) using a reinforcement learning (RL) method. The goal is to minimize the followers' local tracking errors (states) and control inputs (actions) by learning the dynamic knowledge of a single leader. The weakly connected multi-agent network is randomly divided into coupled sub-networks. Agents in the same sub-network cooperate to accomplish tracking control, so that the positions and velocities of all agents in that sub-network converge to the same value. Agents from different subgroups compete with each other toward dissimilar tracking goals. In particular, the HeMASs considered contain agents with unknown first-order and second-order dynamics. To solve the algebraic Riccati equation (ARE), a policy-value-based actor-critic technique is applied. Using a Lyapunov-like theorem, we prove that the local tracking errors and the estimated weights of the actor-critic neural networks are uniformly ultimately bounded. Finally, several simulations demonstrate the correctness of the theoretical results.

Introduction

Group consensus, as one of the distributed coordination problems, has a wide range of applications in autonomous vehicle technology [1], group decision-making [2], and other areas. When a multi-agent system (MAS) is partitioned into several subgroups, group consensus means that each subgroup asymptotically reaches its own consensus state. The system as a whole therefore settles into multiple consensus states. As an extension of the consensus problem, group consensus has attracted considerable attention from researchers in biology, mathematics, physics, computer science, and other fields.
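
To make the definition concrete, the following minimal sketch runs a discrete-time single-integrator update $x_i(k+1)=x_i(k)+\varepsilon\sum_j a_{ij}(x_j(k)-x_i(k))$ on four agents split into two subgroups. It is an illustration only, not the protocol of this paper. The weights are hypothetical:

- Intra-group links are cooperative (positive).
- Each agent's inter-group weights sum to zero, with one positive link and one negative, competitive-style link.

Under this in-degree balance assumption, the two subgroups settle at different consensus values.

```python
import numpy as np

def group_consensus(A, x0, eps, steps):
    """Iterate x(k+1) = x(k) - eps * L x(k), where L = D - A may contain signed weights."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A          # signed Laplacian (row sums are zero)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - eps * (L @ x)
    return x

# Hypothetical weights: subgroup 1 = agents {0, 1}, subgroup 2 = agents {2, 3}.
# In every row, the entries toward the *other* subgroup sum to zero (in-degree balance).
A = [[0,  1,  1, -1],
     [1,  0,  1, -1],
     [1, -1,  0,  1],
     [1, -1,  1,  0]]

x = group_consensus(A, x0=[1.0, 3.0, 6.0, 2.0], eps=0.25, steps=200)
print(np.round(x, 6))   # subgroup 1 -> 4, subgroup 2 -> 3
```

The limits (4 and 3) differ from the initial subgroup averages (2 and 4). This is because the inter-group coupling shapes the transient before it dies out.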

Over the past decade, the group consensus problem has developed considerably. However, most existing works on group consensus, such as [3], [4], [5], consider homogeneous systems, in which all agents share the same dynamics. In practice, agents frequently have different dynamics, owing to various physical limitations and to goals that are achieved by employing diverse agents. As a result, an increasing number of researchers have turned their attention to heterogeneous systems. Relying on an estimated velocity for the first-order agents, the work [6] investigates the group consensus of heterogeneous MASs (HeMASs). Recently, many variants of group consensus have been studied. These include group consensus with pinning control [7], clustered event-triggered consensus [8], group consensus of HeMASs with fixed and switching topologies [9], cluster consensus with inter-cluster non-identical inputs [10], [11], and group consensus with input constraints [12].

The studies above mainly design consensus protocols for the input controllers. They derive the corresponding parameter constraints that guarantee convergence of the states of all agents in the same subgroup. However, most classical works on group consensus cooperative control of MASs, such as [7], [12], provide only stability analysis under an exact system model. They offer no technique for optimizing the system cost. With the growing demands of engineering practice and the rapid development of computing and communication technologies, consensus control and optimization of MASs have gained wide attention [13], [14], [15]. [13] proposes a distributed model reference adaptive control (D-MRAC) scheme that enhances robustness and improves transient performance for the distributed optimization problem of MASs. [14] investigates a distributed optimization problem for double-integrator multi-agent systems with unmatched constant disturbances. In [15], an optimal consensus problem for a set of integrator systems with dynamic uncertainties is studied, and the optimal solution of the team objective function is reached. These works highlight the practical value of optimal consensus control techniques for MASs in realistic applications. Data-driven distributed cooperative control and optimization can therefore realize group consensus control of HeMASs when system knowledge is unknown, while also accounting for optimal system performance. Motivated by these considerations, a growing number of studies address optimal consensus, which has become a research focus for MASs [16], [17]. One of the core problems of optimal control is solving the Hamilton–Jacobi–Bellman (HJB) equation or the algebraic Riccati equation (ARE), whose analytical solution is generally difficult to obtain [18].

Reinforcement learning (RL) is considered an effective method for dealing with the dimension explosion that arises when solving the HJB equation for nonlinear systems [19], [20]. In recent years, optimal coordination control problems have been studied with RL methods [21], [22], [23], [24]:

- [21] investigates the infinite-horizon robust optimal control problem for a class of continuous-time uncertain nonlinear systems using data-based adaptive critic designs.
- In [22], integral reinforcement learning techniques are used to address the optimal tracking consensus problem for a class of nonlinear continuous-time systems with time delays, mismatched external disturbances, and input constraints.
- [23] develops an optimized tracking control using fuzzy-logic-system-based RL for a class of unknown nonlinear dynamic systems in canonical form.
- [24] studies the finite-horizon optimal consensus control problem for unknown multi-agent systems with state delays via off-policy RL.

With the wide application of RL and optimal control in distributed control systems, adaptive dynamic programming (ADP) methods have developed substantially. ADP offers significant advantages in optimization and control for practical engineering systems, such as flight control [25], reactive power control [26], and multi-agent systems for anomaly detection [27]. Following [19], ADP methods fall into the following main branches: heuristic dynamic programming (HDP), dual HDP (DHP), globalized dual HDP (GDHP), action-dependent HDP (ADHDP), and action-dependent dual HDP (ADDHP). Lewis, Abu-Khalaf, H. Zhang, D. Liu et al. further developed these approaches, and their works have had a widespread impact on ADP-based optimal consensus control [28], [29], [30], [31].
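
For a single linear agent with a quadratic cost, the ARE mentioned above can at least be approximated numerically. The sketch below assumes a known model $(A,B)$ and weights $(Q,R)$, chosen only for illustration (a sampled double integrator). It iterates the Riccati map from $P_0=0$. This model-based value iteration is the recursion that HDP-style ADP approximates from data. It is not the distributed algorithm of this paper.

```python
import numpy as np

def dare_value_iteration(A, B, Q, R, tol=1e-10, max_iter=100_000):
    """Approximate the stabilizing solution P of the discrete-time ARE
    P = Q + A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A by iterating from P = 0."""
    P = np.zeros_like(Q)
    for _ in range(max_iter):
        # Greedy gain for the current value estimate V(x) = x^T P x.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    else:
        raise RuntimeError("Riccati iteration did not converge")
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # optimal policy u = -K x
    return P, K

# Illustrative plant: double integrator sampled with period T = 0.1.
T = 0.1
A = np.array([[1.0, T], [0.0, 1.0]])
B = np.array([[0.5 * T**2], [T]])
Q = np.eye(2)
R = np.array([[1.0]])

P, K = dare_value_iteration(A, B, Q, R)
residual = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A) - P
print(np.max(np.abs(residual)))                     # close to zero
print(np.max(np.abs(np.linalg.eigvals(A - B @ K))))  # below 1: the closed loop is stable
```

The returned $P$ is the value kernel, $V(x)=x^\top P x$, and $K$ is the corresponding greedy gain. An RL or ADP method replaces the explicit use of $(A,B)$ with estimates learned from trajectories.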

Recently, the works [32], [33], [34], [35] have focused on solving optimal control problems with ADP algorithms:

- In [32], an ADP-based optimal tracking control (OTC) scheme is established for an unmanned surface vehicle (USV) in the presence of dead-zone input nonlinearities, system dynamics, and disturbances.
- [33] presents an ADP approach for the OTC of switched systems, with application to a grid-tied hybrid generation system.
- In [34], event-triggered OTC of discrete-time multi-agent systems is addressed with the ADP method, reducing the computational burden and transmission load.
- [35] investigates OTC design for a class of nonlinear continuous-time systems with time delays, mismatched external disturbances, and input constraints.

These works have significantly advanced ADP-based OTC. However, research on ADP-based OTC of HeMASs has rarely addressed group consensus, which is generally studied by frequency-domain analysis [5], [6]. The technical difficulty lies in how ADP methods can coordinate the information interactions between agents in different subgroups. Moreover, relations among agents in reality can be cooperative, competitive, or even cooperative-competitive [36]. We therefore combine group consensus with cooperative-competitive interactions, which makes the results applicable to group decision-making [2], target tracking [37], and similar tasks. This considerably broadens the practical scenarios of OTC.

Inspired by the studies above, we address the optimal couple-group tracking control (OCGTC) problem for a class of leader–follower HeMASs using a policy-value-based actor-critic algorithm. Agents cooperate within the same subgroup and compete across different subgroups. The main contributions are summarized as follows:

  • (1) We extend the works [31], [32], [33], [34], [35] on the distributed OTC problem and propose an ADP-based OCGTC technique. In particular, the cooperative-competitive relationships among agents in a weakly connected network are incorporated into OCGTC, achieving multiple consensus control goals. Moreover, the technique does not require complete knowledge of the system dynamics, which may be unavailable.

  • (2) We introduce an estimated velocity to eliminate the dimension mismatch caused by heterogeneous agents. This extends the works [5], [24], which consider only agents with dynamics of the same order. To find an approximate solution to the ARE, we then propose a policy-value-based distributed actor-critic algorithm. This adaptive learning technique coordinates followers in different subgroups so that they achieve collaborative tracking control with the leader.

  • (3) A consensus analysis of the HeMASs in terms of the global tracking error is given based on the ARE. Furthermore, a boundary criterion is derived using a Lyapunov-like theorem. It guarantees that the local tracking errors and the weight estimation errors of the actor-critic neural networks (NNs) are uniformly ultimately bounded (UUB).

The rest of the paper is organized as follows.

- Section 2 mainly introduces algebraic graph theory and the group consensus problem.
- Section 3 presents the optimal group consensus problem for both homogeneous and heterogeneous subgroups and analyzes the stability of the global tracking system.
- Section 4 provides the actor-critic algorithm, which approximates the value functions and the optimal policies by gradient descent.
- In Section 5, three examples verify the effectiveness of the proposed method.
- Section 6 draws conclusions.

Notations: $\mathbb{R}^n$ and $\mathbb{R}^{n\times m}$ represent the set of $n$-dimensional real vectors and the set of $n\times m$ real matrices, respectively. $\otimes$ and $\odot$ denote the Kronecker product and the Hadamard product. $\mathrm{tr}\{M\}$ denotes the trace of matrix $M$. $\delta_i$ represents a piecewise function with $i\in\{0,1,\ldots,N\}$. $\|\cdot\|$ denotes the Euclidean norm. $\mathrm{card}(S)$ represents the number of elements in the set $S$. $\mathrm{Re}(\lambda_i)$ denotes the real part of the $i$-th eigenvalue.
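
The Kronecker product lets per-agent quantities be written compactly for the whole network. The sketch below illustrates this under two assumptions:

- It uses the standard leader–follower neighborhood tracking error $e_i=\sum_j a_{ij}(x_j-x_i)+g_i(x_0-x_i)$.
- The weights $a_{ij}$ and pinning gains $g_i$ are hypothetical and nonnegative.

The paper's own local tracking function, with cooperative-competitive weights, may differ. The sketch checks that stacking the per-agent errors gives $e=-((L+G)\otimes I_n)(x-\mathbf{1}_N\otimes x_0)$, where $L$ is the graph Laplacian and $G=\mathrm{diag}(g_i)$.

```python
import numpy as np

def local_errors_loop(A, g, x, x0):
    """Agent-by-agent: e_i = sum_j a_ij (x_j - x_i) + g_i (x0 - x_i)."""
    N = len(A)
    return np.array([
        sum(A[i][j] * (x[j] - x[i]) for j in range(N)) + g[i] * (x0 - x[i])
        for i in range(N)
    ])

def local_errors_kron(A, g, x, x0):
    """Stacked form: e = -((L + G) kron I_n)(x - 1_N kron x0)."""
    A = np.asarray(A, dtype=float)
    N, n = x.shape
    L = np.diag(A.sum(axis=1)) - A               # graph Laplacian L = D - A
    G = np.diag(g)                               # pinning gains toward the leader
    stacked = x.reshape(-1) - np.kron(np.ones(N), x0)
    return (-np.kron(L + G, np.eye(n)) @ stacked).reshape(N, n)

# Hypothetical 3-follower chain; only follower 0 observes the leader.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
g = [1.0, 0.0, 0.0]
x = np.array([[0.0, 1.0], [2.0, -1.0], [4.0, 0.5]])   # (position, velocity) per follower
x0 = np.array([1.0, 0.0])                            # leader state

print(np.allclose(local_errors_loop(A, g, x, x0), local_errors_kron(A, g, x, x0)))  # True
```

The stacked form is what makes global statements possible, for example stability of the global tracking error, in terms of the eigenvalues of $L+G$.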

Section snippets

Preliminaries and problem formulation

In this section, some basic concepts of algebraic graph theory for HeMASs are stated, and the group consensus problem is then formulated.

OCGTC

In this section, based on the HeMASs (1), (2), the stability of the global tracking error system is proved.

Distributed actor-critic adaptive algorithm

In this section, a learning algorithm (see Algorithm 1) is presented to solve the optimal couple-group consensus problem. It is based on the iterative local tracking function (11) and the optimal control input (15), and it targets the discrete-time HeMASs (5) with cooperative-competitive interactions. To this end, critic NNs approximate the performance indexes (the estimated optimal values), and actor NNs generate the control inputs (the estimated optimal policies).
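
The critic/actor split described above can be illustrated in a much simpler setting: a model-free policy iteration for a single linear agent. The sketch below is a substitute technique, Q-learning policy iteration in the spirit of action-dependent HDP. It is not the paper's Algorithm 1.

- The critic is a quadratic Q-function $Q_K(x,u)=z^\top H z$ with $z=[x;u]$, fitted by least squares to the Bellman equation from sampled transitions.
- The actor is the greedy gain $K=H_{uu}^{-1}H_{ux}$.
- The plant matrices, cost weights, initial stabilizing gain $K_0$, and sample sizes are illustrative assumptions. The plant is used only to generate transitions, never by the learner.
- Linear-in-parameter least squares stands in for the gradient-descent NN training used in the paper.

```python
import numpy as np

def quad_features(z):
    """Quadratic monomials z_i * z_j (i <= j), so that z^T H z = theta . phi(z)."""
    n = z.size
    return np.array([z[i] * z[j] for i in range(n) for j in range(i, n)])

def theta_to_H(theta, n):
    """Rebuild the symmetric critic kernel H from the feature weights theta."""
    H = np.zeros((n, n))
    k = 0
    for i in range(n):
        for j in range(i, n):
            H[i, j] = H[j, i] = theta[k] if i == j else theta[k] / 2.0
            k += 1
    return H

def q_policy_iteration(plant_step, n, m, Qc, R, K0, iters=10, samples=100, seed=0):
    """Hypothetical helper: alternate critic evaluation and actor improvement."""
    rng = np.random.default_rng(seed)
    K = np.array(K0, dtype=float)
    for _ in range(iters):
        rows, costs = [], []
        for _ in range(samples):
            x = rng.standard_normal(n)
            u = -K @ x + rng.standard_normal(m)              # exploratory action
            x_next = plant_step(x, u)                        # observed transition only
            z = np.concatenate([x, u])
            z_next = np.concatenate([x_next, -K @ x_next])   # current policy at x_next
            rows.append(quad_features(z) - quad_features(z_next))
            costs.append(x @ Qc @ x + u @ R @ u)
        # Critic: fit Q_K(z) = cost + Q_K(z_next) by least squares.
        theta, *_ = np.linalg.lstsq(np.array(rows), np.array(costs), rcond=None)
        H = theta_to_H(theta, n + m)
        # Actor: greedy improvement u = -K x with K = H_uu^{-1} H_ux.
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return K

# Illustrative plant (hidden from the learner): sampled double integrator.
T = 0.1
A = np.array([[1.0, T], [0.0, 1.0]])
B = np.array([[0.5 * T**2], [T]])
K = q_policy_iteration(lambda x, u: A @ x + B @ u, n=2, m=1,
                       Qc=np.eye(2), R=np.array([[1.0]]), K0=[[1.0, 1.5]])
print(K)
```

The initial gain must be stabilizing for the critic's Bellman equation to have the intended solution. With noise-free data, the learned gain should agree, up to numerical precision, with the gain obtained by iterating the Riccati equation of the same model.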

Simulations

In this section, three numerical examples illustrate the effectiveness of OCGTC for HeMASs with cooperative-competitive interactions. They cover homogeneous subgroups, heterogeneous subgroups, and the effect of the cooperative-competitive strength.

Conclusion

We investigate the optimal couple-group consensus control problem for unknown HeMASs with cooperative-competitive interactions. To solve the dimension-difference problem among heterogeneous agents, an estimated velocity is employed for first-order agents. In addition, a cooperative-competitive strength matrix is proposed to construct cooperative-competitive relationships among agents. To learn the solution to the ARE, a policy-value-based actor-critic algorithm for the HeMASs is

CRediT authorship contribution statement

Jun Li: Conceptualization, Methodology, Software, Formal analysis, Validation, Investigation, Data curation, Writing – original draft, Writing – review & editing. Lianghao Ji: Conceptualization, Methodology, Formal analysis, Writing – review & editing, Supervision, Project administration, Funding acquisition. Cuijuan Zhang: Conceptualization, Methodology, Writing – review & editing. Huaqing Li: Investigation, Methodology, Data curation, Conceptualization, Resources.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61876200, in part by the Major Scientific and Technological Research Program of Chongqing Municipal Education Commission under Grant KJZD-M202100602, in part by the Natural Science Foundation Project of Chongqing Science and Technology Commission under Grant cstc2018jcyjAX0112, in part by the Postgraduate Scientific Research and Innovation Project of Chongqing under Grant CYS20254, and in part by the

References (47)

  • Y. Jiang et al.

    Couple-group consensus for discrete-time heterogeneous multiagent systems with cooperative-competitive interactions and time delays

    Neurocomputing

    (2018)
  • Z. Peng et al.

    Data-driven optimal tracking control of discrete-time multi-agent systems with two-stage policy iteration algorithm

    Inf. Sci.

    (2019)
  • G. Wen et al.

    Dynamical group consensus of heterogeneous multi-agent systems with input time delays

    Neurocomputing

    (2016)
  • Q. Wang et al.

    A probabilistic framework for tracking the formation and evolution of multi-vehicle groups in public traffic in the presence of observation uncertainties

    IEEE Trans. Intell. Transp. Syst.

    (2018)
  • I. Pérez et al.

    A new consensus model for group decision making problems with nonhomogeneous experts

    IEEE Trans. Syst., Man, Cybern. Syst.

    (2014)
  • Y. Feng et al.

    Group consensus control for double-integrator dynamic multiagent systems with fixed communication topology

    Int. J. Robust Nonlinear Control

    (2014)
  • M. Zhao et al.

    Event-triggered communication for leader-following consensus of second-order multiagent systems

    IEEE Trans. Cybern.

    (2018)
  • J. Qin et al.

    On group synchronization for interacting clusters of heterogeneous systems

    IEEE Trans. Cybern.

    (2017)
  • W. Xu et al.

    Clustered event-triggered consensus analysis: an impulsive framework

    IEEE Trans. Ind. Electron.

    (2016)
  • G. Wen et al.

    Group consensus control for heterogeneous multi-agent systems with fixed and switching topologies

    Int. J. Control

    (2016)
  • Y. Han et al.

    Cluster consensus in discrete-time networks of multiagents with inter-cluster nonidentical inputs

    IEEE Trans. Neural Netw. Learn. Syst.

    (2013)
  • Y. Han et al.

    Achieving cluster consensus in continuous time networks of multi-agents with inter-cluster non-identical inputs

    IEEE Trans. Autom. Control

    (2015)
  • G. Guo et al.

    Distributed model reference adaptive optimization of disturbed multiagent systems with intermittent communications

    IEEE Trans. Cybern.

    (2020)