A two-layer networked learning control system using actor–critic neural network

https://doi.org/10.1016/j.amc.2008.05.062

Abstract

Controlled objects are becoming increasingly complex, yet the existing control strategies cannot deliver the required control performance because of the limitations of the widely used single-layer networked control system architecture. To address this, a two-layer networked learning control system architecture is proposed. Under this architecture, independent local controllers are interconnected to form the lower layer, while a learning agent that communicates with these local controllers forms the upper layer. To implement such a system, a discard-packet strategy is first developed to deal with network-induced delay, data packet out-of-order and data packet loss. A cubic spline interpolator is then employed to compensate for the lost data. Finally, the output of the learning agent, based on an actor–critic neural network, is used to dynamically tune the control signal of each local controller. Control simulations of a nonlinear heating, ventilation and air-conditioning (HVAC) system under different schemes are compared. Simulation results show that the new architecture can effectively improve control performance.

Introduction

A networked control system (NCS) is composed of a controller and a remote system containing the physical plant, sensors and actuators. The controller and the plant are located at different spatial locations and are connected through a network to form a closed control loop [1]. NCSs have been widely used in ball maglev systems [2], dual-axis hydraulic positioning systems [3], car suspension systems [4] and large-scale transportation vehicles [5] due to various advantages, including low installation cost, ease of maintenance and greater flexibility.

Regarding the controlled plant of an NCS, most research over the last decade has focused on linear NCSs [6], [7], [8]. Recently, NCSs for complex plants have attracted significant interest from both industry and academia. Walsh et al. [9] designed a nonlinear controller, without regard to the network, for a multiple-input multiple-output (MIMO) nonlinear plant; when a computer network was inserted into the feedback loop, the closed-loop properties and global exponential stability were still preserved by choosing the network protocol and bandwidth appropriately. Zhang et al. [10] proposed a guaranteed cost networked control method for the T–S fuzzy system with time delays in a network situation, and obtained the corresponding state feedback control law considering the quality of network service. Tanaka et al. [11] proposed shared control in internet-based remote stabilization for nonlinear systems; to design a shared controller that stabilizes several nonlinear systems simultaneously, they derived a shared-control version of the stability condition for T–S fuzzy models with time-varying delays. Lian et al. [12] investigated multiple distributed communication delays as well as multiple inputs and multiple outputs; owing to the characteristics of a network architecture, piecewise constant plant inputs are assumed and discrete-time models of the plant and controller dynamics are adopted to analyze the stability and performance of a closed-loop NCS. Mao et al. [13] modeled an NCS with sensor faults as a discrete model with system uncertainties, and gave a sufficient condition for stability in terms of linear matrix inequalities.

Regarding learning algorithms for NCSs, several learning and intelligent control algorithms have been proposed to improve NCS performance. Pan et al. [14] proposed a sampled-data iterative learning control (ILC) approach for a class of nonlinear NCSs; by assuming partial prior knowledge of the transmission time delays, a previous-cycle learning method was incorporated into network-based control for a general nonlinear system satisfying a global Lipschitz condition. Lian et al. [15] utilized an integrated networked control design chart to help select design parameters; by utilizing deadband control and state estimation, they further presented communication modules that guarantee both control and communication performance. Li and Fang [16] proposed a novel fuzzy logic method with the delays estimated by a delay window; this method can not only save the power of the control node but also preserve NCS performance. Yi et al. [17] presented a new modeling scheme with variable-period sampling for NCSs, in which the model can be controlled with the aid of network-induced delay prediction based on a BP neural network.

Regarding NCS architecture, the single-layer architecture has been widely applied and accounts for most results in the existing literature. However, two-layer NCS architectures have received increasing attention in the past few years. Lee et al. [18] derived the worst-case communication delay based on a timing analysis of switched Ethernet with multiple switching hubs; their experiments showed that, if the number of stations and levels is chosen appropriately, switched Ethernet with multiple switching hubs can be used as a real-time industrial network. Yang [19] and Yang et al. [20] introduced a two-level communication model in which the basic control problem is handled via level 1 communication, whereas controller parameter adaptation is handled via level 2 communication.

Many plants in practical industrial processes are time-varying, nonlinear, uncertain and distributed, and exhibit multivariable coupling. Due to the limitations of the widely used single-layer NCS architecture, the existing control strategies cannot meet the demand for higher-quality control of these plants. Learning control strategies can improve control performance by mining valuable knowledge and underlying laws on-line, accumulating experience and adapting to the environment, but their high computational complexity makes them difficult to design and implement on an embedded controller or programmable logic controller (PLC) within a single-layer NCS architecture. Motivated by these observations, a two-layer networked learning control system architecture is proposed in this paper, which provides a path for implementing learning strategies thanks to the strong computational ability of the upper-layer controller (learning agent). Compared with a single-layer NCS, this system is characterized by a two-layer network, local controllers, a learning agent and a complex plant.

The rest of this paper is organized as follows: Section 2 describes the two-layer networked learning control system architecture. Section 3 proposes a discard-packet strategy and uses cubic spline interpolation to compensate for lost data. Section 4 introduces the actor–critic neural network in the learning agent, and simulations are given in Section 5. Section 6 concludes the paper.

Section snippets

Two-layer networked learning control system architecture

With the objectives of achieving better control performance, better interference rejection and increased adaptability to a varying environment, the two-layer networked learning control system architecture is introduced as shown in Fig. 1. In this architecture, independent local controllers are interconnected to form the lower layer, i.e., the local controllers communicate with the sensors and actuators attached to the plant through the first-layer communication network, called L1C. The…
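The snippet above is truncated, but the division of labour it describes can be summarised in a short sketch. The following Python skeleton is purely illustrative: the class and field names (Packet, LocalController, LearningAgent) are assumptions that mimic the roles described in the text, not code from the paper.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Packet:
    seq: int                     # sequence number stamped by the sender
    timestamp: float             # send time, used to detect delay and out-of-order arrival
    payload: Tuple[float, ...]   # sensor readings or control/tuning values

class LocalController:
    """Lower layer: closes the loop with the plant over the L1C network."""
    def __init__(self, kp: float):
        self.kp = kp
        self.tuning = 0.0        # correction most recently received from the learning agent

    def control(self, setpoint: float, measurement: float) -> float:
        # Baseline local control law, dynamically tuned by the upper layer.
        return self.kp * (setpoint - measurement) + self.tuning

class LearningAgent:
    """Upper layer: receives sampled data over the second-layer network and returns a tuning signal."""
    def tune(self, history: List[Packet]) -> float:
        # Placeholder for the actor-critic update described in Section 4.
        return 0.0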

Data packet compensation strategy

The critical problems that an NCS faces are network-induced delay, data packet out-of-order and packet loss, and many methods have been proposed to tackle these problems [19], [21]. A discard-packet strategy is adopted to deal with network-induced delay, data packet out-of-order and packet loss; zero-order hold is used to compensate for the lost control signals in the local controllers, while the cubic spline interpolator is employed to compensate for lost signals in the learning agent. Next, we…
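As a rough illustration of this strategy, the sketch below assumes sequence-numbered, timestamped packets and uses scipy's CubicSpline in place of the cubic spline interpolator mentioned in the text; the function names and sample values are hypothetical.

from typing import Optional
import numpy as np
from scipy.interpolate import CubicSpline

def accept_packet(latest_seq: int, pkt_seq: int) -> bool:
    # Discard-packet rule: drop any packet that is not newer than the newest
    # one already received; this covers out-of-order and badly delayed packets.
    return pkt_seq > latest_seq

def zero_order_hold(last_value: float, new_value: Optional[float]) -> float:
    # Local-controller side: if the control packet is lost, hold the last value.
    return last_value if new_value is None else new_value

def spline_compensator(times, values) -> CubicSpline:
    # Learning-agent side: fit a cubic spline to the samples that did arrive,
    # then evaluate it at the instants of the lost samples.
    return CubicSpline(np.asarray(times, float), np.asarray(values, float))

# Example: the samples at t = 2 s and t = 3 s were lost and are reconstructed.
received_t = [0.0, 1.0, 4.0, 5.0]
received_y = [20.0, 20.4, 21.5, 21.8]
spline = spline_compensator(received_t, received_y)
print(spline([2.0, 3.0]))   # interpolated estimates of the lost samples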

Actor–critic neural network architecture

Based on the requirements of the two-layer networked learning control system, the actor–critic neural network [23], [24], [25], [26] is chosen as the learning algorithm in the learning agent. As depicted in Fig. 2, it consists of two different neural networks: one implements the controller (actor neural network), and the other implements the reinforcement learner (critic neural network). The actor neural network can be thought of as the controller because it…
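For concreteness, the following minimal sketch shows a generic actor–critic learner with linear function approximation, a TD(0) critic and a Gaussian-policy actor; it follows the usual actor–critic scheme rather than the exact network structure of Fig. 2, and all hyperparameters are illustrative assumptions.

import numpy as np

class ActorCritic:
    def __init__(self, n_features, alpha_actor=1e-3, alpha_critic=1e-2,
                 gamma=0.95, sigma=0.1):
        self.w = np.zeros(n_features)       # critic weights: V(s) ~ w . phi(s)
        self.theta = np.zeros(n_features)   # actor weights: mean action = theta . phi(s)
        self.aa, self.ac = alpha_actor, alpha_critic
        self.gamma, self.sigma = gamma, sigma

    def act(self, phi):
        # Actor: exploratory tuning signal around the current policy mean.
        return float(self.theta @ phi + self.sigma * np.random.randn())

    def learn(self, phi, action, reward, phi_next):
        # Critic: TD error evaluates the action the actor just took.
        td_error = reward + self.gamma * (self.w @ phi_next) - (self.w @ phi)
        self.w += self.ac * td_error * phi
        # Actor: policy-gradient step weighted by the critic's evaluation.
        grad_log_pi = (action - self.theta @ phi) / (self.sigma ** 2) * phi
        self.theta += self.aa * td_error * grad_log_pi
        return td_error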

Simulation example

To verify the feasibility and effectiveness of the proposed architecture, we perform a simulation study on a nonlinear heating, ventilation and air-conditioning (HVAC) system [28]. The modeled system is nonlinear and multivariable. Its states are the input and output temperatures of air and water and the air and water flow rates, Tai, Tao, Twi, Two, fa, fw. The control signal C affects the water flow rate. The model is given by the following equations: fw(k)=0.008+…
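The HVAC update equations are truncated in the snippet above, so the skeleton below only illustrates how such a closed-loop simulation can be organised; plant_step is a placeholder first-order thermal response, not the model of [28], and all numerical values are arbitrary.

def plant_step(Tao: float, C: float, dt: float = 1.0,
               tau: float = 30.0, gain: float = 2.0, Tamb: float = 20.0) -> float:
    # Placeholder dynamics: the output air temperature relaxes toward an
    # ambient value plus a term driven by the control signal C.
    return Tao + dt / tau * (Tamb + gain * C - Tao)

setpoint, Tao = 22.0, 18.0
kp = 0.5                      # gain of the local controller
tuning = 0.0                  # correction that the learning agent would supply
for k in range(200):
    C = kp * (setpoint - Tao) + tuning
    Tao = plant_step(Tao, C)
print(round(Tao, 2))          # settled output temperature of the toy loop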

Conclusions

This paper presents a two-layer networked learning control system scheme to improve control performance. A discard-packet strategy is developed to deal with network-induced delay, data packet out-of-order and packet loss, and different compensation strategies are adopted to compensate for lost data. An actor–critic neural network is employed in the upper layer as a learning agent to dynamically tune the output of the local controllers in the lower layer. The effectiveness and on-line learning capabilities…

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant 60774059, the Key Project of the Science and Technology Commission of Shanghai Municipality under Grants 061111008 and 06DZ22011, the Sunlight Plan Following Project of the Shanghai Municipal Education Commission and Shanghai Educational Development Foundation under Grant 06GG10, the China Postdoctoral Science Foundation under Grant 20070420643, and Shanghai Leading Academic Disciplines under Grant T0103.

References (28)

  • G.C. Walsh et al., Asymptotic behavior of nonlinear networked control systems, IEEE Transactions on Automatic Control (2001)
  • H.G. Zhang et al., Guaranteed cost networked control for T–S fuzzy systems with time delays, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews (2007)
  • K. Tanaka, H. Ohtake, H.O. Wang, Shared nonlinear control in internet-based remote stabilization, in: The 2005 IEEE...
  • F.L. Lian, J. Moyne, D. Tilbury, Analysis and modeling of networked control systems: MIMO case with multiple time...