A two-layer networked learning control system using actor–critic neural network
Introduction
A networked control system (NCS) is composed of a controller and a remote system containing the physical plant, sensors and actuators. The controller and the plant are located at different spatial locations and are connected directly through a network to form a closed control loop [1]. NCSs have been widely used in ball maglev systems [2], dual-axis hydraulic positioning systems [3], car suspension systems [4] and large-scale transportation vehicles [5] owing to various advantages, including low installation cost, ease of maintenance and greater flexibility.
Regarding the controlled plant of an NCS, most research in the last decade has focused on linear NCSs [6], [7], [8]. Recently, research on NCSs for complex plants has attracted significant interest from both industry and academia. Walsh et al. [9] designed a nonlinear controller, without regard to the network, for a multiple-input–multiple-output (MIMO) nonlinear plant. When a computer network was inserted into the feedback loop, the closed-loop properties and global exponential stability were still preserved by choosing the network protocol and bandwidth appropriately. Zhang et al. [10] proposed a guaranteed-cost networked control method for T–S fuzzy systems with time delays in a networked setting, and obtained the corresponding state feedback control law taking the quality of network service into account. Tanaka et al. [11] proposed shared control in internet-based remote stabilization for nonlinear systems. To design a shared controller that stabilizes several nonlinear systems simultaneously, they derived a shared-control version of the stability condition for T–S fuzzy models with time-varying delays. Lian et al. [12] investigated multiple distributed communication delays together with multiple inputs and multiple outputs. Owing to the characteristics of a network architecture, piecewise-constant plant inputs are assumed, and discrete-time models of the plant and controller dynamics are adopted to analyze the stability and performance of a closed-loop NCS. Mao et al. [13] modeled an NCS with sensor faults as a discrete model with system uncertainties, and gave a sufficient condition for stability in terms of linear matrix inequalities.
Regarding learning algorithms for NCSs, several learning and intelligent control algorithms have been proposed to improve NCS performance. Pan et al. [14] proposed a sampled-data iterative learning control (ILC) approach for a class of nonlinear NCSs. By assuming partial prior knowledge of the transmission time delays, a previous-cycle-based learning method was incorporated into network-based control for a general nonlinear system satisfying a global Lipschitz condition. Lian et al. [15] utilized an integrated networked control design chart to help select design parameters. By employing deadband control and state estimation, they further presented communication modules that guarantee both control and communication performance. Li and Fang [16] proposed a novel fuzzy logic method in which the delays are estimated by a delay window. This method can not only save the power of the control node but also preserve NCS performance. Yi et al. [17] presented a new modeling scheme with variable-period sampling for NCSs, in which the model is controlled with the aid of networked-induced delay prediction based on a BP neural network.
Regarding NCS architecture, the single-layer architecture has been widely applied, and a series of research results can be found in most of the existing literature. However, the two-layer NCS architecture has received increasing attention in the past few years. Lee et al. [18] derived the worst-case communication delay based on a timing analysis of switched Ethernet with multiple switching hubs. Their experiments showed that if the numbers of stations and levels are chosen appropriately, switched Ethernet with multiple switching hubs can be used as a real-time industrial network. Yang [19] and Yang et al. [20] introduced a two-level communication model in which a basic control problem is handled via level-1 communication, whereas controller parameter adaptation is handled via level-2 communication.
Many plants in practical industrial processes are characterized by time-varying, nonlinear, uncertain, distributed and multivariable-coupled dynamics. Owing to the limitations of the widely used single-layer NCS architecture, existing control strategies cannot meet the demand for higher-quality control of such plants. Learning control strategies can improve control performance by mining valuable knowledge and underlying laws on-line, accumulating experience and adapting to the environment, but their high computational complexity makes them difficult to design and implement on an embedded controller or programmable logic controller (PLC) in a single-layer NCS architecture. Motivated by these observations, this paper proposes a two-layer networked learning control system architecture, which provides a path for implementing learning strategies thanks to the strong computational ability of the upper-layer controller (the learning agent). Compared with a single-layer NCS, this system is characterized by a two-layer network, local controllers, a learning agent and a complex plant.
The rest of this paper is organized as follows: Section 2 describes the two-layer networked learning control system architecture. Section 3 proposes a discard-packet strategy and uses cubic spline interpolation to compensate lost data. Section 4 introduces the actor–critic neural network in the learning agent, and simulations are given in Section 5. Section 6 concludes the paper.
Two-layer networked learning control system architecture
With the objectives of achieving better control performance and interference rejection and of increasing adaptability to a varying environment, the two-layer networked learning control system architecture is introduced as shown in Fig. 1. In this architecture, independent local controllers are interconnected to form the lower layer; that is, the local controllers communicate with the sensors and actuators attached to the plant through the first-layer communication network, called L1C.
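The interplay of the two layers can be illustrated with a minimal sketch. All names, gains and the first-order toy plant below are illustrative assumptions, not the paper's implementation: a lower-layer controller acts at every L1C sampling step, while an upper-layer agent intervenes only every few steps over L2C and adds a corrective term to the control signal.

```python
def local_controller(error, kp=2.0):
    """Hypothetical proportional local controller in the lower layer."""
    return kp * error

def learning_agent(deviation):
    """Placeholder upper-layer agent; stands in for the actor network."""
    return -0.1 * deviation

setpoint = 1.0
x = 0.0            # state of an assumed first-order toy plant
L2C_PERIOD = 5     # the agent acts once every 5 lower-layer steps
correction = 0.0
for k in range(50):
    if k % L2C_PERIOD == 0:                    # upper-layer (L2C) update
        correction = learning_agent(x - setpoint)
    u = local_controller(setpoint - x) + correction  # combined control
    x += 0.1 * (u - x)                         # plant step over L1C
print(round(x, 3))
```

The point of the sketch is purely structural: the lower loop runs at the fast L1C rate and remains functional even if the slower L2C corrections stall, which is the motivation for splitting control and learning across two layers.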
Data packet compensation strategy
The critical problems that an NCS faces are networked-induced delay, out-of-order data packets and packet loss, and many methods have been proposed to tackle these problems [19], [21]. A discard-packet strategy is adopted to deal with networked-induced delay, out-of-order packets and packet loss; zero-order hold is used to compensate lost control signals in the local controllers, while cubic spline interpolation is employed to compensate lost signals in the learning agent.
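A minimal sketch of this compensation scheme follows. The packet format, helper names and sample data are assumptions for illustration, and a four-point Lagrange cubic stands in for the full cubic spline: stale or duplicate packets are discarded by sequence number, a zero-order hold repeats the last accepted value, and the cubic fit over recent good samples reconstructs a lost sample for the learning agent.

```python
def lagrange_cubic(ts, ys, t):
    """Cubic (Lagrange) interpolation through four (t, y) samples."""
    total = 0.0
    for i in range(4):
        term = ys[i]
        for j in range(4):
            if j != i:
                term *= (t - ts[j]) / (ts[i] - ts[j])
        total += term
    return total

latest_seq = -1
held_value = 0.0          # zero-order hold state in the local controller
history = []              # (time, value) pairs of accepted samples

def receive(seq, t, value):
    """Discard-packet strategy: accept only strictly newer packets."""
    global latest_seq, held_value
    if seq <= latest_seq:         # out-of-order or duplicate: discard
        return held_value         # hold the last good value
    latest_seq = seq
    held_value = value            # refresh the hold
    history.append((t, value))
    return value

# Assumed stream sampling y = t**2; the packet at t = 3 is lost.
for seq, t, v in [(0, 0.0, 0.0), (1, 1.0, 1.0), (2, 2.0, 4.0), (4, 4.0, 16.0)]:
    receive(seq, t, v)
receive(1, 1.0, 1.0)              # late duplicate: discarded by the strategy

# The learning agent reconstructs the missing sample at t = 3.
ts, ys = zip(*history[-4:])
print(lagrange_cubic(list(ts), list(ys), 3.0))  # recovers 9.0 (= 3**2)
```

The design choice mirrors the split in the text: the hold is cheap enough for the lower-layer controllers, whereas the interpolation, which needs a window of past samples, runs in the computationally stronger learning agent.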
Actor–critic neural network architecture
Based on the requirements of the two-layer networked learning control system, the actor–critic neural network [23], [24], [25], [26] is chosen as the learning algorithm in the learning agent; it is depicted in Fig. 2 and consists of two neural networks. One implements the controller (the actor neural network), and the other implements the reinforcement learner (the critic neural network). The actor neural network can be regarded as the controller, since it generates the control output.
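The actor–critic principle can be sketched in a few lines. This is not the paper's network structure: linear-in-parameters approximators, a toy linear plant and the step sizes below are all illustrative assumptions. What the sketch does show is the defining mechanism, namely that the critic's temporal-difference (TD) error drives the updates of both networks.

```python
import random

random.seed(0)
w_critic = 0.0     # critic weight: V(s) approximated as w_critic * s**2
w_actor = 0.0      # actor weight:  mean action u = w_actor * s
alpha_c, alpha_a, gamma = 0.05, 0.02, 0.9

s = 1.0
for step in range(2000):
    u = w_actor * s + random.gauss(0.0, 0.1)   # exploratory action
    s_next = 0.9 * s + 0.1 * u                 # assumed toy linear plant
    r = -s_next ** 2                           # penalize deviation from 0
    # TD error: reward plus discounted next value minus current value
    delta = r + gamma * w_critic * s_next**2 - w_critic * s**2
    w_critic += alpha_c * delta * s**2                   # critic update
    w_actor += alpha_a * delta * (u - w_actor * s) * s   # actor update
    s = s_next
    if abs(s) > 10:                            # crude reset if diverging
        s = 1.0

print(w_critic < 0.0)  # nonzero states accrue negative value, as expected
```

The critic observes the one-step outcome and reduces its value-prediction error, while the actor is nudged toward actions for which the TD error was better than predicted; in the proposed architecture this learner runs in the upper-layer agent and tunes the lower-layer control signal.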
Simulation example
To verify the feasibility and effectiveness of the proposed architecture, we perform a simulation study on a nonlinear heating, ventilation and air-conditioning (HVAC) system [28]. The modeled system is nonlinear and multivariable; its states are the input and output temperatures of the air and water and the air and water flow rates. The control signal C affects the water flow rate.
Conclusions
This paper presents a two-layer networked learning control system scheme to improve control performance. A discard-packet strategy is developed to deal with networked-induced delay, out-of-order data packets and packet loss, and different compensation strategies are adopted to recover lost data. An actor–critic neural network is employed in the upper layer as a learning agent to dynamically tune the output of the local controllers in the lower layer. The effectiveness and on-line learning capabilities of the proposed scheme are verified by the simulation results.
Acknowledgements
This work is supported by the National Natural Science Foundation of China under Grant 60774059, the Key Project of the Science and Technology Commission of Shanghai Municipality under Grants 061111008 and 06DZ22011, the Sunlight Plan Following Project of the Shanghai Municipal Education Commission and the Shanghai Educational Development Foundation under Grant 06GG10, the China Postdoctoral Science Foundation under Grant 20070420643, and Shanghai Leading Academic Disciplines under Grant T0103.
References (28)
- et al., Communication and control co-design for networked control systems, Automatica (2006)
- et al., Network-based robust control of systems with uncertainty, Automatica (2005)
- et al., BP neural network prediction-based variable-period sampling approach for networked control systems, Applied Mathematics and Computation (2007)
- et al., Synthesis of reinforcement learning, neural networks and PI control applied to a simulated heating coil, Artificial Intelligence in Engineering (1997)
- G.C. Walsh, Q. Beldiman, L.G. Bushnell, Asymptotic behavior of networked control systems, in: Proceedings of the 1999...
- et al., Real-time operating environment for networked control systems, IEEE Transactions on Automation Science and Engineering (2006)
- et al., Compensation for transmission delays in an Ethernet-based control network using variable-horizon predictive control, IEEE Transactions on Control Systems Technology (2006)
- et al., Optimal integrated control and scheduling of networked control systems with communication constraints: application to a car suspension system, IEEE Transactions on Control Systems Technology (2006)
- et al., Network-based coordinated motion control of large-scale transportation vehicles, IEEE/ASME Transactions on Mechatronics (2007)
- et al., Design of networked control systems with packet dropouts, IEEE Transactions on Automatic Control (2007)
- Asymptotic behavior of nonlinear networked control systems, IEEE Transactions on Automatic Control
- Guaranteed cost networked control for T–S fuzzy systems with time delays, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews