Brief paper: R(λ) imitation learning for automatic generation control of interconnected power grids☆
Introduction
The performance of automatic generation controllers (AGCs) has long been assessed by the control performance standards (CPS) released by the North American Electric Reliability Council (NERC) in 1997 (Jaleeli & VanSlyck, 1999). While the state of the art of AGCs designed to work under CPS has been comprehensively investigated in Ibraheem and Kothari (2005) and Yao, Shoults, and Kelm (2000), existing CPS-based AGCs mostly adopt fixed-parameter PI control strategies and do not adapt well to changes in the operation modes, parameters, and structures of power systems (Gao, Teng, & Tu, 2004). To improve the dynamic performance and adaptability of AGCs, reinforcement learning (RL) algorithms (Wu & Pugh, 1993) were introduced into the design of optimal AGCs for interconnected power grids in Ahamed, Rao, and Sastry (2002), Yu, Zhou, Chan, Chen, and Yang (2011a) and Yu, Zhou, Chan, and Lu (2011b). Most notably, a multi-step Q(λ) learning method based on the discounted reward optimality criterion (DROC) was applied to stochastic optimal AGC (Yu et al., 2011a) to effectively overcome the long time-delay problem caused by the steam turbines of thermal units in the secondary frequency control loop.
The main obstacle to the practical application of the above-mentioned conventional RL controllers is that they need to undergo a series of trial-and-error procedures, called a pre-learning process, before onsite operation (Ernst, Glavic, & Wehenkel, 2004), and an accurate simulation model of the power system is required for this offline pre-learning. Inevitably, differences exist between the power system model and the real power system, and some scenarios cannot be modeled in a simulation environment at all. Intolerable “trial-and-error noise” (Hu & Yue, 2008) may therefore arise when RL controllers operate in real systems. To overcome this problem, an imitation pre-learning technique is presented in this paper. The imitation learning philosophy (Price & Boutilier, 2003) has been applied in robot control, where RL controllers learn the movements and behaviors of human beings by imitation (McRuer, 1980). The technique is highly feasible and applicable because it does not rely on accurate a priori knowledge of the system model, and can thus serve as a solution to the online pre-learning problem.
Furthermore, RL algorithms based on the average reward optimality criterion (AROC) (Mahadevan, 1996) are designed to maximize the long-run average reward of a generic system, and therefore match well with the control objective of CPS, which considers the long-term average performance of AGCs (Jaleeli & VanSlyck, 1999).
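In standard RL notation (not reproduced from this paper), the two optimality criteria can be contrasted as follows: DROC maximizes a geometrically discounted sum of rewards, whereas AROC maximizes the long-run average reward gain:

```latex
% Discounted reward optimality criterion (DROC): maximize, for 0 < \gamma < 1,
J_{\gamma}^{\pi}(s) \;=\; \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t} \,\middle|\, s_{0}=s\right]

% Average reward optimality criterion (AROC): maximize the gain
\rho^{\pi} \;=\; \lim_{N \to \infty} \frac{1}{N}\, \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{N-1} r_{t}\right]
```

Because the gain ρ weights all future rewards equally rather than discounting them, maximizing it aligns naturally with CPS compliance rates, which are likewise computed as long-term averages.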
In this paper, a novel R(λ) imitation learning method is proposed that combines R(λ) learning (Hu & Wu, 1999) with an imitation pre-learning process, so as to develop an optimal AGC under NERC’s CPS. This R(λ)-based AGC has been successfully tested on a two-area load frequency control (LFC) power system and on the practical-sized China Southern Power Grid (CSG) with four control areas.
Multi-step R(λ) learning algorithm
The computational goal of the AROC is to find an optimal policy that achieves the maximum expected average reward through trial-and-error interaction with the environment. R(λ) is a relatively new AROC-based RL algorithm which extends one-step R-learning (Schwartz, 1993) by combining it with multi-step temporal-difference TD(λ) returns for general λ (Sutton, 1988) in an incremental way for delayed Markov decision process (MDP) problems (Hu & Yue, 2008). The algorithm works by successively
R(λ) learning for AGC
R(λ) not only provides AGC with a self-learning capability for average time-horizon optimization, but is also flexible in accommodating various state–action pairs and control objectives for multi-criteria CPS optimization. In the following state–action space analysis, two AGC performance indices, the 1-min moving averages of CPS1 and of the area control error (ACE), are used as binary input states of R(λ) to pursue the optimum average returns of CPS1 and CPS2 compliances,
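A toy sketch of that state construction (the window length, thresholds, and class names are assumed for illustration; the paper’s actual discretization is not reproduced here):

```python
from collections import deque

class StateEncoder:
    """Binary state encoder: 1-min moving averages of CPS1 and ACE,
    each thresholded, yielding a four-element discrete state space."""

    def __init__(self, window=15, cps1_target=100.0, ace_band=50.0):
        # window = samples per minute (e.g. a 4-s AGC cycle -> 15 samples)
        self.cps1_hist = deque(maxlen=window)
        self.ace_hist = deque(maxlen=window)
        self.cps1_target = cps1_target   # CPS1 compliance threshold (%), assumed
        self.ace_band = ace_band         # allowable |ACE| magnitude (MW), assumed

    def encode(self, cps1, ace):
        """Push the latest samples and return the binary state pair."""
        self.cps1_hist.append(cps1)
        self.ace_hist.append(ace)
        cps1_avg = sum(self.cps1_hist) / len(self.cps1_hist)
        ace_avg = sum(self.ace_hist) / len(self.ace_hist)
        # two binary flags: CPS1 compliant? ACE within band?
        return (cps1_avg >= self.cps1_target, abs(ace_avg) <= self.ace_band)
```

Keeping the state space this small is what makes a tabular learner practical for the multi-criteria CPS objective.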
Evaluation on two-area LFC power system
The performance of the proposed R(λ) controller has been evaluated on a two-area LFC power system with model parameters taken from Elgerd (1982). In this case study, control area A is taken as the study area, and the mapping principle to the action space is illustrated in Fig. 4 using (10). The imitation pre-learning process should be run continuously over a large number of operating states with different typical load disturbances until the R-functions no longer change.
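A minimal sketch of that pre-learning loop (the reward shaping, tolerance, and all names are assumed, not the paper’s): the value table is repeatedly fitted to transitions demonstrated by an existing controller, such as the incumbent PI-based AGC, until the table stops changing.

```python
def imitation_prelearn(demos, alpha=0.5, tol=1e-6, max_sweeps=1000):
    """Replay demonstrated (state, action, reward) triples until the
    R-values converge, then return the seeded table."""
    R = {}
    for _ in range(max_sweeps):
        max_change = 0.0
        for state, action, reward in demos:
            key = (state, action)
            old = R.get(key, 0.0)
            # move the demonstrated pair's value toward its observed reward
            R[key] = old + alpha * (reward - old)
            max_change = max(max_change, abs(R[key] - old))
        if max_change < tol:      # no more changes in the R-functions
            break
    return R

# hypothetical demonstration log recorded from an existing controller
demo_log = [("high_ace", "raise_gen", 1.0), ("low_ace", "hold", 0.5)]
seeded = imitation_prelearn(demo_log)
```

Because the loop only replays recorded transitions, no trial-and-error actions are injected into the real grid before the controller goes online.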
Conclusion
This paper presents a novel methodology for the optimal AGC under NERC’s CPS, which possesses the following advantages:
- The proposed online imitation pre-learning method does not rely on any accurate system model for the offline pre-learning process and therefore provides a feasible and promising means for the practical implementation of RL algorithms in real power systems.
- For the first time, AROC is introduced into the design of AGC for interconnected power grids. The undiscounted
References (19)
- McRuer (1980). Human dynamics in man–machine systems. Automatica.
- (1963). Dynamic programming, Markov chains, and the method of successive approximations. Journal of Mathematical Analysis and Applications.
- Ahamed et al. (2002). A reinforcement learning approach to automatic generation control. Electric Power Systems Research.
- Elgerd (1982). Electric energy systems theory: an introduction.
- Ernst et al. (2004). Power systems stability control: reinforcement learning framework. IEEE Transactions on Power Systems.
- Gao et al. (2004). Hierarchical AGC mode and CPS control strategy for interconnected power systems. Automation of Electric Power Systems.
- Hu & Wu (1999). Incremental multi-step R-learning. Journal of Beijing Institute of Technology.
- Hu & Yue (2008). Markov decision processes with their applications.
- Ibraheem & Kothari (2005). Recent philosophies of automatic generation control strategies in power systems. IEEE Transactions on Power Systems.
T. Yu received his B.E. degree in electrical power systems from Zhejiang University, Hangzhou, China, in 1996, his M.E. degree in hydroelectric engineering from Yunnan Polytechnic University, Kunming, China, in 1999, and his Ph.D. degree in electrical engineering from Tsinghua University, Beijing, China, in 2003. Now, he is an Associate Professor in the College of Electric Power, South China University of Technology, Guangzhou, China. His special fields of interest include nonlinear and coordinated control theory, artificial intelligence techniques in planning, and operation of power systems.
B. Zhou received his B.E. degree (with First Class Honors) in electrical engineering from Zhengzhou University, Zhengzhou, China, in 2006 and his M.E. degree in electrical engineering from South China University of Technology, Guangzhou, China, in 2009. He is currently pursuing a Ph.D. at the Department of Electrical Engineering in The Hong Kong Polytechnic University. His major research interests include power system control, automation, and optimization methodologies.
K.W. Chan received his B.Sc. (with First Class Honors) and Ph.D. degrees in electronic and electrical engineering from the University of Bath, Bath, UK, in 1988 and 1992, respectively. He currently is an Assistant Professor in the Department of Electrical Engineering of the Hong Kong Polytechnic University. His general research interests include power system stability, analysis, control, security and optimization, real-time simulation of power system transients, distributed and parallel processing, and artificial intelligence techniques.
Y. Yuan received his B.E. degree in electrical engineering from China Three Gorges University, Yichang, China, in 2008 and his M.E. degree in electrical engineering from South China University of Technology, Guangzhou, China, in 2011. He is presently an Engineer in Qingdao Power Supply Company, State Grid Corporation of China. His research interests include power system control and optimization methodologies.
B. Yang received his B.E. degree in electrical engineering from South China University of Technology, Guangzhou, China, in 2010. He is currently pursuing an M.E. degree in the College of Electric Power at South China University of Technology. His fields of interest include power system operation, control, and computation.
Q.H. Wu obtained his M.Sc. (Eng) degree in Electrical Engineering from Huazhong University of Science and Technology (HUST), Wuhan, China, in 1981. From 1981 to 1984, he was appointed Lecturer in Electrical Engineering in the University. He obtained his Ph.D. degree in Electrical Engineering from The Queen’s University of Belfast (QUB), Belfast, UK in 1987. He worked as a Research Fellow and subsequently a Senior Research Fellow in QUB from 1987 to 1991. He joined the Department of Mathematical Sciences, Loughborough University, Loughborough, UK in 1991, as a Lecturer, subsequently he was appointed Senior Lecturer. In September, 1995, he joined The University of Liverpool, Liverpool, UK to take up his appointment to the Chair of Electrical Engineering in the Department of Electrical Engineering and Electronics. Since then, he has been the Head of Intelligence Engineering and Automation Research Group working in the areas of systems control, computational intelligence and electric power and energy. He has authored and coauthored more than 320 technical publications, including 155 journal papers, 20 book chapters and 3 research monographs. Professor Wu is a Chartered Engineer, Fellow of IET and Fellow of IEEE. His research interests include nonlinear adaptive control, mathematical morphology, evolutionary computation, machine learning and power system control and operation.
☆ This work was jointly supported by the projects of the National Natural Science Foundation of China (No. 51177051, No. 50807016), the Guangdong Natural Science Funds Project (No. 9151064101000049), the Tsinghua University Open Foundation of the Key Laboratory of Power System Simulation and Control (No. SKLD10KM01), The Key Project of Fundamental Research Funds for the Central Universities, The Hong Kong Polytechnic University (Project A-PK96), and B. Zhou’s research studentship awarded by The Hong Kong Polytechnic University. The material in this paper was not presented at any conference. This paper was recommended for publication in revised form by Associate Editor Pedro Albertos under the direction of Editor Toshiharu Sugie.
1 Tel.: +852 27666169; fax: +852 23301544.