Automatica

Volume 48, Issue 9, September 2012, Pages 2130-2136

Brief paper
R(λ) imitation learning for automatic generation control of interconnected power grids

https://doi.org/10.1016/j.automatica.2012.05.043

Abstract

The goal of average reward reinforcement learning is to maximize the long-term average reward of a generic system. This coincides with the design objective of the control performance standards (CPS), which were established to improve the long-term performance of an automatic generation controller (AGC) used for real-time control of interconnected power systems. In this paper, a novel R(λ) imitation learning (R(λ)IL) method based on the average reward optimality criterion is presented to develop an optimal AGC under the CPS. This R(λ)IL-based AGC can operate online in real time with high CPS compliance and a fast convergence rate in the imitation pre-learning process. Its capability to learn the control behaviors of the existing AGC by observing system variations enables it to overcome a serious defect in the applicability of conventional RL controllers, namely that an accurate power system model is required for the offline pre-learning process, and significantly enhances the learning efficiency and control performance of power generation control in various power system operation scenarios.

Introduction

The performance of automatic generation controllers (AGCs) has long been assessed against the control performance standards (CPS) released by the North American Electric Reliability Council (NERC) in 1997 (Jaleeli & VanSlyck, 1999). While the state of the art of AGCs developed to work under the CPS has been comprehensively investigated in Ibraheem and Kothari (2005) and Yao, Shoults, and Kelm (2000), existing CPS-based AGCs mostly adopt fixed-parameter PI control strategies and do not adapt well to changes in the operation modes, parameters, and structures of power systems (Gao, Teng, & Tu, 2004). To improve the dynamic performance and adaptability of AGCs, reinforcement learning (RL) algorithms (Wu & Pugh, 1993) were introduced into the design of optimal AGCs for the control of interconnected power grids in Ahamed, Rao, and Sastry (2002), Yu, Zhou, Chan, Chen, and Yang (2011a), and Yu, Zhou, Chan, and Lu (2011b). Most notably, multi-step Q(λ) learning based on the discounted reward optimality criterion (DROC) was applied to stochastic optimal AGC (Yu et al., 2011a) to effectively overcome the long time-delay problem caused by the steam turbines of thermal units in the secondary frequency control loop.

The main obstacle to the practical application of the above-mentioned conventional RL controllers is that they must be scheduled to undergo a series of trial-and-error procedures, called a pre-learning process, before onsite operation (Ernst, Glavic, & Wehenkel, 2004), and an accurate simulation model of the power system is required for this offline pre-learning. Differences inevitably exist between the power system model and the real power system, and some scenarios cannot be modeled in a simulation environment at all. As a result, intolerable “trial-and-error noise” (Hu & Yue, 2008) may be injected into the power system when RL controllers operate on real systems. To overcome this problem, an imitation pre-learning technique is presented in this paper. The imitation learning philosophy (Price & Boutilier, 2003) has been applied in robot control, in which RL controllers learn the movements and behaviors of human beings by imitation (McRuer, 1980). The technique is highly feasible and applicable because it does not rely on accurate a priori knowledge of the system model, and it can thus serve as a solution to the online pre-learning problem.
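
To make the idea concrete, here is a minimal sketch of such an imitation pre-learning loop: the learner passively replays transitions generated by the incumbent AGC acting on the real plant, so no exploratory actions are ever injected into the power system. The names `observe`, `update`, `State`, and `Action` are illustrative assumptions, not identifiers from the paper.

```python
from typing import Callable, Tuple

State = int    # discretized controller state index (illustrative)
Action = int   # index into the regulation command set (illustrative)

def imitation_prelearn(
    observe: Callable[[], Tuple[State, Action, float, State]],
    update: Callable[[State, Action, float, State], None],
    n_steps: int,
) -> None:
    """Passively replay transitions produced by the incumbent AGC.

    `observe` returns one logged transition (s, a, r, s') generated by the
    existing controller acting on the real plant; `update` is any RL value
    update (e.g. an R(lambda) step). No exploratory action is injected,
    so no trial-and-error noise reaches the power system.
    """
    for _ in range(n_steps):
        s, a, r, s_next = observe()
        update(s, a, r, s_next)
```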

Furthermore, RL algorithms based on the average reward optimality criterion (AROC) (Mahadevan, 1996) are designed to maximize the long-run cumulative average rewards of generic systems, and would therefore match well with the control objective of the CPS, which considers the long-term average return performance of AGCs (Jaleeli & VanSlyck, 1999).
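
For reference, the AROC can be stated as follows; this is the standard formulation (cf. Mahadevan, 1996), with notation chosen here rather than taken from the full text:

```latex
% Average reward (gain) of a stationary policy \pi:
\rho^{\pi} = \lim_{N \to \infty} \frac{1}{N}\,
  \mathbb{E}\!\left[ \sum_{t=1}^{N} r_t \,\middle|\, \pi \right]

% AROC seeks \pi^{*} \in \arg\max_{\pi} \rho^{\pi}, in contrast to the
% DROC, which maximizes the discounted return \mathbb{E}\!\left[\sum_{t} \gamma^{t} r_t\right].
```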

In this paper, a novel R(λ) imitation learning (R(λ)IL) method is proposed to combine the R(λ) learning (R(λ)L) (Hu & Wu, 1999) with an imitation pre-learning process, so as to develop an optimal AGC under NERC’s CPS. This R(λ)IL-based AGC (R(λ)ILAGC) has been successfully tested on a two-area load frequency control (LFC) power system and the practical-sized China Southern Power Grid (CSG) with four control areas.

Section snippets

Multi-step R(λ) learning algorithm

The computational goal of the AROC is to find an optimal policy that achieves the maximum expected average reward through trial-and-error interaction with the environment. R(λ)L is a relatively new AROC-based RL algorithm which extends one-step R-learning (Schwartz, 1993) by combining it with multi-step temporal difference (TD(λ)) returns for general λ (Sutton, 1988) in an incremental way for delayed Markov decision process (MDP) problems (Hu & Yue, 2008). The algorithm works by successively…
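
The full recursion appears only in the complete article, but a minimal tabular sketch of an R(λ)-style update, combining Schwartz's R-learning rule with Watkins-style eligibility traces, might look as follows; all parameter names and the trace-cutting choice are illustrative assumptions:

```python
import numpy as np

class RLambdaAgent:
    """Tabular R(lambda) sketch: average-reward TD learning with traces."""

    def __init__(self, n_states, n_actions, alpha=0.1, beta=0.01, lam=0.9):
        self.R = np.zeros((n_states, n_actions))  # relative action values
        self.e = np.zeros((n_states, n_actions))  # eligibility traces
        self.rho = 0.0                             # average reward estimate
        self.alpha, self.beta, self.lam = alpha, beta, lam

    def update(self, s, a, reward, s_next, greedy):
        # TD error under the average reward criterion:
        # delta = r - rho + max_a' R(s', a') - R(s, a)
        delta = reward - self.rho + self.R[s_next].max() - self.R[s, a]
        self.e[s, a] += 1.0                        # accumulating trace
        self.R += self.alpha * delta * self.e      # multi-step credit assignment
        if greedy:
            # rho is only updated on greedy actions, as in Schwartz's R-learning
            self.rho += self.beta * delta
            self.e *= self.lam                     # decay traces
        else:
            self.e[:] = 0.0                        # cut traces (Watkins-style)
```

Here ρ, the running estimate of the average reward, plays the role that the discount factor plays in Q(λ): action values are measured relative to the policy's long-run gain rather than discounted over time.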

R(λ) learning for AGC

R(λ)L not only provides the AGC with a self-learning capability for average time-horizon optimization, but is also flexible in accommodating various state–action pairs and control objectives for multi-criteria CPS optimization. In the following state–action space analysis, two AGC performance indices, the 1-min moving averages of CPS1 and of the area control error (ACE), are used as binary input states of the R(λ)ILAGC to pursue the optimum average returns of CPS1 and CPS2 compliances, …
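
For context, the CPS1 and CPS2 indices referenced here are commonly written as follows; these are the standard NERC definitions, not formulas reproduced from the full text:

```latex
% CPS1: based on the one-minute compliance factor CF_1
CF_1 = \frac{\operatorname{AVG}_{\text{period}}
  \left[ \left( \frac{ACE}{-10 B_i} \right)_{1\,\text{min}}
         \cdot \Delta F_{1\,\text{min}} \right]}{\varepsilon_1^{2}},
\qquad
CPS1 = (2 - CF_1) \times 100\%

% CPS2: the ten-minute average ACE must stay within the bound L_{10}
\operatorname{AVG}_{10\,\text{min}}(ACE) \le L_{10}
  = 1.65\,\varepsilon_{10}\sqrt{(-10 B_i)(-10 B_s)}

% B_i, B_s: control-area and interconnection frequency bias settings;
% \varepsilon_1, \varepsilon_{10}: the interconnection's RMS frequency-error limits.
```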

Evaluation on two-area LFC power system

The performance of the R(λ)ILAGC has been evaluated on a two-area LFC power system with model parameters taken from Elgerd (1982). In this case study, control area A is taken as the study area, and the mapping principle from ΔPord-PI to the action space A is illustrated in Fig. 4 using (10). The imitation pre-learning process should be run continuously over a large number of operating states with different typical load disturbances until there are no further changes in the R-functions.
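
That stopping rule, iterating over typical disturbance scenarios until the R-functions settle, might be sketched as below; `run_scenario`, the tolerance, the sweep limit, and the assumption that the agent exposes its R-table as `agent.R` are all illustrative:

```python
import numpy as np

def prelearn_until_converged(agent, scenarios, run_scenario,
                             tol=1e-4, max_sweeps=1000):
    """Repeat imitation updates over typical load-disturbance scenarios
    until the R-table stops changing (illustrative stopping rule)."""
    for _ in range(max_sweeps):
        R_before = agent.R.copy()
        for sc in scenarios:
            run_scenario(agent, sc)   # replay one disturbance episode
        if np.abs(agent.R - R_before).max() < tol:
            break                     # R-functions have settled
```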

Conclusion

This paper presents a novel R(λ)IL methodology for the optimal AGC under NERC’s CPS, which possesses the following advantages:

  • The proposed online imitation pre-learning method does not rely on any accurate system model for the offline pre-learning process and therefore can provide a feasible and promising means for the practical implementation of RL algorithms in real power systems.

  • For the first time, AROC is introduced to the design of AGC for interconnected power grids. The undiscounted …

References (19)

  • D. McRuer, Human dynamics in man–machine systems, Automatica (1980)

  • D. White, Dynamic programming, Markov chains, and the method of successive approximations, Journal of Mathematical Analysis and Applications (1963)

  • T.P.I. Ahamed et al., A reinforcement learning approach to automatic generation control, Electric Power Systems Research (2002)

  • O.I. Elgerd, Electric energy systems theory: an introduction (1982)

  • D. Ernst et al., Power systems stability control: reinforcement learning framework, IEEE Transactions on Power Systems (2004)

  • Z.H. Gao et al., Hierarchical AGC mode and CPS control strategy for interconnected power systems, Automation of Electric Power Systems (2004)

  • G.H. Hu et al., Incremental multi-step R-learning, Journal of Beijing Institute of Technology (1999)

  • Q.Y. Hu et al., Markov decision processes with their applications (2008)

  • P.K. Ibraheem et al., Recent philosophies of automatic generation control strategies in power systems, IEEE Transactions on Power Systems (2005)
There are more references available in the full text version of this article.

T. Yu received his B.E. degree in electrical power systems from Zhejiang University, Hangzhou, China, in 1996, his M.E. degree in hydroelectric engineering from Yunnan Polytechnic University, Kunming, China, in 1999, and his Ph.D. degree in electrical engineering from Tsinghua University, Beijing, China, in 2003. Now, he is an Associate Professor in the College of Electric Power, South China University of Technology, Guangzhou, China. His special fields of interest include nonlinear and coordinated control theory, artificial intelligence techniques in planning, and operation of power systems.

B. Zhou received his B.E. degree (with First Class Honors) in electrical engineering from Zhengzhou University, Zhengzhou, China, in 2006 and his M.E. degree in electrical engineering from South China University of Technology, Guangzhou, China, in 2009. He is currently pursuing a Ph.D. at the Department of Electrical Engineering in The Hong Kong Polytechnic University. His major research interests include power system control, automation, and optimization methodologies.

K.W. Chan received his B.Sc. (with First Class Honors) and Ph.D. degrees in electronic and electrical engineering from the University of Bath, Bath, UK, in 1988 and 1992, respectively. He currently is an Assistant Professor in the Department of Electrical Engineering of the Hong Kong Polytechnic University. His general research interests include power system stability, analysis, control, security and optimization, real-time simulation of power system transients, distributed and parallel processing, and artificial intelligence techniques.

Y. Yuan received his B.E. degree in electrical engineering from China Three Gorges University, Yichang, China, in 2008 and his M.E. degree in electrical engineering from South China University of Technology, Guangzhou, China, in 2011. He is presently an Engineer in Qingdao Power Supply Company, State Grid Corporation of China. His research interests include power system control and optimization methodologies.

B. Yang received his B.E. degree in electrical engineering from South China University of Technology, Guangzhou, China, in 2010. He is currently pursuing an M.E. degree in the College of Electric Power at South China University of Technology. His fields of interest include power system operation, control, and computation.

Q.H. Wu obtained his M.Sc. (Eng) degree in electrical engineering from Huazhong University of Science and Technology (HUST), Wuhan, China, in 1981. From 1981 to 1984, he was a Lecturer in Electrical Engineering at the same university. He obtained his Ph.D. degree in electrical engineering from The Queen’s University of Belfast (QUB), Belfast, UK, in 1987, and worked as a Research Fellow and subsequently Senior Research Fellow at QUB from 1987 to 1991. He joined the Department of Mathematical Sciences, Loughborough University, Loughborough, UK, in 1991 as a Lecturer and was subsequently appointed Senior Lecturer. In September 1995, he joined The University of Liverpool, Liverpool, UK, to take up the Chair of Electrical Engineering in the Department of Electrical Engineering and Electronics. Since then, he has headed the Intelligence Engineering and Automation Research Group, working in the areas of systems control, computational intelligence, and electric power and energy. He has authored and coauthored more than 320 technical publications, including 155 journal papers, 20 book chapters, and 3 research monographs. Professor Wu is a Chartered Engineer, a Fellow of the IET, and a Fellow of the IEEE. His research interests include nonlinear adaptive control, mathematical morphology, evolutionary computation, machine learning, and power system control and operation.

This work was jointly supported by the projects of National Natural Science Foundation of China (No. 51177051, No. 50807016), Guangdong Natural Science Funds Project (No. 9151064101000049), Tsinghua University Open Foundation of the Key Laboratory of Power System Simulation and Control (No. SKLD10KM01), The Key Project of Fundamental Research Funds for the Central Universities, The Hong Kong Polytechnic University (Project A-PK96) and B. Zhou’s research studentship awarded by The Hong Kong Polytechnic University. The material in this paper was not presented at any conference. This paper was recommended for publication in revised form by Associate Editor Pedro Albertos under the direction of Editor Toshiharu Sugie.

