Data-driven heuristic dynamic programming with virtual reality
Introduction
Recent research on intelligent systems focuses on developing self-adaptive systems that imitate the signal-processing mechanisms of the biological brain, in order to close the gap between human behavior and computer decisions [1], [2], [3], [4]. Generally, an intelligent system should be designed with the following abilities:
- to acquire and retain knowledge through the perception of environments;
- to analyze and predict actions from prior experiences;
- to produce and adjust the actions to obtain consistent and reliable results over time [3], [4], [5], [6].
There are two important issues to be considered regarding intelligent systems. The first is how to develop biologically inspired general-purpose learning models that can dynamically learn how to achieve a goal [4], [7], [8], [9], and the second is how to represent data information to support the decision-making process in an efficient and effective way [10], [11], [12], [13].
In the literature, adaptive dynamic programming (ADP) [14], [15], [16], [17] provides a powerful learning mechanism [5], which has been successfully applied in many fields, such as dynamic control systems [8], [16], power systems [18], [19], missile systems [20], and communication systems [21], as well as partially observable Markov decision processes [22]. ADP methods can be categorized as heuristic dynamic programming (HDP) [16], dual heuristic dynamic programming (DHP), and globalized dual heuristic dynamic programming (GDHP) [23]. The foundation of ADP can be traced back to Bellman's classic principle of optimality [24]. It is related to reinforcement learning (RL) through the adaptive-critic (AC) design framework [25], [26], [27]. For example, in the model-free direct HDP structure, two neural networks are used to achieve on-line learning. The action network interacts with the external environment to produce actions according to the observed state vector. The critic network evaluates the performance by means of an external reinforcement signal. The weights of both networks are randomly initialized. The optimal solution is obtained by training the weights with backpropagation to approximate the optimal cost-to-go function [8], [28]. Such a model-free technique has also been successfully demonstrated in DHP designs [29].
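For reference, the direct HDP critic approximates the standard discounted cost-to-go and is typically trained on a prediction-error term of temporal-difference type. The following is a minimal formulation in common ADP notation (α is the discount factor and r the external reinforcement); it is a hedged summary of the standard design, not an equation quoted from this paper:

```latex
% Discounted cost-to-go approximated by the HDP critic (standard form)
J(t) = r(t+1) + \alpha\, r(t+2) + \alpha^{2} r(t+3) + \cdots
     = r(t+1) + \alpha\, J(t+1)

% Critic training drives this prediction error toward zero
e_{c}(t) = \alpha\, J(t) - \bigl[\, J(t-1) - r(t) \,\bigr]
```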
Based on the direct HDP design [16], an enhanced approach, called goal representation heuristic dynamic programming (GrHDP), was proposed in [4], [7]. In the GrHDP model, an additional network, called the goal network, is integrated into the framework to interact with the critic network [7], [30]. The motivation of the goal network is to provide an internal reinforcement representation that supports on-line learning and optimization [8], [31]. The goal network learns from the external reinforcement signal and adaptively generates an internal reinforcement signal to guide system behavior [8], [30], [31]. The internal reinforcement signal is also fed into the critic network. Since it is a continuous value, the internal reinforcement signal can be considered more informative, and this additional input contributes to the fine-tuning of the critic network. Moreover, the internal reinforcement signal can be automatically and efficiently adjusted to improve the learning performance of the goal network. Meanwhile, the GrHDP approach also stores the previous cost-to-go value to enable on-line learning, association, and optimization over time [7]. More recently, the corresponding goal representation dual heuristic programming (GrDHP) design has also been investigated and demonstrated on several control examples with promising results [32], [33].
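One common formulation of this idea in the GrHDP literature (again in generic ADP notation, with discount factors α and γ; a hedged summary rather than a verbatim quotation from this paper) trains the goal network against the external signal and the critic against the internal signal:

```latex
% Goal network: learn a discounted representation s(t) of the external signal r(t)
e_{g}(t) = \alpha\, s(t) - \bigl[\, s(t-1) - r(t) \,\bigr]

% Critic network: use the internal signal s(t) in place of r(t)
e_{c}(t) = \gamma\, J(t) - \bigl[\, J(t-1) - s(t) \,\bigr]
```

Driving both errors toward zero makes s(t) behave like a shaped, continuous-valued reinforcement that the critic can exploit for finer credit assignment.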
During simulation design, virtual reality (VR) provides a straightforward way to construct such a solution [34], [35]. In general, VR is a powerful technique that may be implemented through a set of complex hardware devices, such as a mainframe computer, a head-mounted display (HMD), motion trackers, sensor gloves, or a computer automatic virtual environment (CAVE) [35], [36]. In fact, the key to VR lies not only in the hardware devices, but also in the human–computer interaction (HCI) involving the people participating in and exploring a virtual environment (VE). Recently, many researchers have focused on the design of visualization/interaction applications rather than on improving VR rendering algorithms or hardware devices [37]. In psychology, VEs are used to treat patients suffering from anxiety or phobias [38], [39]. In education, a virtual learning environment was developed to help students learn their courses [40], [41]. In urban traffic research, VEs have been used to analyze interactive driving behaviors under different traffic scenarios [42], [43]. In military research, VE simulations have been developed to improve soldiers' decision-making skills [44] and cross-cultural communication in different situations [45]. In surgery, a VR simulator was proposed as a cost-effective training platform [46], [47]. In physical rehabilitation, VEs have been used to help patients train their movement patterns and enhance their physical rehabilitation [48], [49].
With the increasing dimensionality and volume of data, there also seems to be a growing need for visualization and VR platforms in machine intelligence research. To this end, first, a VE must be developed using computer graphics (CG) software, so that researchers can design a virtual experiment adequate to their research tasks. Second, real data information can be used in the VE, making it a suitable representation of the real world. Third, during the dynamic simulation process, researchers can interact with the experiment through the VE, analyzing the system behavior and possibly proposing changes to the experiment. For example, an additional virtual disturbance might be designed to verify the stability of a dynamic system.
Motivated by our previous works on ADP designs [7], [8], [30] and their applications using VR [50], [51], [52], the goal of this paper is to propose the integration of VR into the development of machine learning solutions. Specifically, we build an interactive VR platform and apply it to GrHDP experiments in order to illustrate how a VR platform can enhance the development of a machine learning application. We first apply VR to GrHDP in the triple-link inverted pendulum balancing problem [7] and the maze navigation problem [30], [31], and then develop a new application benchmark, a robot navigation with obstacle avoidance problem. We hope to demonstrate how the combination of machine learning research and VR improves the investigation of many real-world applications.
The rest of this paper is organized as follows. In Section 2, we discuss the structure of the GrHDP approach. The design of the VR interactive platform is described in detail in Section 3. Based on this platform, in Sections 4, 5, and 6, we study, respectively, the design and simulation of experiments for the triple-link inverted pendulum balancing problem, the maze navigation problem, and the robot navigation with obstacle avoidance problem. Finally, the conclusion and discussion are provided in Section 7.
Section snippets
Online learning with the GrHDP approach
The main structure of GrHDP is given in Fig. 1. Compared to other ADP approaches [16], GrHDP integrates an additional network (the goal network) into the direct HDP structure, interacting with both the critic network and the action network. All the neural networks are multi-layer perceptrons (MLPs) with one hidden layer [50]. The inputs of the goal network and the critic network can be defined as [x(t), u(t)] and [x(t), u(t), s(t)], respectively, where x(t) is the state vector, u(t) is the control action, and s(t) is the internal reinforcement signal. Meanwhile, the input of the action network is still the current state vector x(t).
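As a concrete illustration of this structure, the following Python/NumPy sketch shows three one-hidden-layer MLPs and their chained forward pass; the layer sizes, initialization ranges, and state dimension are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class MLP:
    """One-hidden-layer perceptron with tanh activations (illustrative sizes)."""
    def __init__(self, n_in, n_hidden, n_out, rng):
        self.W1 = rng.uniform(-0.1, 0.1, (n_hidden, n_in))
        self.W2 = rng.uniform(-0.1, 0.1, (n_out, n_hidden))

    def forward(self, x):
        self.h = np.tanh(self.W1 @ x)       # hidden layer
        return np.tanh(self.W2 @ self.h)    # bounded output

rng = np.random.default_rng(0)
n_states = 4                                # hypothetical state dimension

action_net = MLP(n_states,     6, 1, rng)   # input: x(t)              -> u(t)
goal_net   = MLP(n_states + 1, 6, 1, rng)   # input: [x(t), u(t)]      -> s(t)
critic_net = MLP(n_states + 2, 6, 1, rng)   # input: [x(t), u(t), s(t)] -> J(t)

def grhdp_forward(x):
    """One forward pass through the action-goal-critic chain of GrHDP."""
    u = action_net.forward(x)                           # control action
    s = goal_net.forward(np.concatenate([x, u]))        # internal reinforcement
    J = critic_net.forward(np.concatenate([x, u, s]))   # cost-to-go estimate
    return u, s, J
```

The weight updates themselves would be driven via backpropagation by the error terms sketched earlier.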
The interactive VR platform design
Basically, a desktop computer is one of the most common VR application interfaces. On a desktop computer, the monitor serves as the visualization platform for representing the virtual world; meanwhile, external equipment, such as the keyboard or the mouse, provides the interactive tools for human interaction. These hardware devices render a real-time visualization/interaction interface supported by CG software [34], [35]. In CG software, the virtual reality modeling language (VRML) is usually used to model the virtual scene.
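To make the interaction pattern concrete, the following Python sketch shows a generic controller/virtual-environment loop of the kind described above; the VirtualScene class, its methods, and the timing values are hypothetical placeholders, not the authors' VRML-based platform or any specific library API.

```python
import time

class VirtualScene:
    """Hypothetical stand-in for the CG/VRML front end that renders the experiment."""
    def render(self, state):
        """Redraw the virtual world for the current physical state."""
        print(f"state = {state}")

    def poll_user_disturbance(self):
        """Return a user-injected disturbance (e.g., from keyboard input), or 0."""
        return 0.0

def run_episode(plant_step, controller, x0, scene, steps=100, dt=0.02):
    """Advance the simulated plant while visualizing it and accepting user input."""
    x = x0
    for _ in range(steps):
        u = controller(x)                   # learning controller (e.g., GrHDP action net)
        d = scene.poll_user_disturbance()   # interactive disturbance from the VE
        x = plant_step(x, u + d)            # advance the simulated dynamics
        scene.render(x)                     # real-time visualization
        time.sleep(dt)
    return x
```

In the actual platform, a keyboard or mouse handler would replace poll_user_disturbance, which is how an additional virtual disturbance can be injected during a run to probe the stability of the controlled system.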
GrHDP approach to triple-link inverted pendulum balancing problem with VR
The stabilization of nonlinear systems has been widely studied for control/learning research purposes [55]. The inverted pendulum balancing model is a popular demonstration benchmark for nonlinear systems. The task of this benchmark is to stabilize an inverted pendulum [16], [55]. We previously applied GrHDP to the traditional triple-link inverted pendulum balancing problem in [7]. In this work, we introduce the advantage of VR to improve the development of experiments. The structure of the
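As a small concrete piece of this benchmark setup, the sketch below shows the kind of binary external reinforcement signal commonly used in inverted-pendulum balancing tasks; the specific angle and track limits are assumptions drawn from common practice, not values quoted from this paper.

```python
import numpy as np

# Assumed failure thresholds (illustrative, not the paper's exact settings)
ANGLE_LIMIT_RAD = np.deg2rad(20.0)   # max deviation of each link from vertical
TRACK_LIMIT_M   = 1.0                # max cart displacement along the track

def external_reinforcement(link_angles, cart_position):
    """Return 0 while the system stays within bounds, -1 on failure."""
    failed = (np.any(np.abs(link_angles) > ANGLE_LIMIT_RAD)
              or abs(cart_position) > TRACK_LIMIT_M)
    return -1.0 if failed else 0.0
```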
Maze navigation with virtual reality
Maze navigation is a path-planning problem in which the agent needs to find a proper path from an initial state to a given goal [57], [58]. It is a typical Markov decision process (MDP) problem, where the control action is not related to previous state vectors but only to the current state [30], [31]. The task of this problem is to find an optimal path. Once the optimal path is obtained, future motion should follow this path. Evaluating value function (or state-action pair) is a
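For illustration, a minimal grid-maze MDP in Python might look as follows; the maze layout, goal cell, and reward values are illustrative assumptions rather than the exact settings used in the paper.

```python
import numpy as np

# Illustrative maze: 1 = wall, 0 = free cell (assumed layout, not the paper's maze)
MAZE = np.array([[0, 0, 1, 0],
                 [1, 0, 1, 0],
                 [0, 0, 0, 0]])
GOAL = (2, 3)
ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right

def step(state, action):
    """Apply an action; stay in place when hitting a wall or the boundary."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < MAZE.shape[0] and 0 <= nc < MAZE.shape[1]) or MAZE[nr, nc]:
        nr, nc = r, c                                 # blocked move
    reward = 1.0 if (nr, nc) == GOAL else -0.04       # assumed reward scheme
    return (nr, nc), reward, (nr, nc) == GOAL
```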
Robot navigation with obstacle avoidance using virtual reality
Robot navigation with obstacle avoidance is a popular research experiment in robot motion planning [57], [61]. The task can be described as designing a robot that observes the environment and finds a collision-free path [62], [63], [64], [65]. The solution to this problem can be split into two parts: (a) detecting the robot's environment and obstacles; and (b) designing the navigation control strategy [62]. The VR platform is designed as a physical system for system-state detection, a controller module
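Following this two-part split, here is a hedged Python sketch of (a) approximate range sensing of circular obstacles and (b) a naive heading choice standing in for the learned navigation strategy; the sensor model, ray count, and thresholds are assumptions for illustration only.

```python
import numpy as np

def sense_obstacles(robot_pos, heading, obstacles, n_rays=5, max_range=2.0):
    """Return approximate range readings along n_rays directions around heading.

    obstacles: list of (x, y, radius) tuples; robot_pos: length-2 array.
    """
    angles = heading + np.linspace(-np.pi / 4, np.pi / 4, n_rays)
    readings = np.full(n_rays, max_range)
    for i, a in enumerate(angles):
        ray = np.array([np.cos(a), np.sin(a)])
        for obs in obstacles:
            to_obs = np.array(obs[:2]) - robot_pos
            along = to_obs @ ray                      # projection onto the ray
            if 0 < along < max_range:
                lateral = np.linalg.norm(to_obs - along * ray)
                if lateral < obs[2]:                  # ray passes through the circle
                    readings[i] = min(readings[i], along)
    return readings

def choose_heading(readings, heading):
    """Steer toward the clearest direction (a naive stand-in for the learned policy)."""
    angles = heading + np.linspace(-np.pi / 4, np.pi / 4, len(readings))
    return angles[int(np.argmax(readings))]
```

In the actual benchmark a learned GrHDP action network would replace choose_heading; the sensing part only defines the state observation fed to the learner.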
Conclusions and future research
The contribution of this work is the proposition of using a VR platform to improve machine learning research. Such a platform provides a straightforward experiment/simulation interface for real-time representation of dynamic processes and for human-computer interaction. Specifically, the virtual experiment environment can be built with real-time data information stored in it. The VR platform visualization can handle the transmission of data information and reproduce the decision-making process.
Acknowledgements
The authors would like to thank the associate editor and the anonymous reviewers for their useful comments, which helped to improve this paper. This work was supported in part by the National Science Foundation (NSF) under Grant ECCS 1053717, and the Army Research Office under Grant W911NF-12-1-0378.
References (65)
- Intelligence in the brain: a theory of how it works and how to build it, IEEE Trans. Neural Netw. (2009)
- A three-network architecture for on-line learning and optimization based on adaptive dynamic programming, Neurocomputing (2012)
- DCPE co-training for classification, Neurocomputing (2012)
- Reactive power control of grid-connected wind farm based on adaptive dynamic programming, Neurocomputing (2014)
- P.J. Werbos, Backwards differentiation in AD and neural nets: past links and new opportunities, in: Automatic...
- P.J. Werbos, What do neural nets and quantum theory tell us about mind and reality, in: No Matter, Never Mind:...
- Intelligent system, Int. J. Comput. Commun. Control (2008)
- Self-Adaptive System for Machine Intelligence (2011)
- ADP: the key direction for future research in intelligent control and understanding brain intelligence, IEEE Trans. Syst. Man Cybern. Part B: Cybern. (2008)
- Adaptive learning in tracking control based on the dual critic network design, IEEE Trans. Neural Netw. Learn. Syst. (2013)
- Learning from imbalanced data, IEEE Trans. Knowl. Data Eng.
- Sparse-representation-based classification with structure-preserving dimension reduction, Cognit. Comput.
- On-line learning control by association and reinforcement, IEEE Trans. Neural Netw.
- Adaptive critic designs, IEEE Trans. Neural Netw.
- Energy storage based low frequency oscillation damping control using particle swarm optimization and heuristic dynamic programming, IEEE Trans. Power Syst.
- Missile defense and interceptor allocation by neuro-dynamic programming, IEEE Trans. Syst. Man Cybern. Part A: Syst. Hum.
- Dynamic re-optimization of a fed-batch fermentor using adaptive critic designs, IEEE Trans. Neural Netw.
- Handbook of Intelligent Control
- Dynamic Programming
- Reinforcement Learning: An Introduction
- Reinforcement learning: a survey, J. Artif. Intell. Res.
- Reinforcement learning and adaptive dynamic programming for feedback control, IEEE Circuits Syst. Mag.
- Adaptive dynamic programming: an introduction, IEEE Comput. Intell. Mag.
- Goal representation heuristic dynamic programming on maze navigation, IEEE Trans. Neural Netw. Learn. Syst.
- Heuristic dynamic programming with internal goal representation, Soft Comput.
- GrDHP: a general utility function representation for dual heuristic dynamic programming, IEEE Trans. Neural Netw. Learn. Syst.
Xiao Fang received his B.S. degree in Department of Mechanical and Electrical Engineering from Hebei University of Engineering, China, in 2007. He is a current Ph.D. student in School of Electrical Engineering, Yanshan University, China. He is also a joint Ph.D. student in Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, RI, USA. His major research interests include adaptive dynamic programming, reinforcement learning, computational intelligence, and virtual reality.
Dezhong Zheng received the B.S. degree and M.S. degree in School of Electrical Engineering, Yanshan University, China, in 1982 and 1988, respectively. He is currently a professor at the Key Lab of Measuring Technology and Instrument of Hebei Province, School of Electrical Engineering, Yanshan University, China. His current research interests include dynamic control system, virtual reality, pattern recognition, virtual instrument, remote control technology, network information technology, and others. He has been an Associate Editor for Chinese Journal of Sensors and Actuators. He is the vice-president of the Computer Aided Design (CAD) Society and the Artificial Intelligence Society in Hebei, China.
Haibo He received the B.S. and M.S. degrees in electrical engineering from Huazhong University of Science and Technology, China, in 1999 and 2002, respectively, and the Ph.D. degree in electrical engineering from Ohio University in 2006. He is currently an Associate Professor at the Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island. From 2006 to 2009, he was an Assistant Professor at the Department of Electrical and Computer Engineering, Stevens Institute of Technology.
His current research interests include adaptive dynamic programming, computational intelligence, self-adaptive systems, machine learning, data mining, embedded intelligent system design (VLSI/FPGA), and various applications such as smart grid, cognitive radio, sensor networks, and others. He has published 1 sole-author research book (Wiley), edited 1 book (Wiley-IEEE) and 6 conference proceedings (Springer), and authored and co-authored over 120 peer-reviewed journal and conference papers. His research results have been covered by national and international media such as The Wall Street Journal, Yahoo!, and Providence Business News, among others. He has delivered numerous keynote and invited talks. He is currently the Chair of the IEEE Computational Intelligence Society (CIS) Neural Network Technical Committee (NNTC), among others. He regularly serves on the Organizing Committee of various international conferences, including as the General Chair of the IEEE Symposium Series on Computational Intelligence (SSCI 2014). He has been a Guest Editor for several journals, including IEEE Computational Intelligence Magazine, IEEE Transactions on Smart Grid, Cognitive Computation (Springer), Applied Mathematics and Computation (Elsevier), and Soft Computing (Springer), among others. Currently, he is an Associate Editor of the IEEE Transactions on Neural Networks and Learning Systems and IEEE Transactions on Smart Grid, and also serves on the Editorial Board of several international journals. He received the National Science Foundation (NSF) CAREER Award (2011) and the Providence Business News (PBN) "Rising Star Innovator of The Year" Award (2011). He is a Senior Member of IEEE.
Zhen Ni received the B.S. degree from the Department of Control Science and Engineering, Huazhong University of Science and Technology (HUST), Wuhan, China, in 2010, and the M.S. degree from the Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, Rhode Island, in 2012. Currently, he is pursuing the Ph.D. degree in the same department.
His research interests include computational intelligence and reinforcement learning, specifically in the adaptive dynamic programming (ADP) and optimal/adaptive control.