Abstract
Policy iteration, one class of reinforcement learning methods, is applied here to the optimal control problem for nonlinear discrete-time non-affine systems with continuous state and action spaces. Because it is implemented with the action-value function (Q function), policy iteration does not depend on knowledge of the system dynamics. An online model-free recursive least-squares policy iteration (RLSPI) algorithm with continuous policy approximation is proposed. It is the first attempt to develop an online LSPI algorithm for nonlinear discrete-time non-affine systems with a continuous policy. A nonlinear discrete-time system is simulated to verify the efficiency of the proposed algorithm.
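The abstract does not give the update equations, so the following is only a minimal sketch of how a recursive least-squares estimate of a linearly parameterized Q function is commonly written (an LSTD-Q style update using the Sherman-Morrison identity). It is not the authors' RLSPI algorithm: the class name `RLSQ`, the feature vectors `phi` and `phi_next`, the discount factor `gamma`, and the regularizer `delta` are all assumptions made for illustration, and the continuous policy approximator is left out.

```python
import numpy as np


class RLSQ:
    """Recursive least-squares estimate of theta in Q(x, u) ~ phi(x, u) @ theta.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, n_features, gamma=0.95, delta=1e-3):
        self.gamma = gamma                      # discount factor (assumed)
        self.theta = np.zeros(n_features)       # Q-function weights
        # P tracks the inverse of (delta * I + sum_k phi_k d_k^T)
        self.P = np.eye(n_features) / delta

    def update(self, phi, reward, phi_next):
        """Process one transition sample.

        phi      : features of the visited state-action pair
        reward   : observed one-step reward (or negated cost)
        phi_next : features of the next state paired with the action
                   chosen by the policy being evaluated
        """
        d = phi - self.gamma * phi_next         # temporal-difference feature
        P_phi = self.P @ phi
        denom = 1.0 + d @ P_phi
        td_error = reward - d @ self.theta
        # Weight update, followed by the rank-one Sherman-Morrison update of P
        self.theta = self.theta + P_phi * (td_error / denom)
        self.P = self.P - np.outer(P_phi, d @ self.P) / denom
        return self.theta
```

Each call uses only an observed transition and never a model of the dynamics, which is the sense in which a Q-function-based update of this kind is model-free. In a policy iteration loop, `phi_next` would be recomputed with the improved policy after each policy update.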
This work was supported in part by the National Natural Science Foundation of China (Nos. 61273136 and 61034002) and the Beijing Natural Science Foundation (No. 4122083).
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhu, Y., Zhao, D. (2013). Online Model-Free RLSPI Algorithm for Nonlinear Discrete-Time Non-affine Systems. In: Lee, M., Hirose, A., Hou, ZG., Kil, R.M. (eds) Neural Information Processing. ICONIP 2013. Lecture Notes in Computer Science, vol 8227. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-42042-9_31
DOI: https://doi.org/10.1007/978-3-642-42042-9_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-42041-2
Online ISBN: 978-3-642-42042-9
eBook Packages: Computer Science, Computer Science (R0)