Abstract
This paper presents a hierarchical representation policy iteration (HRPI) algorithm, which combines the representation policy iteration (RPI) algorithm with a state-space decomposition method based on a binary tree. In HRPI, the state space is decomposed into multiple sub-spaces according to an approximate value function; local policies are then estimated on each sub-space, and a global near-optimal policy is obtained by combining these local policies. Simulation results indicate that the proposed method outperforms the conventional RPI algorithm.
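The scheme the abstract describes — split the state space with a binary tree guided by an approximate value function, fit a local policy on each leaf sub-space, and dispatch states through the tree to combine the local policies into a global one — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the names (`Node`, `build_tree`, `global_policy`), the median-value splitting rule, and the placeholder sign-based leaf policies are all assumptions for illustration.

```python
# Hedged sketch of the HRPI idea: binary-tree state-space decomposition
# driven by an approximate value function, with one local policy per leaf.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    threshold: float = 0.0                           # value-function split point
    left: Optional["Node"] = None                    # sub-space with lower values
    right: Optional["Node"] = None                   # sub-space with higher values
    policy: Optional[Callable[[float], int]] = None  # local policy (leaf only)


def build_tree(states: List[float], value_fn: Callable[[float], float],
               depth: int) -> Node:
    """Recursively split states at the median of their approximate values."""
    node = Node()
    if depth == 0 or len(states) < 2:
        # Leaf: placeholder local policy (stands in for a policy estimated
        # on this sub-space, e.g. by RPI restricted to the leaf's states).
        node.policy = lambda s: 0 if value_fn(s) < 0 else 1
        return node
    vals = sorted(value_fn(s) for s in states)
    node.threshold = vals[len(vals) // 2]
    low = [s for s in states if value_fn(s) < node.threshold]
    high = [s for s in states if value_fn(s) >= node.threshold]
    if not low or not high:  # degenerate split: stop and make a leaf
        node.policy = lambda s: 0 if value_fn(s) < 0 else 1
        return node
    node.left = build_tree(low, value_fn, depth - 1)
    node.right = build_tree(high, value_fn, depth - 1)
    return node


def global_policy(node: Node, value_fn: Callable[[float], float], s: float) -> int:
    """Combine local policies: descend the tree, then apply the leaf policy."""
    while node.policy is None:
        node = node.left if value_fn(s) < node.threshold else node.right
    return node.policy(s)


# Toy usage: an identity "approximate value function" over a 1-D state space.
value_fn = lambda s: s
root = build_tree([-2.0, -1.0, 1.0, 2.0], value_fn, depth=1)
print(global_policy(root, value_fn, -1.5))  # action from the low-value leaf: 0
print(global_policy(root, value_fn, 1.5))   # action from the high-value leaf: 1
```

The key design point mirrored here is that each leaf only ever sees its own sub-space, so the local policy-estimation problems are smaller than the global one; the tree then routes any query state to the appropriate local policy.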
© 2013 Springer-Verlag Berlin Heidelberg
Wang, J., Zuo, L., Wang, J., Xu, X., Li, C. (2013). A Hierarchical Representation Policy Iteration Algorithm for Reinforcement Learning. In: Yang, J., Fang, F., Sun, C. (eds) Intelligent Science and Intelligent Data Engineering. IScIDE 2012. Lecture Notes in Computer Science, vol 7751. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36669-7_89
Print ISBN: 978-3-642-36668-0
Online ISBN: 978-3-642-36669-7