Abstract
This paper presents a hierarchical representation policy iteration (HRPI) algorithm, which combines the representation policy iteration (RPI) algorithm with a state-space decomposition method based on a binary tree. In HRPI, the state space is decomposed into multiple sub-spaces according to an approximate value function; local policies are then estimated on each sub-space, and a global near-optimal policy is obtained by combining these local policies. Simulation results indicate that the proposed method outperforms the conventional RPI algorithm.
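The scheme the abstract describes — split the state space with a binary tree guided by an approximate value function, fit a local policy on each leaf sub-space, and dispatch states through the tree to combine the local policies into a global one — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the names (`Node`, `build_tree`, `global_policy`), the median-value splitting rule, and the placeholder sign-based leaf policies are all assumptions for illustration.

```python
# Hedged sketch of the HRPI idea: binary-tree state-space decomposition
# driven by an approximate value function, with one local policy per leaf.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    threshold: float = 0.0                           # value-function split point
    left: Optional["Node"] = None                    # sub-space with lower values
    right: Optional["Node"] = None                   # sub-space with higher values
    policy: Optional[Callable[[float], int]] = None  # local policy (leaf only)


def build_tree(states: List[float], value_fn: Callable[[float], float],
               depth: int) -> Node:
    """Recursively split states at the median of their approximate values."""
    node = Node()
    if depth == 0 or len(states) < 2:
        # Leaf: placeholder local policy (stands in for a policy estimated
        # on this sub-space, e.g. by RPI restricted to the leaf's states).
        node.policy = lambda s: 0 if value_fn(s) < 0 else 1
        return node
    vals = sorted(value_fn(s) for s in states)
    node.threshold = vals[len(vals) // 2]
    low = [s for s in states if value_fn(s) < node.threshold]
    high = [s for s in states if value_fn(s) >= node.threshold]
    if not low or not high:  # degenerate split: stop and make a leaf
        node.policy = lambda s: 0 if value_fn(s) < 0 else 1
        return node
    node.left = build_tree(low, value_fn, depth - 1)
    node.right = build_tree(high, value_fn, depth - 1)
    return node


def global_policy(node: Node, value_fn: Callable[[float], float], s: float) -> int:
    """Combine local policies: descend the tree, then apply the leaf policy."""
    while node.policy is None:
        node = node.left if value_fn(s) < node.threshold else node.right
    return node.policy(s)


# Toy usage: an identity "approximate value function" over a 1-D state space.
value_fn = lambda s: s
root = build_tree([-2.0, -1.0, 1.0, 2.0], value_fn, depth=1)
print(global_policy(root, value_fn, -1.5))  # action from the low-value leaf: 0
print(global_policy(root, value_fn, 1.5))   # action from the high-value leaf: 1
```

The key design point mirrored here is that each leaf only ever sees its own sub-space, so the local policy-estimation problems are smaller than the global one; the tree then routes any query state to the appropriate local policy.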
© 2013 Springer-Verlag Berlin Heidelberg
Wang, J., Zuo, L., Wang, J., Xu, X., Li, C. (2013). A Hierarchical Representation Policy Iteration Algorithm for Reinforcement Learning. In: Yang, J., Fang, F., Sun, C. (eds) Intelligent Science and Intelligent Data Engineering. IScIDE 2012. Lecture Notes in Computer Science, vol 7751. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36669-7_89
Print ISBN: 978-3-642-36668-0
Online ISBN: 978-3-642-36669-7