
A Hierarchical Representation Policy Iteration Algorithm for Reinforcement Learning

Conference paper: Intelligent Science and Intelligent Data Engineering (IScIDE 2012)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7751)


Abstract

This paper presents a hierarchical representation policy iteration (HRPI) algorithm, which combines the representation policy iteration (RPI) algorithm with a binary-tree-based state-space decomposition method. In HRPI, the state space is decomposed into multiple sub-spaces according to an approximate value function; local policies are then estimated on each sub-space, and a global near-optimal policy is obtained by combining these local policies. Simulation results indicate that the proposed method performs better than the conventional RPI algorithm.
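To make the procedure concrete, below is a minimal, illustrative Python sketch of the three steps summarized in the abstract: split the state space with a binary tree guided by an approximate value function, estimate a local policy on each resulting sub-space, and combine the local policies by routing each state to its sub-space. The median split rule, the `solve_local` callback, and all names here are assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np

class Node:
    """A node of the binary tree over the state space."""
    def __init__(self, states):
        self.states = states       # indices of the states in this sub-space
        self.threshold = None      # value-function split point (internal nodes)
        self.left = None
        self.right = None
        self.policy = None         # local policy (leaves only)

def build_tree(states, values, depth=0, max_depth=3, min_size=10):
    """Recursively split a set of states by thresholding the approximate
    value function at its median (an assumed split criterion)."""
    node = Node(states)
    if depth >= max_depth or len(states) <= min_size:
        return node
    threshold = np.median(values[states])
    low = states[values[states] <= threshold]
    high = states[values[states] > threshold]
    if len(low) == 0 or len(high) == 0:   # degenerate split: keep as a leaf
        return node
    node.threshold = threshold
    node.left = build_tree(low, values, depth + 1, max_depth, min_size)
    node.right = build_tree(high, values, depth + 1, max_depth, min_size)
    return node

def fit_local_policies(node, solve_local):
    """Estimate a local policy on each leaf sub-space."""
    if node.left is None:                 # leaf node
        node.policy = solve_local(node.states)
    else:
        fit_local_policies(node.left, solve_local)
        fit_local_policies(node.right, solve_local)

def global_policy(root, state, values):
    """Combine local policies: route a state down the tree and act
    with the policy of the leaf sub-space it falls into."""
    node = root
    while node.left is not None:
        node = node.left if values[state] <= node.threshold else node.right
    return node.policy(state)

# Example usage with a random value function over 100 states and a
# dummy local solver that always returns action 0 (placeholder only):
values = np.random.rand(100)
root = build_tree(np.arange(100), values)
fit_local_policies(root, lambda states: (lambda s: 0))
action = global_policy(root, state=7, values=values)
```

In this sketch, a run of conventional RPI (or any approximate policy iteration method) restricted to a leaf's sub-space would play the role of `solve_local`; the paper's actual decomposition criterion and policy-combination rules are given in the full text.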




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, J., Zuo, L., Wang, J., Xu, X., Li, C. (2013). A Hierarchical Representation Policy Iteration Algorithm for Reinforcement Learning. In: Yang, J., Fang, F., Sun, C. (eds) Intelligent Science and Intelligent Data Engineering. IScIDE 2012. Lecture Notes in Computer Science, vol 7751. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36669-7_89


  • DOI: https://doi.org/10.1007/978-3-642-36669-7_89

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36668-0

  • Online ISBN: 978-3-642-36669-7

  • eBook Packages: Computer Science, Computer Science (R0)
