An Overview of MAXQ Hierarchical Reinforcement Learning

  • Conference paper
Abstraction, Reformulation, and Approximation (SARA 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1864)

Abstract

Reinforcement learning addresses the problem of learning optimal policies for sequential decision-making problems involving stochastic operators and numerical reward functions rather than the more traditional deterministic operators and logical goal predicates. In many ways, reinforcement learning research is recapitulating the development of classical research in planning and problem solving. After studying the problem of solving “flat” problem spaces, researchers have recently turned their attention to hierarchical methods that incorporate subroutines and state abstractions. This paper gives an overview of the MAXQ value function decomposition and its support for state abstraction and action abstraction.
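
The decomposition itself is not spelled out in this abstract, so the following sketch is included only for orientation; it follows the notation of the full MAXQ formulation (i a composite subtask, a a child subtask or primitive action, \pi a hierarchical policy), none of which appears in the text above:

    Q^\pi(i, s, a) = V^\pi(a, s) + C^\pi(i, s, a)

    V^\pi(i, s) = Q^\pi(i, s, \pi_i(s))                        if i is composite
    V^\pi(i, s) = \sum_{s'} P(s' \mid s, i) \, R(s' \mid s, i)  if i is primitive

Here C^\pi(i, s, a) is the completion function, the expected discounted reward for completing subtask i after its child a terminates. Because each subtask stores only its own value and completion terms, state variables irrelevant within a subtask can be dropped, which is the basis for the state and action abstractions the paper discusses.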





Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dietterich, T.G. (2000). An Overview of MAXQ Hierarchical Reinforcement Learning. In: Choueiry, B.Y., Walsh, T. (eds) Abstraction, Reformulation, and Approximation. SARA 2000. Lecture Notes in Computer Science, vol 1864. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44914-0_2

  • DOI: https://doi.org/10.1007/3-540-44914-0_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67839-7

  • Online ISBN: 978-3-540-44914-0

  • eBook Packages: Springer Book Archive
