An Overview of MAXQ Hierarchical Reinforcement Learning

  • Conference paper
Abstraction, Reformulation, and Approximation (SARA 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1864)

Abstract

Reinforcement learning addresses the problem of learning optimal policies for sequential decision-making problems involving stochastic operators and numerical reward functions rather than the more traditional deterministic operators and logical goal predicates. In many ways, reinforcement learning research is recapitulating the development of classical research in planning and problem solving. After studying the problem of solving “flat” problem spaces, researchers have recently turned their attention to hierarchical methods that incorporate subroutines and state abstractions. This paper gives an overview of the MAXQ value function decomposition and its support for state abstraction and action abstraction.
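
The decomposition itself is not spelled out in this abstract, so the following sketch is included only for orientation; it follows the notation of the full MAXQ formulation (i a composite subtask, a a child subtask or primitive action, \pi a hierarchical policy), none of which appears in the text above:

    Q^\pi(i, s, a) = V^\pi(a, s) + C^\pi(i, s, a)

    V^\pi(i, s) = Q^\pi(i, s, \pi_i(s))                        if i is composite
    V^\pi(i, s) = \sum_{s'} P(s' \mid s, i) \, R(s' \mid s, i)  if i is primitive

Here C^\pi(i, s, a) is the completion function, the expected discounted reward for completing subtask i after its child a terminates. Because each subtask stores only its own value and completion terms, state variables irrelevant within a subtask can be dropped, which is the basis for the state and action abstractions the paper discusses.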





Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dietterich, T.G. (2000). An Overview of MAXQ Hierarchical Reinforcement Learning. In: Choueiry, B.Y., Walsh, T. (eds) Abstraction, Reformulation, and Approximation. SARA 2000. Lecture Notes in Computer Science, vol 1864. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44914-0_2

  • DOI: https://doi.org/10.1007/3-540-44914-0_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67839-7

  • Online ISBN: 978-3-540-44914-0

  • eBook Packages: Springer Book Archive
