Abstract
One of the major difficulties in applying Q-learning to real-world domains is the sharp increase in the number of learning steps required to converge towards an optimal policy as the size of the state space increases. In this paper we propose PLANQ-learning, a method that couples a Q-learner with a STRIPS planner. The planner shapes the reward function and thereby guides the Q-learner quickly to the optimal policy. We demonstrate empirically that this combination of high-level reasoning and low-level learning scales significantly better with state-space size than both standard Q-learning and hierarchical Q-learning methods.
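The mechanism the abstract describes (a symbolic planner whose plan shapes the Q-learner's reward) can be illustrated with a short sketch. The Python below is illustrative only: the environment interface (`reset`, `step`, `actions`), the subgoal predicates, and the shaping bonus are assumptions made for exposition, not the paper's actual PLANQ-learning implementation.

```python
# A minimal sketch of plan-shaped Q-learning, in the spirit of PLANQ-learning.
# The environment interface and reward values here are illustrative assumptions.
import random
from collections import defaultdict

def planq_sketch(env, plan_steps, episodes=500,
                 alpha=0.1, gamma=0.9, epsilon=0.1, shaping_bonus=1.0):
    """Tabular Q-learning whose reward is shaped by a symbolic plan.

    `plan_steps` is an ordered list of predicates (callables on states),
    e.g. the subgoals a STRIPS planner produced for the task. Whenever
    the agent first satisfies the next unmet subgoal, it receives an
    extra shaping reward on top of the environment reward.
    States must be hashable, since Q is keyed by (state, action) pairs.
    """
    Q = defaultdict(float)
    for _ in range(episodes):
        state = env.reset()
        progress = 0                      # index of the next unmet subgoal
        done = False
        while not done:
            # Epsilon-greedy action selection over the environment's actions.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Plan-based shaping: bonus for achieving the next subgoal.
            if progress < len(plan_steps) and plan_steps[progress](next_state):
                reward += shaping_bonus
                progress += 1
            # Standard one-step Q-learning update.
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (
                reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```

The shaping bonus densifies the otherwise sparse reward signal, which is what allows the learner to converge in far fewer steps as the state space grows; the planner itself never dictates primitive actions.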
References
Barto, A., Mahadevan, S.: Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems 13(4), 341–379 (2003)
Dietterich, T.G.: Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research 13, 227–303 (2000)
Watkins, C.J.C.H.: Learning from Delayed Rewards. PhD thesis, Cambridge University, U.K. (1989)
Fikes, R., Nilsson, N.: STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence 2, 189–208 (1971)
Blum, A.L., Furst, M.L.: Fast planning through planning graph analysis. Artificial Intelligence 90, 281–300 (1997)
Hoffmann, J.: A heuristic for domain independent planning and its use in an enforced hill-climbing algorithm. In: Proceedings of the 12th International Symposium on Methodologies for Intelligent Systems, pp. 216–227 (2000)
Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific (1996)
Ghallab, M., Howe, A., Knoblock, C., McDermott, D., Ram, A., Veloso, M., Weld, D., Wilkins, D.: PDDL—the planning domain definition language. Technical Report CVC TR-98-003/DCS TR-1165, Yale Center for Computational Vision and Control (1998)
Ryan, M.: Using abstract models of behaviours to automatically generate reinforcement learning hierarchies. In: Proceedings of the 19th International Conference on Machine Learning (2002)
Boutilier, C., Brafman, R.I., Geib, C.: Prioritized goal decomposition of Markov decision processes: Towards a synthesis of classical and decision theoretic planning. In: International Joint Conference on Artificial Intelligence (1997)
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Grounds, M., Kudenko, D. (2008). Combining Reinforcement Learning with Symbolic Planning. In: Tuyls, K., Nowe, A., Guessoum, Z., Kudenko, D. (eds) Adaptive Agents and Multi-Agent Systems III. Adaptation and Multi-Agent Learning. ALAMAS 2005-2007. Lecture Notes in Computer Science (LNAI), vol 4865. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77949-0_6
DOI: https://doi.org/10.1007/978-3-540-77949-0_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77947-6
Online ISBN: 978-3-540-77949-0
eBook Packages: Computer Science (R0)