Combining Reinforcement Learning with Symbolic Planning

  • Conference paper
Adaptive Agents and Multi-Agent Systems III. Adaptation and Multi-Agent Learning (AAMAS 2005, ALAMAS 2007, ALAMAS 2006)

Abstract

One of the major difficulties in applying Q-learning to real-world domains is the sharp increase in the number of learning steps required to converge towards an optimal policy as the size of the state space is increased. In this paper we propose a method, PLANQ-learning, that couples a Q-learner with a STRIPS planner. The planner shapes the reward function, and thus guides the Q-learner quickly to the optimal policy. We demonstrate empirically that this combination of high-level reasoning and low-level learning displays significant improvements in scaling-up behaviour as the state space grows larger, compared to both standard Q-learning and hierarchical Q-learning methods.
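
To make the abstract's idea concrete, the sketch below shows one plausible way a symbolic plan can shape a Q-learner's reward: the agent receives a bonus each time it satisfies the next pending subgoal of a plan. This is an illustrative sketch under stated assumptions, not the paper's PLANQ-learning implementation; the env interface, the satisfies predicate and the shaping_bonus value are all hypothetical.

```python
import random
from collections import defaultdict

def planq_episode(env, Q, plan_subgoals, satisfies,
                  alpha=0.1, gamma=0.9, epsilon=0.1, shaping_bonus=1.0):
    """Run one episode of tabular Q-learning with plan-based reward shaping.

    Assumed (hypothetical) interfaces:
      env            -- reset() -> state, step(action) -> (next_state, reward, done),
                        plus a list of discrete actions in env.actions
      plan_subgoals  -- ordered subgoals produced by a symbolic (e.g. STRIPS) planner
      satisfies      -- predicate satisfies(state, subgoal) -> bool
    """
    state = env.reset()
    next_subgoal = 0
    done = False
    while not done:
        # Epsilon-greedy action selection over tabular Q-values.
        if random.random() < epsilon:
            action = random.choice(env.actions)
        else:
            action = max(env.actions, key=lambda a: Q[(state, a)])

        next_state, reward, done = env.step(action)

        # Reward shaping: add a bonus when the next pending subgoal of the plan
        # is achieved, steering exploration along the planner's solution.
        if next_subgoal < len(plan_subgoals) and satisfies(next_state, plan_subgoals[next_subgoal]):
            reward += shaping_bonus
            next_subgoal += 1

        # Standard one-step Q-learning update (Watkins, 1989).
        best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state
    return Q

# Q-values default to zero; states and actions only need to be hashable.
Q = defaultdict(float)
```

In the paper's setting, the plan is produced by a STRIPS planner reasoning over a high-level description of the domain, while the Q-learner handles the low-level actions; the sketch above only illustrates the general shaping idea.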

References

  1. Barto, A., Mahadevan, S.: Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems 13(4), 341–379 (2003)

  2. Dietterich, T.G.: Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research 13, 227–303 (2000)

  3. Watkins, C.J.C.H.: Learning from Delayed Rewards. PhD thesis, Cambridge University, U.K. (1989)

  4. Fikes, R., Nilsson, N.: STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence 2, 189–208 (1971)

  5. Blum, A.L., Furst, M.L.: Fast planning through planning graph analysis. Artificial Intelligence 90, 281–300 (1997)

  6. Hoffmann, J.: A heuristic for domain independent planning and its use in an enforced hill-climbing algorithm. In: Proceedings of the 12th International Symposium on Methodologies for Intelligent Systems, pp. 216–227 (2000)

  7. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific (1996)

  8. Ghallab, M., Howe, A., Knoblock, C., McDermott, D., Ram, A., Veloso, M., Weld, D., Wilkins, D.: PDDL—the planning domain definition language. Technical Report CVC TR-98-003/DCS TR-1165, Yale Center for Computational Vision and Control (1998)

  9. Ryan, M.: Using abstract models of behaviours to automatically generate reinforcement learning hierarchies. In: Proceedings of the 19th International Conference on Machine Learning (2002)

  10. Boutilier, C., Brafman, R.I., Geib, C.: Prioritized goal decomposition of Markov decision processes: Towards a synthesis of classical and decision theoretic planning. In: International Joint Conference on Artificial Intelligence (1997)

Editor information

Karl Tuyls, Ann Nowe, Zahia Guessoum, Daniel Kudenko

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grounds, M., Kudenko, D. (2008). Combining Reinforcement Learning with Symbolic Planning. In: Tuyls, K., Nowe, A., Guessoum, Z., Kudenko, D. (eds) Adaptive Agents and Multi-Agent Systems III. Adaptation and Multi-Agent Learning. AAMAS 2005, ALAMAS 2007, ALAMAS 2006. Lecture Notes in Computer Science, vol 4865. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77949-0_6

  • DOI: https://doi.org/10.1007/978-3-540-77949-0_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77947-6

  • Online ISBN: 978-3-540-77949-0

  • eBook Packages: Computer Science, Computer Science (R0)
