Abstract
In this paper we present the Goal Agnostic Planner (GAP), an algorithm that combines elements of Reinforcement Learning (RL) and Markov Decision Processes (MDPs) into an elegant, effective system for learning to solve sequential problems. The GAP algorithm requires neither an explicit world model nor a hand-designed reward function to drive policy determination, and it operates on both MDP and RL domain problems. Its construction admits several analytic guarantees, including policy optimality, exponential goal achievement rates, reciprocal learning rates, measurable robustness to error, and explicit convergence conditions for abstracted states. Empirical results confirm these predictions, demonstrate effectiveness over a wide range of domains, and show that the GAP algorithm is an order of magnitude faster than standard reinforcement learning while producing plans of quality equal to those from MDP solvers, all without requiring the design of reward functions.
With thanks to Joshua and Ellen Lancaster.
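The paper's own pseudocode is not reproduced on this page, but a minimal sketch of the general idea described in the abstract (learn raw transition statistics from experience, then plan toward an arbitrary goal by maximizing path success probability, with no reward function anywhere) might look like the following Python. The class GoalAgnosticPlannerSketch and its methods are hypothetical illustrations under these assumptions, not the authors' implementation.

import heapq
import math
from collections import defaultdict

class GoalAgnosticPlannerSketch:
    """Hypothetical sketch: learn empirical transition statistics from
    experience, then plan to an arbitrary goal state by maximizing path
    success probability. No reward function is ever defined."""

    def __init__(self):
        # counts[(s, a)][s_next] = observed transitions s --a--> s_next
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, s, a, s_next):
        """Record one observed transition; this is the only learning step."""
        self.counts[(s, a)][s_next] += 1

    def plan(self, start, goal):
        """Dijkstra over edge weights -log P(s_next | s, a): the
        minimum-cost path is the maximum-probability action sequence."""
        dist = {start: 0.0}
        parent = {}
        frontier = [(0.0, start)]
        while frontier:
            d, s = heapq.heappop(frontier)
            if s == goal:
                break
            if d > dist.get(s, math.inf):
                continue  # stale heap entry
            for (s0, a), outcomes in self.counts.items():
                if s0 != s:
                    continue
                total = sum(outcomes.values())
                for s_next, c in outcomes.items():
                    p = c / total
                    nd = d - math.log(p)  # -log turns products into sums
                    if nd < dist.get(s_next, math.inf):
                        dist[s_next] = nd
                        parent[s_next] = (s, a)
                        heapq.heappush(frontier, (nd, s_next))
        # Walk parents back from the goal to recover the action plan.
        plan, s = [], goal
        while s != start:
            if s not in parent:
                return None  # goal unreachable under current statistics
            s, a = parent[s]
            plan.append(a)
        return list(reversed(plan))

Under this reading, the goal is simply a query parameter supplied at planning time, which is one way an algorithm can remain "goal agnostic": the learned statistics are reusable for any goal, and no scalar reward ever needs to be designed.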
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Robinson, C. (2023). Learning to Solve Sequential Planning Problems Without Rewards. In: Arai, K. (ed.) Proceedings of the Future Technologies Conference (FTC) 2022, Volume 1. Lecture Notes in Networks and Systems, vol. 559. Springer, Cham. https://doi.org/10.1007/978-3-031-18461-1_27
DOI: https://doi.org/10.1007/978-3-031-18461-1_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18460-4
Online ISBN: 978-3-031-18461-1
eBook Packages: Intelligent Technologies and Robotics (R0)