Abstract
In this paper we present the Goal Agnostic Planner (GAP), an algorithm that combines elements of Reinforcement Learning (RL) and Markov Decision Processes (MDPs) into an elegant, effective system for learning to solve sequential problems. The GAP algorithm requires neither an explicit world model nor a hand-designed reward function to drive policy determination, and it operates on both MDP and RL domain problems. Its construction admits several analytic guarantees, including policy optimality, exponential goal achievement rates, reciprocal learning rates, measurable robustness to error, and explicit convergence conditions for abstracted states. Empirical results confirm these predictions, demonstrate effectiveness over a wide range of domains, and show that the GAP algorithm is an order of magnitude faster than standard reinforcement learning while producing plans of quality equal to those from MDP solvers, all without requiring the design of reward functions.
With thanks to Joshua and Ellen Lancaster.
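The paper's own pseudocode is not reproduced on this page, but a minimal sketch of the general idea described in the abstract (learn raw transition statistics from experience, then plan toward an arbitrary goal by maximizing path success probability, with no reward function anywhere) might look like the following Python. The class GoalAgnosticPlannerSketch and its methods are hypothetical illustrations under these assumptions, not the authors' implementation.

import heapq
import math
from collections import defaultdict

class GoalAgnosticPlannerSketch:
    """Hypothetical sketch: learn empirical transition statistics from
    experience, then plan to an arbitrary goal state by maximizing path
    success probability. No reward function is ever defined."""

    def __init__(self):
        # counts[(s, a)][s_next] = observed transitions s --a--> s_next
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, s, a, s_next):
        """Record one observed transition; this is the only learning step."""
        self.counts[(s, a)][s_next] += 1

    def plan(self, start, goal):
        """Dijkstra over edge weights -log P(s_next | s, a): the
        minimum-cost path is the maximum-probability action sequence."""
        dist = {start: 0.0}
        parent = {}
        frontier = [(0.0, start)]
        while frontier:
            d, s = heapq.heappop(frontier)
            if s == goal:
                break
            if d > dist.get(s, math.inf):
                continue  # stale heap entry
            for (s0, a), outcomes in self.counts.items():
                if s0 != s:
                    continue
                total = sum(outcomes.values())
                for s_next, c in outcomes.items():
                    p = c / total
                    nd = d - math.log(p)  # -log turns products into sums
                    if nd < dist.get(s_next, math.inf):
                        dist[s_next] = nd
                        parent[s_next] = (s, a)
                        heapq.heappush(frontier, (nd, s_next))
        # Walk parents back from the goal to recover the action plan.
        plan, s = [], goal
        while s != start:
            if s not in parent:
                return None  # goal unreachable under current statistics
            s, a = parent[s]
            plan.append(a)
        return list(reversed(plan))

Under this reading, the goal is simply a query parameter supplied at planning time, which is one way an algorithm can remain "goal agnostic": the learned statistics are reusable for any goal, and no scalar reward ever needs to be designed.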
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Robinson, C. (2023). Learning to Solve Sequential Planning Problems Without Rewards. In: Arai, K. (ed.) Proceedings of the Future Technologies Conference (FTC) 2022, Volume 1. Lecture Notes in Networks and Systems, vol. 559. Springer, Cham. https://doi.org/10.1007/978-3-031-18461-1_27
DOI: https://doi.org/10.1007/978-3-031-18461-1_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18460-4
Online ISBN: 978-3-031-18461-1
eBook Packages: Intelligent Technologies and Robotics (R0)