
Learning to Solve Sequential Planning Problems Without Rewards

  • Conference paper
  • First Online:
Proceedings of the Future Technologies Conference (FTC) 2022, Volume 1 (FTC 2022)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 559)


Abstract

In this paper, we present an algorithm, the Goal Agnostic Planner (GAP), which combines elements of Reinforcement Learning (RL) and Markov Decision Processes (MDPs) into an elegant, effective system for learning to solve sequential problems. The GAP algorithm requires the design of neither an explicit world model nor a reward function to drive policy determination, and it can operate on both MDP and RL domain problems. The construction of the GAP admits several analytic guarantees, including policy optimality, exponential goal achievement rates, reciprocal learning rates, measurable robustness to error, and explicit convergence conditions for abstracted states. Empirical results confirm these predictions, demonstrate effectiveness over a wide range of domains, and show that the GAP algorithm is an order of magnitude faster than standard reinforcement learning and produces plans of equal quality to MDP methods, without requiring the design of reward functions.
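The paper's full construction is not reproduced on this page, but the abstract's core idea, learning to reach goals without a reward function, can be illustrated with a rough sketch. The class below is a hypothetical illustration, not the authors' implementation: it records empirical transition counts from experience (the learning element) and then plans a maximum-likelihood action sequence to an arbitrary goal state by running Dijkstra's algorithm over negative log transition probabilities (the planning element). All names here are invented for the example.

```python
import math
import heapq
from collections import defaultdict


class GoalAgnosticPlannerSketch:
    """Hypothetical sketch of a reward-free planner: learn transition
    statistics from experience, then plan the most probable action
    sequence to any requested goal state. No reward function is defined."""

    def __init__(self):
        # counts[(s, a)][s'] = number of times action a taken in state s
        # was observed to lead to state s'
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, state, action, next_state):
        """Record one experienced transition."""
        self.counts[(state, action)][next_state] += 1

    def plan(self, start, goal):
        """Return the most probable action sequence from start to goal,
        or None if the goal is unreachable under observed transitions.

        Dijkstra over edge weights -log p(s' | s, a): minimizing the sum
        of -log probabilities maximizes the product of probabilities."""
        dist = {start: 0.0}
        prev = {}
        pq = [(0.0, start)]
        done = set()
        while pq:
            d, s = heapq.heappop(pq)
            if s in done:
                continue
            done.add(s)
            if s == goal:
                break
            # expand every action observed from state s
            for (s0, a), outcomes in self.counts.items():
                if s0 != s:
                    continue
                total = sum(outcomes.values())
                for s1, n in outcomes.items():
                    nd = d - math.log(n / total)
                    if nd < dist.get(s1, float("inf")):
                        dist[s1] = nd
                        prev[s1] = (s, a)
                        heapq.heappush(pq, (nd, s1))
        if goal not in prev and goal != start:
            return None
        # walk the predecessor chain back to the start
        actions = []
        s = goal
        while s != start:
            s0, a = prev[s]
            actions.append(a)
            s = s0
        return list(reversed(actions))
```

Because the goal enters only at planning time, the same learned statistics serve any goal, which is consistent with the "goal agnostic" framing; the shortest-stochastic-path view also echoes the abstract's connection between RL-style learning and MDP-style planning.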

With thanks to Joshua and Ellen Lancaster.



Author information

Correspondence to Chris Robinson.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Robinson, C. (2023). Learning to Solve Sequential Planning Problems Without Rewards. In: Arai, K. (eds) Proceedings of the Future Technologies Conference (FTC) 2022, Volume 1. FTC 2022 2022. Lecture Notes in Networks and Systems, vol 559. Springer, Cham. https://doi.org/10.1007/978-3-031-18461-1_27

