
Planning Under Uncertainty Through Goal-Driven Action Selection

Conference paper

Agents and Artificial Intelligence (ICAART 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11352)

Abstract

Online planning in domains with uncertainty and partial observability poses a series of performance challenges: agents must gather information about the environment, quickly select actions with high expected reward, and avoid very costly mistakes, all while interleaving planning and execution in highly variable and uncertain domains. In order to reduce the number of mistakes and help an agent focus on directly relevant actions, we propose a goal-driven action selection method for planning in (PO)MDPs. This method introduces a reward bonus and a rollout policy for MCTS planners, both of which depend almost exclusively on a clear specification of the goal, and it produced promising results when planning in large domains of interest to cognitive and mobile robotics.
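The abstract names two MCTS-side components that depend only on the goal specification: a reward bonus and a rollout policy. The paper's actual formulation is not reproduced on this page; the sketch below is merely a minimal illustration of how a goal-distance-based bonus and a goal-biased rollout could plug into a generic MCTS rollout step. All names (goal_distance, reward_bonus, goal_biased_rollout) and the toy corridor domain are hypothetical assumptions, not taken from the paper.

```python
# Illustrative sketch only; not the authors' method.
import random


def goal_distance(state, goal):
    """Hypothetical domain heuristic: distance from a state to the goal (0 = goal reached)."""
    return abs(state - goal)


def reward_bonus(state, next_state, goal, weight=1.0):
    """Goal-driven bonus: reward transitions that move the agent closer to the goal."""
    return weight * (goal_distance(state, goal) - goal_distance(next_state, goal))


def goal_biased_rollout(state, goal, simulator, actions, depth, gamma=0.95, epsilon=0.2):
    """Rollout policy that usually picks the action whose simulated successor is
    closest to the goal, instead of acting uniformly at random."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        # Score each action by a one-step lookahead through the simulator.
        scored = [(goal_distance(simulator(state, a)[0], goal), a) for a in actions]
        _, action = random.choice(scored) if random.random() < epsilon else min(scored)
        next_state, reward = simulator(state, action)
        total += discount * (reward + reward_bonus(state, next_state, goal))
        discount *= gamma
        if goal_distance(next_state, goal) == 0:
            break  # goal reached, stop the rollout early
        state = next_state
    return total


if __name__ == "__main__":
    # Toy 1-D corridor: the agent moves left or right and earns 10 at cell 7.
    def simulator(s, a):
        ns = s + a
        return ns, (10.0 if ns == 7 else -0.1)

    estimate = goal_biased_rollout(state=0, goal=7, simulator=simulator,
                                   actions=[-1, +1], depth=20)
    print(f"rollout return estimate: {estimate:.2f}")
```

In an MCTS planner such a rollout would replace the uniform-random default policy at leaf nodes, and the bonus would be added to the simulated reward during backups; both pieces require only a distance-to-goal heuristic rather than domain-specific knowledge.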



Acknowledgements

We would like to thank our colleagues Sebastian Pütz and Felix Igelbrink for their suggested reward distribution in the Cellar domain, and the DAAD for supporting this work with a research grant.

Author information

Correspondence to Juan Carlos Saborío.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Saborío, J.C., Hertzberg, J. (2019). Planning Under Uncertainty Through Goal-Driven Action Selection. In: van den Herik, J., Rocha, A. (eds) Agents and Artificial Intelligence. ICAART 2018. Lecture Notes in Computer Science, vol. 11352. Springer, Cham. https://doi.org/10.1007/978-3-030-05453-3_9


  • DOI: https://doi.org/10.1007/978-3-030-05453-3_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05452-6

  • Online ISBN: 978-3-030-05453-3

