Lazy Planning under Uncertainty by Optimizing Decisions on an Ensemble of Incomplete Disturbance Trees

Defourny, Boris; Ernst, Damien; Wehenkel, Louis

doi:10.1007/978-3-540-89722-4_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5323)

Included in the following conference series:

European Workshop on Reinforcement Learning

Abstract

This paper addresses discrete-time optimal sequential decision-making problems whose disturbance space W is composed of a finite number of elements. In this context, the problem of finding an optimal decision strategy from an initial state \(x_0\) can be stated as an optimization problem that seeks an optimal combination of decisions attached to the nodes of a disturbance tree modeling all possible sequences of disturbances \((w_0, w_1, \ldots, w_{T-1}) \in W^T\) over the optimization horizon T. A significant drawback of this approach is that the resulting optimization problem has a search space which is the Cartesian product of \(O(|W|^{T-1})\) decision spaces U, which makes the approach computationally impractical as soon as the optimization horizon grows, even if W has just a handful of elements. To circumvent this difficulty, we propose to exploit an ensemble of randomly generated incomplete disturbance trees of controlled complexity, to solve their induced optimization problems in parallel, and to combine their predictions at time t = 0 to obtain a (near-)optimal first-stage decision. Because this approach postpones the determination of the decisions for subsequent stages until additional information about the realization of the uncertain process becomes available, we call it lazy. Simulations carried out on a robot corridor navigation problem show that even for small incomplete trees, this approach can lead to near-optimal decisions.
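
The idea summarized in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy scalar dynamics `step`, the stage cost `cost`, the disturbance probabilities `P`, the rule of keeping a fixed number of randomly chosen disturbances per node with renormalized probabilities, and the majority vote used to combine first-stage decisions. The paper's own tree generation and aggregation procedures may differ.

```python
import random
from collections import Counter

# Hypothetical toy problem (not the robot corridor problem of the paper).
W = [-1, 0, 1]                     # finite disturbance space
P = {-1: 0.25, 0: 0.5, 1: 0.25}    # assumed disturbance probabilities
U = [-1, 0, 1]                     # finite decision space
T = 4                              # optimization horizon


def step(x, u, w):
    """Assumed dynamics: the state is shifted by decision and disturbance."""
    return x + u + w


def cost(x, u, w):
    """Assumed stage cost: distance of the next state to 0, plus a small effort term."""
    return abs(step(x, u, w)) + 0.1 * abs(u)


def complete_tree_nodes(n_w, horizon):
    """Decision nodes of the complete tree: 1 + |W| + ... + |W|^(T-1)."""
    return sum(n_w ** t for t in range(horizon))


def random_incomplete_tree(depth, branching, rng):
    """Keep only `branching` randomly chosen disturbances at every node.

    A tree is a list of (w, probability, subtree) triples; the kept
    probabilities are renormalized (an assumption of this sketch).
    """
    if depth == 0:
        return []
    kept = rng.sample(W, branching)
    total = sum(P[w] for w in kept)
    return [(w, P[w] / total, random_incomplete_tree(depth - 1, branching, rng))
            for w in kept]


def solve(tree, x):
    """Optimize the decisions attached to the nodes of one tree.

    Returns (expected cost, decision at this node) by backward recursion.
    """
    if not tree:
        return 0.0, None
    best = None
    for u in U:
        value = sum(p * (cost(x, u, w) + solve(sub, step(x, u, w))[0])
                    for w, p, sub in tree)
        if best is None or value < best[0]:
            best = (value, u)
    return best


def lazy_first_decision(x0, n_trees=25, branching=2, seed=0):
    """Combine the first-stage decisions of an ensemble by majority vote."""
    rng = random.Random(seed)
    votes = Counter(
        solve(random_incomplete_tree(T, branching, rng), x0)[1]
        for _ in range(n_trees)
    )
    return votes.most_common(1)[0][0]
```

In this sketch only the decision at t = 0 is returned; at the next time step one would observe the realized disturbance, update the state, and call `lazy_first_decision` again, which is the sense in which later decisions are postponed. Each incomplete tree here has 1 + 2 + 4 + 8 = 15 decision nodes, against `complete_tree_nodes(3, 4)` = 40 for the complete tree, and the gap widens quickly as T grows.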

Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Science, University of Liège, Grande Traverse, 10, Sart-Tilman, B-4000, Liège, Belgium
Boris Defourny, Damien Ernst & Louis Wehenkel

Editor information

Editors and Affiliations

INRIA Lille-Nord Europe, 59650, Villeneuve d’Ascq, France
Sertan Girgin
INRIA, LIFL, CNRS, Université de Lille, Villeneuve d’Ascq, France
Manuel Loth, Rémi Munos, Philippe Preux & Daniil Ryabko

Cite this paper

Defourny, B., Ernst, D., Wehenkel, L. (2008). Lazy Planning under Uncertainty by Optimizing Decisions on an Ensemble of Incomplete Disturbance Trees. In: Girgin, S., Loth, M., Munos, R., Preux, P., Ryabko, D. (eds) Recent Advances in Reinforcement Learning. EWRL 2008. Lecture Notes in Computer Science, vol 5323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89722-4_1

DOI: https://doi.org/10.1007/978-3-540-89722-4_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89721-7
Online ISBN: 978-3-540-89722-4
eBook Packages: Computer Science, Computer Science (R0)
