Abstract
In probabilistic planning problems, which are usually modeled as Markov Decision Processes (MDPs), it is often difficult, or even impossible, to obtain an accurate estimate of the state transition probabilities. This limitation can be overcome by modeling these problems as Markov Decision Processes with imprecise probabilities (MDP-IPs). Robust LAO* and Robust LRTDP are efficient algorithms for solving a special class of MDP-IPs in which the probabilities lie in given intervals, known as Bounded-Parameter Stochastic Shortest Path MDPs (BSSP-MDPs). However, they do not make clear what assumptions must be made to find a robust solution (the best policy under the worst model). In this paper, we propose a new efficient algorithm for BSSP-MDPs, called Robust ILAO*, which performs better than Robust LAO* and Robust LRTDP, considered the state of the art in robust probabilistic planning. We also define the assumptions required to ensure a robust solution and prove that the Robust ILAO* algorithm converges to optimal values if the initial value of all states is admissible.
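To make "the best policy under the worst model" concrete, the following is a minimal sketch of a single robust Bellman backup for one state, assuming a cost-minimizing stochastic shortest path setting in which each transition probability is only known to lie in an interval. It is not the Robust ILAO* algorithm itself. The names `worst_case_distribution` and `robust_backup`, the data layout, and the example numbers are hypothetical, and the intervals are assumed to be consistent (the lower bounds sum to at most 1 and the upper bounds to at least 1).

```python
def worst_case_distribution(values, lower, upper):
    """Choose p with lower[i] <= p[i] <= upper[i] and sum(p) == 1
    that maximizes the expected successor cost sum(p[i] * values[i]).
    Assumes sum(lower) <= 1 <= sum(upper)."""
    p = list(lower)                 # every successor gets at least its lower bound
    slack = 1.0 - sum(lower)        # probability mass still to be assigned
    # Hand the remaining mass to the costliest successors first.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    for i in order:
        give = min(upper[i] - lower[i], slack)
        p[i] += give
        slack -= give
    return p


def robust_backup(cost, successors, V):
    """Return min over actions of (cost + worst-case expected value).

    cost:       {action: immediate cost}                     (hypothetical layout)
    successors: {action: [(next_state, lower, upper), ...]}  (hypothetical layout)
    V:          {state: current value estimate}
    """
    best = float("inf")
    for action, succ in successors.items():
        values = [V[s] for s, _, _ in succ]
        lower = [lo for _, lo, _ in succ]
        upper = [hi for _, _, hi in succ]
        p = worst_case_distribution(values, lower, upper)  # adversarial "nature"
        q = cost[action] + sum(pi * vi for pi, vi in zip(p, values))
        best = min(best, q)                                # the agent minimizes
    return best


# Illustrative numbers only: "g" is a goal state with value 0.
V = {"s1": 10.0, "g": 0.0}
cost = {"a": 1.0, "b": 5.0}
successors = {
    "a": [("s1", 0.2, 0.7), ("g", 0.3, 0.8)],
    "b": [("g", 1.0, 1.0)],
}
backed_up_value = robust_backup(cost, successors, V)
```

In this example the worst model for action `a` pushes as much probability as the intervals allow toward the expensive successor `s1`, so the cheaper-looking action `a` ends up worse than the deterministic action `b`.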





Acknowledgments
We thank the São Paulo Research Foundation for the financial support (FAPESP grant #2015/01587-0).
Cite this article
Moreira, D.A.M., Delgado, K.V. & Nunes de Barros, L. Robust probabilistic planning with ILAO*. Appl Intell 45, 662–672 (2016). https://doi.org/10.1007/s10489-016-0780-4
DOI: https://doi.org/10.1007/s10489-016-0780-4