Abstract
An optimal probabilistic-planning algorithm solves a problem, usually modeled as a Markov decision process, by finding an optimal policy. In this paper we study the k best policies problem: finding not just an optimal policy but the k best policies. For k > 1, these policies cannot be found directly by dynamic programming. Naïvely, finding the k-th best policy can be Turing-reduced to the optimal planning problem, but the number of problems queried by the naïve algorithm is exponential in k. We show empirically that solving the k best policies problem via this reduction requires an unreasonable amount of time even for k = 3. We then present a new algorithm, based on our theoretical result that the k-th best policy differs from the i-th best policy, for some i < k, on exactly one state. The algorithm's time complexity is quadratic in k, but the number of optimal planning problems it solves is only linear in k. We demonstrate empirically that the new algorithm scales well.
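The structural result stated in the abstract, that the k-th best policy differs from some earlier-ranked policy on exactly one state, can be illustrated on a toy discounted MDP. The sketch below is our own illustration, not the paper's algorithm: it brute-forces the ranking (feasible only for tiny MDPs) and then checks the one-state-deviation property. The example MDP, state/action names, and helper functions are all invented for this sketch.

```python
import itertools

GAMMA = 0.95
ACTIONS = ['a', 'b']
# Transitions P[s][a] -> list of (next_state, prob); costs C[s][a].
# State 2 is an absorbing zero-cost goal, so V(2) stays 0.
P = {
    0: {'a': [(1, 1.0)], 'b': [(2, 0.5), (0, 0.5)]},
    1: {'a': [(2, 1.0)], 'b': [(0, 1.0)]},
}
C = {0: {'a': 1.0, 'b': 2.0}, 1: {'a': 1.0, 'b': 0.5}}

def evaluate(policy, iters=2000):
    """Iterative policy evaluation; returns the start state's expected cost."""
    V = {0: 0.0, 1: 0.0, 2: 0.0}
    for _ in range(iters):
        for s in (0, 1):
            a = policy[s]
            V[s] = C[s][a] + GAMMA * sum(p * V[t] for t, p in P[s][a])
    return V[0]

def k_best(k):
    """Brute-force ranking of all |A|^|S| policies (toy-sized MDPs only)."""
    policies = [{0: a0, 1: a1}
                for a0, a1 in itertools.product(ACTIONS, repeat=2)]
    policies.sort(key=evaluate)
    return policies[:k]

best3 = k_best(3)
# The paper's structural property: each ranked policy differs from some
# earlier-ranked policy on exactly one state.
for i in (1, 2):
    assert any(sum(best3[i][s] != best3[j][s] for s in (0, 1)) == 1
               for j in range(i))
```

The paper's algorithm exploits the property in the other direction: instead of enumerating all |A|^|S| policies, it generates candidates for the next-best policy only among one-state deviations of the policies found so far, which is why only a linear number of optimal planning problems must be solved.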
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Dai, P., Goldsmith, J. (2009). Finding Best k Policies. In: Rossi, F., Tsoukias, A. (eds) Algorithmic Decision Theory. ADT 2009. Lecture Notes in Computer Science(), vol 5783. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04428-1_13
Print ISBN: 978-3-642-04427-4
Online ISBN: 978-3-642-04428-1