Abstract
We present algorithms for finding optimal strategies for discounted, infinite-horizon, Deterministic Markov Decision Processes (DMDPs). Our fastest algorithm has a worst-case running time of O(mn), improving the recent O(mn^2) bound of Andersson and Vorobyov [2006]. We also present a randomized O(m^{1/2}n^2)-time algorithm for finding Discounted All-Pairs Shortest Paths (DAPSP), improving an O(mn^2)-time algorithm that can be obtained using ideas of Papadimitriou and Tsitsiklis [1987].
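To fix the problem setting, a DMDP is a directed graph in which each state deterministically follows the chosen outgoing edge, and values satisfy V(u) = max over edges (u, v) of w(u, v) + γ·V(v). The following is a minimal illustrative sketch of plain value iteration for this recurrence (not one of the paper's algorithms, which achieve much stronger worst-case bounds); the function name and edge-list representation are our own choices.

```python
def value_iteration(n, edges, gamma, iters=1000):
    """Approximate optimal discounted values of a deterministic MDP.

    n      -- number of states (0..n-1); every state needs an outgoing edge
    edges  -- list of (u, v, w): moving from u to v yields reward w
    gamma  -- discount factor in (0, 1)
    """
    V = [0.0] * n
    for _ in range(iters):
        # Bellman update: each state picks its best outgoing edge.
        newV = [float("-inf")] * n
        for u, v, w in edges:
            newV[u] = max(newV[u], w + gamma * V[v])
        V = newV
    return V

# Two-state example: looping at state 0 yields reward 1 per step,
# so its value converges to 1 / (1 - gamma) = 2 for gamma = 0.5.
edges = [(0, 0, 1.0), (0, 1, 0.0), (1, 1, 0.0)]
V = value_iteration(2, edges, gamma=0.5)
```

Each iteration costs O(m), and the error contracts by a factor of γ per iteration; the paper's contribution is precisely to avoid this dependence on the precision and obtain a strongly polynomial O(mn) bound.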
References
- Aho, A., Hopcroft, J., and Ullman, J. 1974. The Design and Analysis of Computer Algorithms. Addison-Wesley.
- Andersson, D., and Vorobyov, S. 2006. Fast algorithms for monotonic discounted linear programs with two variables per inequality. Tech. rep. NI06019-LAA, Isaac Newton Institute for Mathematical Sciences, Cambridge, UK.
- Bellman, R. 1957. Dynamic Programming. Princeton University Press.
- Bertsekas, D. 2001. Dynamic Programming and Optimal Control, 2nd Ed. Athena Scientific.
- Björklund, H., and Vorobyov, S. 2005. Combinatorial structure and randomized subexponential algorithms for infinite games. Theor. Comput. Sci. 349, 3, 347--360.
- Blum, L., Cucker, F., Shub, M., and Smale, S. 1997. Complexity and Real Computation. Springer.
- Cohen, E., and Megiddo, N. 1994. Improved algorithms for linear inequalities with two variables per inequality. SIAM J. Comput. 23, 6, 1313--1347.
- Condon, A. 1992. The complexity of stochastic games. Inf. Comput. 96, 203--224.
- Cormen, T., Leiserson, C., Rivest, R., and Stein, C. 2001. Introduction to Algorithms, 2nd Ed. The MIT Press.
- Dasdan, A. 2004. Experimental analysis of the fastest optimum cycle ratio and mean algorithms. ACM Trans. Des. Autom. Electron. Syst. 9, 4, 385--418.
- d'Epenoux, F. 1963. A probabilistic production and inventory problem. Manag. Sci. 10, 1, 98--108.
- Derman, C. 1972. Finite State Markov Decision Processes. Academic Press.
- Ehrenfeucht, A., and Mycielski, J. 1979. Positional strategies for mean payoff games. Int. J. Game Theory 8, 109--113.
- Fredman, M., and Tarjan, R. 1987. Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM 34, 3, 596--615.
- Georgiadis, L., Goldberg, A., Tarjan, R., and Werneck, R. 2009. An experimental study of minimum mean cycle algorithms. In Proceedings of the 11th Workshop on Algorithm Engineering and Experiments (ALENEX). 1--13.
- Gurvich, V., Karzanov, A., and Khachiyan, L. 1988. Cyclic games and an algorithm to find minimax cycle means in directed graphs. USSR Comput. Math. Math. Phys. 28, 85--91.
- Halman, N. 2007. Simple stochastic games, parity games, mean payoff games and discounted payoff games are all LP-type problems. Algorithmica 49, 1, 37--50.
- Hochbaum, D., and Naor, J. 1994. Simple and fast algorithms for linear and integer programs with two variables per inequality. SIAM J. Comput. 23, 6, 1179--1192.
- Howard, R. 1960. Dynamic Programming and Markov Processes. MIT Press.
- Karmarkar, N. 1984. A new polynomial-time algorithm for linear programming. Combinatorica 4, 4, 373--395.
- Karp, R. 1978. A characterization of the minimum cycle mean in a digraph. Discr. Math. 23, 3, 309--311.
- Khachiyan, L. 1979. A polynomial time algorithm in linear programming. Soviet Math. Dokl. 20, 191--194.
- Littman, M. 1996. Algorithms for sequential decision making. Ph.D. thesis, Brown University.
- Littman, M., Dean, T., and Kaelbling, L. 1995. On the complexity of solving Markov decision problems. In Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence (UAI). 394--402.
- Ludwig, W. 1995. A subexponential randomized algorithm for the simple stochastic game problem. Inf. Comput. 117, 1, 151--155.
- Madani, O. 2000. Complexity results for infinite-horizon Markov decision processes. Ph.D. thesis, University of Washington.
- Madani, O. 2002a. On policy iteration as a Newton's method and polynomial policy iteration algorithms. In Proceedings of the 18th National Conference on Artificial Intelligence (AAAI). 273--278.
- Madani, O. 2002b. Polynomial value iteration algorithms for deterministic MDPs. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (UAI). 311--318.
- Mansour, Y., and Singh, S. 1999. On the complexity of policy iteration. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI). 401--408.
- Melekopoglou, M., and Condon, A. 1994. On the complexity of the policy improvement algorithm for Markov decision processes. ORSA J. Comput. 6, 2, 188--192.
- Ng, A., Harada, D., and Russell, S. 1999. Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the 6th International Conference on Machine Learning (ICML). 278--287.
- Papadimitriou, C., and Tsitsiklis, J. 1987. The complexity of Markov decision processes. Math. Oper. Res. 12, 3, 441--450.
- Puterman, M. 1994. Markov Decision Processes. Wiley.
- Shapley, L. 1953. Stochastic games. Proc. Nat. Acad. Sci. 39, 1095--1100.
- Ullman, J., and Yannakakis, M. 1991. High-probability parallel transitive-closure algorithms. SIAM J. Comput. 20, 1, 100--125.
- Ye, Y. 2005. A new complexity result on solving the Markov decision problem. Math. Oper. Res. 30, 3, 733--749.
- Young, N., Tarjan, R., and Orlin, J. 1991. Faster parametric shortest path and minimum-balance algorithms. Netw. 21, 205--221.
- Zwick, U. 2002. All-pairs shortest paths using bridging sets and rectangular matrix multiplication. J. ACM 49, 289--317.
- Zwick, U., and Paterson, M. 1996. The complexity of mean payoff games on graphs. Theor. Comput. Sci. 158, 1--2, 343--359.