Abstract
We present algorithms for finding optimal strategies for discounted, infinite-horizon, Deterministic Markov Decision Processes (DMDPs). Our fastest algorithm has a worst-case running time of O(mn), improving the recent O(mn^2) bound of Andersson and Vorobyov [2006]. We also present a randomized O(m^{1/2}n^2)-time algorithm for finding Discounted All-Pairs Shortest Paths (DAPSP), improving an O(mn^2)-time algorithm that can be obtained using ideas of Papadimitriou and Tsitsiklis [1987].
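To fix the problem setting, a DMDP is a directed graph in which each state deterministically follows the chosen outgoing edge, and values satisfy V(u) = max over edges (u, v) of w(u, v) + γ·V(v). The following is a minimal illustrative sketch of plain value iteration for this recurrence (not one of the paper's algorithms, which achieve much stronger worst-case bounds); the function name and edge-list representation are our own choices.

```python
def value_iteration(n, edges, gamma, iters=1000):
    """Approximate optimal discounted values of a deterministic MDP.

    n      -- number of states (0..n-1); every state needs an outgoing edge
    edges  -- list of (u, v, w): moving from u to v yields reward w
    gamma  -- discount factor in (0, 1)
    """
    V = [0.0] * n
    for _ in range(iters):
        # Bellman update: each state picks its best outgoing edge.
        newV = [float("-inf")] * n
        for u, v, w in edges:
            newV[u] = max(newV[u], w + gamma * V[v])
        V = newV
    return V

# Two-state example: looping at state 0 yields reward 1 per step,
# so its value converges to 1 / (1 - gamma) = 2 for gamma = 0.5.
edges = [(0, 0, 1.0), (0, 1, 0.0), (1, 1, 0.0)]
V = value_iteration(2, edges, gamma=0.5)
```

Each iteration costs O(m), and the error contracts by a factor of γ per iteration; the paper's contribution is precisely to avoid this dependence on the precision and obtain a strongly polynomial O(mn) bound.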
References
- Aho, A., Hopcroft, J., and Ullman, J. 1974. The Design and Analysis of Computer Algorithms. Addison-Wesley.
- Andersson, D., and Vorobyov, S. 2006. Fast algorithms for monotonic discounted linear programs with two variables per inequality. Tech. rep. NI06019-LAA, Isaac Newton Institute for Mathematical Sciences, Cambridge, UK.
- Bellman, R. 1957. Dynamic Programming. Princeton University Press.
- Bertsekas, D. 2001. Dynamic Programming and Optimal Control, 2nd Ed. Athena Scientific.
- Björklund, H., and Vorobyov, S. 2005. Combinatorial structure and randomized subexponential algorithms for infinite games. Theor. Comput. Sci. 349, 3, 347--360.
- Blum, L., Cucker, F., Shub, M., and Smale, S. 1997. Complexity and Real Computation. Springer.
- Cohen, E., and Megiddo, N. 1994. Improved algorithms for linear inequalities with two variables per inequality. SIAM J. Comput. 23, 6, 1313--1347.
- Condon, A. 1992. The complexity of stochastic games. Inf. Comput. 96, 203--224.
- Cormen, T., Leiserson, C., Rivest, R., and Stein, C. 2001. Introduction to Algorithms, 2nd Ed. The MIT Press.
- Dasdan, A. 2004. Experimental analysis of the fastest optimum cycle ratio and mean algorithms. ACM Trans. Des. Autom. Electron. Syst. 9, 4, 385--418.
- d'Epenoux, F. 1963. A probabilistic production and inventory problem. Manag. Sci. 10, 1, 98--108.
- Derman, C. 1972. Finite State Markov Decision Processes. Academic Press.
- Ehrenfeucht, A., and Mycielski, J. 1979. Positional strategies for mean payoff games. Int. J. Game Theory 8, 109--113.
- Fredman, M., and Tarjan, R. 1987. Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM 34, 3, 596--615.
- Georgiadis, L., Goldberg, A., Tarjan, R., and Werneck, R. 2009. An experimental study of minimum mean cycle algorithms. In Proceedings of the 11th Workshop on Algorithm Engineering and Experiments (ALENEX). 1--13.
- Gurvich, V., Karzanov, A., and Khachiyan, L. 1988. Cyclic games and an algorithm to find minimax cycle means in directed graphs. USSR Comput. Math. Math. Phys. 28, 85--91.
- Halman, N. 2007. Simple stochastic games, parity games, mean payoff games and discounted payoff games are all LP-type problems. Algorithmica 49, 1, 37--50.
- Hochbaum, D., and Naor, J. 1994. Simple and fast algorithms for linear and integer programs with two variables per inequality. SIAM J. Comput. 23, 6, 1179--1192.
- Howard, R. 1960. Dynamic Programming and Markov Processes. MIT Press.
- Karmarkar, N. 1984. A new polynomial-time algorithm for linear programming. Combinatorica 4, 4, 373--395.
- Karp, R. 1978. A characterization of the minimum cycle mean in a digraph. Discr. Math. 23, 3, 309--311.
- Khachiyan, L. 1979. A polynomial time algorithm in linear programming. Soviet Math. Dokl. 20, 191--194.
- Littman, M. 1996. Algorithms for sequential decision making. Ph.D. thesis, Brown University.
- Littman, M., Dean, T., and Kaelbling, L. 1995. On the complexity of solving Markov decision problems. In Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence (UAI). 394--402.
- Ludwig, W. 1995. A subexponential randomized algorithm for the simple stochastic game problem. Inf. Comput. 117, 1, 151--155.
- Madani, O. 2000. Complexity results for infinite-horizon Markov decision processes. Ph.D. thesis, University of Washington.
- Madani, O. 2002a. On policy iteration as a Newton's method and polynomial policy iteration algorithms. In Proceedings of the 18th National Conference on Artificial Intelligence (AAAI). 273--278.
- Madani, O. 2002b. Polynomial value iteration algorithms for deterministic MDPs. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (UAI). 311--318.
- Mansour, Y., and Singh, S. 1999. On the complexity of policy iteration. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI). 401--408.
- Melekopoglou, M., and Condon, A. 1994. On the complexity of the policy improvement algorithm for Markov decision processes. ORSA J. Comput. 6, 2, 188--192.
- Ng, A., Harada, D., and Russell, S. 1999. Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the 6th International Conference on Machine Learning (ICML). 278--287.
- Papadimitriou, C., and Tsitsiklis, J. 1987. The complexity of Markov decision processes. Math. Oper. Res. 12, 3, 441--450.
- Puterman, M. 1994. Markov Decision Processes. Wiley.
- Shapley, L. 1953. Stochastic games. Proc. Nat. Acad. Sci. 39, 1095--1100.
- Ullman, J., and Yannakakis, M. 1991. High-probability parallel transitive-closure algorithms. SIAM J. Comput. 20, 1, 100--125.
- Ye, Y. 2005. A new complexity result on solving the Markov decision problem. Math. Oper. Res. 30, 3, 733--749.
- Young, N., Tarjan, R., and Orlin, J. 1991. Faster parametric shortest path and minimum-balance algorithms. Netw. 21, 205--221.
- Zwick, U. 2002. All-pairs shortest paths using bridging sets and rectangular matrix multiplication. J. ACM 49, 289--317.
- Zwick, U., and Paterson, M. 1996. The complexity of mean payoff games on graphs. Theor. Comput. Sci. 158, 1--2, 343--359.