
Discounted deterministic Markov decision processes and discounted all-pairs shortest paths

Published: 06 April 2010 Publication History

Abstract

We present algorithms for finding optimal strategies for discounted, infinite-horizon, Deterministic Markov Decision Processes (DMDPs). Our fastest algorithm has a worst-case running time of O(mn), improving the recent bound of O(mn^2) obtained by Andersson and Vorobyov [2006]. We also present a randomized O(m^(1/2) n^2)-time algorithm for finding Discounted All-Pairs Shortest Paths (DAPSP), improving an O(mn^2)-time algorithm that can be obtained using ideas of Papadimitriou and Tsitsiklis [1987].
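To make the problem statement concrete, the following is a minimal sketch of plain value iteration on a discounted DMDP. It is not the O(mn) algorithm of the paper, only a simple baseline that converges to the same optimal values. The modelling choices are assumptions made for illustration: the DMDP is a directed graph with n vertices and m cost-weighted edges, the controller minimizes the total discounted cost, and the names `discounted_values`, `discount`, and `tol` are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (source vertex, target vertex, cost); assumed encoding


def discounted_values(n: int, edges: List[Edge], discount: float,
                      tol: float = 1e-12, max_iters: int = 100_000):
    """Approximate the optimal discounted cost of every vertex by value iteration.

    Repeatedly applies x[u] = min over edges (u, v, c) of c + discount * x[v].
    This is a baseline sketch, not the algorithm described in the abstract.
    """
    if not 0.0 < discount < 1.0:
        raise ValueError("discount must lie strictly between 0 and 1")

    # Adjacency lists: out[u] holds (target, cost) pairs.
    out = [[] for _ in range(n)]
    for u, v, c in edges:
        out[u].append((v, c))
    if any(not successors for successors in out):
        raise ValueError("every vertex needs at least one outgoing edge")

    values = [0.0] * n
    for _ in range(max_iters):
        new_values = [min(c + discount * values[v] for v, c in out[u])
                      for u in range(n)]
        delta = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if delta < tol:  # the update is a contraction, so this is reached
            break

    # Greedy positional strategy: the successor chosen at each vertex.
    policy = [min(out[u], key=lambda e: e[1] + discount * values[e[0]])[0]
              for u in range(n)]
    return values, policy


# Small example with discount 1/2.
# Vertex 2 has a free self-loop; vertex 0 can loop at cost 1 or pay 3 to reach 2.
example_edges = [(0, 0, 1.0), (0, 2, 3.0), (1, 0, 0.5), (1, 2, 2.0), (2, 2, 0.0)]
values, policy = discounted_values(3, example_edges, discount=0.5)
```

In this example, looping forever at vertex 0 costs 1 + 1/2 + 1/4 + ... = 2, which beats paying 3 to reach the free loop at vertex 2, so the greedy strategy keeps vertex 0 on its self-loop. Each sweep costs O(m) time, but the number of sweeps depends on the discount factor and the tolerance, which is why strongly polynomial bounds such as those in the abstract require different techniques.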

References

[1]
Aho, A., Hopcroft, J., and Ullman, J. 1974. The Design and Analysis of Computer Algorithms. Addison-Wesley.
[2]
Andersson, D., and Vorobyov, S. 2006. Fast algorithms for monotonic discounted linear programs with two variables per inequality. Tech. rep. NI06019-LAA, Isaac Newton Institute for Mathematical Sciences, Cambridge, UK.
[3]
Bellman, R. 1957. Dynamic Programming. Princeton University Press.
[4]
Bertsekas, D. 2001. Dynamic Programming and Optimal Control, 2nd Ed. Athena Scientific.
[5]
Björklund, H., and Vorobyov, S. 2005. Combinatorial structure and randomized subexponential algorithms for infinite games. Theor. Comput. Sci. 349, 3, 347--360.
[6]
Blum, L., Cucker, F., Shub, M., and Smale, S. 1997. Complexity and Real Computation. Springer.
[7]
Cohen, E., and Megiddo, N. 1994. Improved algorithms for linear inequalities with two variables per inequality. SIAM J. Comput. 23, 6, 1313--1347.
[8]
Condon, A. 1992. The complexity of stochastic games. Inf. Comput. 96, 203--224.
[9]
Cormen, T., Leiserson, C., Rivest, R., and Stein, C. 2001. Introduction to Algorithms, 2nd Ed. The MIT Press.
[10]
Dasdan, A. 2004. Experimental analysis of the fastest optimum cycle ratio and mean algorithms. ACM Trans. Des. Autom. Electron. Syst. 9, 4, 385--418.
[11]
d'Epenoux, F. 1963. A probabilistic production and inventory problem. Manag. Sci. 10, 1, 98--108.
[12]
Derman, C. 1972. Finite State Markov Decision Processes. Academic Press.
[13]
Ehrenfeucht, A., and Mycielski, J. 1979. Positional strategies for mean payoff games. Int. J. Game Theory 8, 109--113.
[14]
Fredman, M., and Tarjan, R. 1987. Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM 34, 3, 596--615.
[15]
Georgiadis, L., Goldberg, A., Tarjan, R., and Werneck, R. 2009. An experimental study of minimum mean cycle algorithms. In Proceedings of the 11th Workshop on Algorithm Engineering and Experiments (ALENEX). 1--13.
[16]
Gurvich, V., Karzanov, A., and Khachiyan, L. 1988. Cyclic games and an algorithm to find minimax cycle means in directed graphs. USSR Comput. Math. Math. Phys. 28, 85--91.
[17]
Halman, N. 2007. Simple stochastic games, parity games, mean payoff games and discounted payoff games are all LP-type problems. Algorithmica 49, 1, 37--50.
[18]
Hochbaum, D., and Naor, J. 1994. Simple and fast algorithms for linear and integer programs with two variables per inequality. SIAM J. Comput. 23, 6, 1179--1192.
[19]
Howard, R. 1960. Dynamic Programming and Markov Processes. MIT Press.
[20]
Karmarkar, N. 1984. A new polynomial-time algorithm for linear programming. Combinatorica 4, 4, 373--395.
[21]
Karp, R. 1978. A characterization of the minimum cycle mean in a digraph. Discr. Math. 23, 3, 309--311.
[22]
Khachiyan, L. 1979. A polynomial time algorithm in linear programming. Soviet Math. Dokl. 20, 191--194.
[23]
Littman, M. 1996. Algorithms for sequential decision making. Ph.D. thesis, Brown University.
[24]
Littman, M., Dean, T., and Kaelbling, L. 1995. On the complexity of solving Markov decision problems. In Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence (UAI). 394--402.
[25]
Ludwig, W. 1995. A subexponential randomized algorithm for the simple stochastic game problem. Inf. Comput. 117, 1, 151--155.
[26]
Madani, O. 2000. Complexity results for infinite-horizon Markov decision processes. Ph.D. thesis, University of Washington.
[27]
Madani, O. 2002a. On policy iteration as a Newton's method and polynomial policy iteration algorithms. In Proceedings of the 18th National Conference on Artificial Intelligence (AAAI). 273--278.
[28]
Madani, O. 2002b. Polynomial value iteration algorithms for deterministic MDPs. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (UAI). 311--318.
[29]
Mansour, Y., and Singh, S. 1999. On the complexity of policy iteration. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI). 401--408.
[30]
Melekopoglou, M., and Condon, A. 1994. On the complexity of the policy improvement algorithm for Markov decision processes. ORSA J. Comput. 6, 2, 188--192.
[31]
Ng, A., Harada, D., and Russell, S. 1999. Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the 6th International Conference on Machine Learning (ICML). 278--287.
[32]
Papadimitriou, C., and Tsitsiklis, J. 1987. The complexity of Markov decision processes. Math. Oper. Res. 12, 3, 441--450.
[33]
Puterman, M. 1994. Markov Decision Processes. Wiley.
[34]
Shapley, L. 1953. Stochastic games. Proc. Nat. Acad. Sci. 39, 1095--1100.
[35]
Ullman, J., and Yannakakis, M. 1991. High-Probability parallel transitive-closure algorithms. SIAM J. Comput. 20, 1, 100--125.
[36]
Ye, Y. 2005. A new complexity result on solving the Markov decision problem. Math. Oper. Res. 30, 3, 733--749.
[37]
Young, N., Tarjan, R., and Orlin, J. 1991. Faster parametric shortest path and minimum-balance algorithms. Netw. 21, 205--221.
[38]
Zwick, U. 2002. All-pairs shortest paths using bridging sets and rectangular matrix multiplication. J. ACM 49, 289--317.
[39]
Zwick, U., and Paterson, M. 1996. The complexity of mean payoff games on graphs. Theor. Comput. Sci. 158, 1--2, 343--359.

Cited By

  • (2024) Discounted-Sum Automata with Real-Valued Discount Factors. Proceedings of the 39th Annual ACM/IEEE Symposium on Logic in Computer Science, 1-14. DOI: 10.1145/3661814.3662090. Online publication date: 8-Jul-2024
  • (2024) Determinization of Integral Discounted-Sum Automata is Decidable. Foundations of Software Science and Computation Structures, 191-211. DOI: 10.1007/978-3-031-57228-9_10. Online publication date: 5-Apr-2024
  • (2023) On the Comparison of Discounted-Sum Automata with Multiple Discount Factors. Foundations of Software Science and Computation Structures, 371-391. DOI: 10.1007/978-3-031-30829-1_18. Online publication date: 21-Apr-2023


      Published In

      ACM Transactions on Algorithms  Volume 6, Issue 2
      March 2010
      373 pages
      ISSN:1549-6325
      EISSN:1549-6333
      DOI:10.1145/1721837

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 06 April 2010
      Accepted: 01 April 2009
      Revised: 01 March 2009
      Received: 01 December 2008
      Published in TALG Volume 6, Issue 2


      Author Tags

      1. Markov decision processes
      2. minimum mean weight cycles
      3. shortest paths

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • BSF


      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 52
      • Downloads (Last 6 weeks): 0
      Reflects downloads up to 19 Feb 2025


      Citations

      Cited By

      • (2020) Optimal Control of Boolean Control Networks with Discounted Cost: An Efficient Approach based on Deterministic Markov Decision Process. 2020 IEEE 16th International Conference on Control & Automation (ICCA), 588-593. DOI: 10.1109/ICCA51439.2020.9264464. Online publication date: 9-Oct-2020
      • (2018) Extremal Pure Strategies and Monotonicity in Repeated Games. Computational Economics 49:3, 387-404. DOI: 10.1007/s10614-016-9565-4. Online publication date: 29-Dec-2018
      • (2017) Improved strong worst-case upper bounds for MDP planning. Proceedings of the 26th International Joint Conference on Artificial Intelligence, 1788-1794. DOI: 10.5555/3172077.3172136. Online publication date: 19-Aug-2017
      • (2015) The Simplex Method is Strongly Polynomial for Deterministic Markov Decision Processes. Mathematics of Operations Research 40:4, 859-868. DOI: 10.1287/moor.2014.0699. Online publication date: Oct-2015
      • (2014) Dantzig's pivoting rule for shortest paths, deterministic MDPs, and minimum cost to time ratio cycles. Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, 847-860. DOI: 10.5555/2634074.2634137. Online publication date: 5-Jan-2014
      • (2013) The simplex method is strongly polynomial for deterministic Markov decision processes. Proceedings of the twenty-fourth annual ACM-SIAM symposium on Discrete algorithms, 1465-1473. DOI: 10.5555/2627817.2627922. Online publication date: 6-Jan-2013
      • (2011) A subexponential lower bound for the random facet algorithm for parity games. Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete algorithms, 202-216. DOI: 10.5555/2133036.2133055. Online publication date: 23-Jan-2011