research-article

Discounted deterministic Markov decision processes and discounted all-pairs shortest paths

Published: 06 April 2010

Abstract

We present algorithms for finding optimal strategies for discounted, infinite-horizon, Deterministic Markov Decision Processes (DMDPs) on graphs with n vertices and m edges. Our fastest algorithm has a worst-case running time of O(mn), improving the recent bound of O(mn^2) obtained by Andersson and Vorobyov [2006]. We also present a randomized O(m^{1/2}n^2)-time algorithm for finding Discounted All-Pairs Shortest Paths (DAPSP), improving an O(mn^2)-time algorithm that can be obtained using ideas of Papadimitriou and Tsitsiklis [1987].
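To make the DMDP problem concrete, the following Python sketch computes an optimal positional strategy with Howard's policy iteration [Howard 1960], a classical method from the reference list. It is not the O(mn) algorithm of this article, and no comparable running-time guarantee is claimed for it. It assumes a single discount factor lam with 0 < lam < 1, that the objective is to minimize the total discounted cost c(e_0) + lam*c(e_1) + lam^2*c(e_2) + ... along the infinite path a strategy induces, and that every vertex has at least one outgoing edge. The function names evaluate and policy_iteration are hypothetical.

```python
from fractions import Fraction


def evaluate(policy, cost, lam):
    """Exact discounted cost of every vertex under a positional policy.

    policy[v] is the single successor chosen at v; cost[(v, w)] is the cost of
    edge v -> w. Following the policy from any vertex traces a path that
    eventually enters a cycle, so each value can be solved in closed form.
    """
    n = len(policy)
    val = [None] * n
    for start in range(n):
        if val[start] is not None:
            continue
        path, pos = [], {}
        v = start
        # Walk until we reach an already-valued vertex or close a new cycle.
        while val[v] is None and v not in pos:
            pos[v] = len(path)
            path.append(v)
            v = policy[v]
        if val[v] is None:
            # New cycle u0 -> u1 -> ... -> u_{k-1} -> u0 with u0 = v:
            # x_{u0} = sum_i lam^i * c_i + lam^k * x_{u0}.
            total, factor = Fraction(0), Fraction(1)
            for u in path[pos[v]:]:
                total += factor * cost[(u, policy[u])]
                factor *= lam
            val[v] = total / (1 - factor)
        # Fill in the rest of the walk backwards: x_u = c(u, succ) + lam * x_succ.
        for u in reversed(path):
            if val[u] is None:
                val[u] = cost[(u, policy[u])] + lam * val[policy[u]]
    return val


def policy_iteration(n, edges, lam):
    """Howard-style policy iteration minimizing total discounted cost.

    edges: iterable of (u, w, c); every vertex needs at least one outgoing edge.
    Returns (policy, values) with exact Fraction values.
    """
    out = [[] for _ in range(n)]
    cost = {}
    for u, w, c in edges:
        c = Fraction(c)
        if (u, w) not in cost:
            out[u].append(w)
            cost[(u, w)] = c
        else:  # keep only the cheapest of parallel edges
            cost[(u, w)] = min(cost[(u, w)], c)
    policy = [out[v][0] for v in range(n)]  # arbitrary initial strategy
    while True:
        val = evaluate(policy, cost, lam)
        changed = False
        for v in range(n):
            best = policy[v]
            best_q = cost[(v, best)] + lam * val[best]
            for w in out[v]:
                q = cost[(v, w)] + lam * val[w]
                if q < best_q:  # switch only on strict improvement
                    best, best_q = w, q
            if best != policy[v]:
                policy[v] = best
                changed = True
        if not changed:
            return policy, val


if __name__ == "__main__":
    # Vertex 0 can take the cheap 2-cycle 0 <-> 1 or pay 4 to reach the free loop at 2.
    edges = [(0, 2, 4), (0, 1, 1), (1, 0, 1), (2, 2, 0)]
    policy, values = policy_iteration(3, edges, Fraction(1, 2))
    print(policy, values)
    # -> [1, 0, 2] [Fraction(2, 1), Fraction(2, 1), Fraction(0, 1)]
```

Exact Fraction arithmetic keeps the comparisons in the improvement step free of rounding. Passing lam as a Fraction matters here; a float discount factor would silently turn the values into floats.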

References

  1. Aho, A., Hopcroft, J., and Ullman, J. 1974. The Design and Analysis of Computer Algorithms. Addison-Wesley.
  2. Andersson, D., and Vorobyov, S. 2006. Fast algorithms for monotonic discounted linear programs with two variables per inequality. Tech. rep. NI06019-LAA, Isaac Newton Institute for Mathematical Sciences, Cambridge, UK.
  3. Bellman, R. 1957. Dynamic Programming. Princeton University Press.
  4. Bertsekas, D. 2001. Dynamic Programming and Optimal Control, 2nd Ed. Athena Scientific.
  5. Björklund, H., and Vorobyov, S. 2005. Combinatorial structure and randomized subexponential algorithms for infinite games. Theor. Comput. Sci. 349, 3, 347--360.
  6. Blum, L., Cucker, F., Shub, M., and Smale, S. 1997. Complexity and Real Computation. Springer.
  7. Cohen, E., and Megiddo, N. 1994. Improved algorithms for linear inequalities with two variables per inequality. SIAM J. Comput. 23, 6, 1313--1347.
  8. Condon, A. 1992. The complexity of stochastic games. Inf. Comput. 96, 203--224.
  9. Cormen, T., Leiserson, C., Rivest, R., and Stein, C. 2001. Introduction to Algorithms, 2nd Ed. The MIT Press.
  10. Dasdan, A. 2004. Experimental analysis of the fastest optimum cycle ratio and mean algorithms. ACM Trans. Des. Autom. Electron. Syst. 9, 4, 385--418.
  11. d'Epenoux, F. 1963. A probabilistic production and inventory problem. Manag. Sci. 10, 1, 98--108.
  12. Derman, C. 1972. Finite State Markov Decision Processes. Academic Press.
  13. Ehrenfeucht, A., and Mycielski, J. 1979. Positional strategies for mean payoff games. Int. J. Game Theory 8, 109--113.
  14. Fredman, M., and Tarjan, R. 1987. Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM 34, 3, 596--615.
  15. Georgiadis, L., Goldberg, A., Tarjan, R., and Werneck, R. 2009. An experimental study of minimum mean cycle algorithms. In Proceedings of the 11th Workshop on Algorithm Engineering and Experiments (ALENEX). 1--13.
  16. Gurvich, V., Karzanov, A., and Khachiyan, L. 1988. Cyclic games and an algorithm to find minimax cycle means in directed graphs. USSR Comput. Math. Math. Phys. 28, 85--91.
  17. Halman, N. 2007. Simple stochastic games, parity games, mean payoff games and discounted payoff games are all LP-type problems. Algorithmica 49, 1, 37--50.
  18. Hochbaum, D., and Naor, J. 1994. Simple and fast algorithms for linear and integer programs with two variables per inequality. SIAM J. Comput. 23, 6, 1179--1192.
  19. Howard, R. 1960. Dynamic Programming and Markov Processes. MIT Press.
  20. Karmarkar, N. 1984. A new polynomial-time algorithm for linear programming. Combinatorica 4, 4, 373--395.
  21. Karp, R. 1978. A characterization of the minimum cycle mean in a digraph. Discr. Math. 23, 3, 309--311.
  22. Khachiyan, L. 1979. A polynomial time algorithm in linear programming. Soviet Math. Dokl. 20, 191--194.
  23. Littman, M. 1996. Algorithms for sequential decision making. Ph.D. thesis, Brown University.
  24. Littman, M., Dean, T., and Kaelbling, L. 1995. On the complexity of solving Markov decision problems. In Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence (UAI). 394--402.
  25. Ludwig, W. 1995. A subexponential randomized algorithm for the simple stochastic game problem. Inf. Comput. 117, 1, 151--155.
  26. Madani, O. 2000. Complexity results for infinite-horizon Markov decision processes. Ph.D. thesis, University of Washington.
  27. Madani, O. 2002a. On policy iteration as a Newton's method and polynomial policy iteration algorithms. In Proceedings of the 18th National Conference on Artificial Intelligence (AAAI). 273--278.
  28. Madani, O. 2002b. Polynomial value iteration algorithms for deterministic MDPs. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (UAI). 311--318.
  29. Mansour, Y., and Singh, S. 1999. On the complexity of policy iteration. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI). 401--408.
  30. Melekopoglou, M., and Condon, A. 1994. On the complexity of the policy improvement algorithm for Markov decision processes. ORSA J. Comput. 6, 2, 188--192.
  31. Ng, A., Harada, D., and Russell, S. 1999. Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the 16th International Conference on Machine Learning (ICML). 278--287.
  32. Papadimitriou, C., and Tsitsiklis, J. 1987. The complexity of Markov decision processes. Math. Oper. Res. 12, 3, 441--450.
  33. Puterman, M. 1994. Markov Decision Processes. Wiley.
  34. Shapley, L. 1953. Stochastic games. Proc. Nat. Acad. Sci. 39, 1095--1100.
  35. Ullman, J., and Yannakakis, M. 1991. High-Probability parallel transitive-closure algorithms. SIAM J. Comput. 20, 1, 100--125.
  36. Ye, Y. 2005. A new complexity result on solving the Markov decision problem. Math. Oper. Res. 30, 3, 733--749.
  37. Young, N., Tarjan, R., and Orlin, J. 1991. Faster parametric shortest path and minimum-balance algorithms. Netw. 21, 205--221.
  38. Zwick, U. 2002. All-pairs shortest paths using bridging sets and rectangular matrix multiplication. J. ACM 49, 289--317.
  39. Zwick, U., and Paterson, M. 1996. The complexity of mean payoff games on graphs. Theor. Comput. Sci. 158, 1--2, 343--359.


      Published in

        ACM Transactions on Algorithms, Volume 6, Issue 2
        March 2010
        373 pages
        ISSN:1549-6325
        EISSN:1549-6333
        DOI:10.1145/1721837

        Copyright © 2010 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 6 April 2010
        • Accepted: 1 April 2009
        • Revised: 1 March 2009
        • Received: 1 December 2008


        Qualifiers

        • research-article
        • Research
        • Refereed
