Abstract
Howard’s policy iteration algorithm is one of the most widely used algorithms for finding optimal policies for controlling Markov Decision Processes (MDPs). When applied to weighted directed graphs, which may be viewed as Deterministic MDPs (DMDPs), Howard’s algorithm can be used to find minimum mean-cost cycles (MMCCs). Experimental studies suggest that Howard’s algorithm works extremely well in this context. Its theoretical complexity for finding MMCCs, however, remains poorly understood: no polynomial bound is known on its running time, and prior to this work only linear lower bounds were known on the number of iterations it performs. We provide the first weighted graphs on which Howard’s algorithm performs Ω(n²) iterations, where n is the number of vertices in the graph.
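For concreteness, the following is a minimal sketch of Howard’s policy iteration specialized to minimum mean-cost cycles, in the standard gain/bias formulation: each vertex holds one outgoing edge (the policy), evaluation computes the mean cost λ of the cycle each vertex reaches in the policy graph plus a bias value h, and improvement switches any edge that lexicographically lowers (λ, h). The function name and the exact tie-breaking details are illustrative assumptions, not taken from the paper; exact rational arithmetic is used to avoid floating-point comparisons.

```python
from fractions import Fraction

def howard_min_mean_cycle(n, edges):
    """Sketch of Howard's policy iteration for the minimum mean-cost cycle
    of a directed graph in which every vertex has at least one outgoing
    edge.  `edges` is a list of (v, u, cost) triples with 0 <= v, u < n.
    Returns (minimum cycle mean as a Fraction, iterations performed)."""
    out = [[] for _ in range(n)]
    for v, u, c in edges:
        out[v].append((u, Fraction(c)))
    policy = [out[v][0] for v in range(n)]      # arbitrary initial policy

    iterations = 0
    while True:                                  # Howard's algorithm terminates
        iterations += 1
        # --- evaluation: gain lam[v] = mean of the cycle v reaches in the
        #     policy graph; bias h[v], anchored to 0 on one cycle vertex.
        lam = [None] * n
        h = [Fraction(0)] * n
        for s in range(n):
            if lam[s] is not None:
                continue
            path, pos, v = [], {}, s
            while lam[v] is None and v not in pos:
                pos[v] = len(path)
                path.append(v)
                v = policy[v][0]
            if lam[v] is None:                   # walk closed a new cycle at v
                cyc = path[pos[v]:]
                mean = Fraction(sum(policy[w][1] for w in cyc), len(cyc))
                for w in cyc:
                    lam[w] = mean
                h[v] = Fraction(0)               # anchor the bias on the cycle
                for w in reversed(cyc[1:]):      # h[w] = c(w,pi(w)) - mean + h[pi(w)]
                    u, c = policy[w]
                    h[w] = c - mean + h[u]
                tail = path[:pos[v]]
            else:                                # walk hit an evaluated vertex
                tail = path
            for w in reversed(tail):             # propagate back along the tail
                u, c = policy[w]
                lam[w] = lam[u]
                h[w] = c - lam[u] + h[u]
        # --- improvement: switch to any edge that lexicographically
        #     decreases (gain, bias); stop at a fixed point.
        improved = False
        for v in range(n):
            bu, bc = policy[v]
            best = (lam[bu], bc - lam[bu] + h[bu])
            for u, c in out[v]:
                key = (lam[u], c - lam[u] + h[u])
                if key < best:
                    best = key
                    policy[v] = (u, c)
                    improved = True
        if not improved:
            return min(lam), iterations
```

On a small example with two cycles of means 2 and 1 joined by a connecting edge, `howard_min_mean_cycle(4, [(0,1,2),(0,2,10),(1,0,2),(2,3,1),(3,2,1)])` returns mean 1 after two iterations; the lower-bound construction of the paper exhibits families of graphs forcing Ω(n²) such iterations.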
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Hansen, T.D., Zwick, U. (2010). Lower Bounds for Howard’s Algorithm for Finding Minimum Mean-Cost Cycles. In: Cheong, O., Chwa, KY., Park, K. (eds) Algorithms and Computation. ISAAC 2010. Lecture Notes in Computer Science, vol 6506. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17517-6_37
Print ISBN: 978-3-642-17516-9
Online ISBN: 978-3-642-17517-6