Abstract
Howard’s policy iteration algorithm is one of the most widely used algorithms for finding optimal policies for controlling Markov Decision Processes (MDPs). When applied to weighted directed graphs, which may be viewed as Deterministic MDPs (DMDPs), Howard’s algorithm can be used to find minimum mean-cost cycles (MMCCs). Experimental studies suggest that Howard’s algorithm works extremely well in this context. Its theoretical complexity for finding MMCCs, however, remains poorly understood: no polynomial bound is known on its running time, and prior to this work only linear lower bounds were known on the number of iterations it performs. We provide the first weighted graphs on which Howard’s algorithm performs Ω(n²) iterations, where n is the number of vertices in the graph.
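For concreteness, the following is a minimal sketch of Howard’s policy iteration specialized to minimum mean-cost cycles, in the standard gain/bias formulation: each vertex holds one outgoing edge (the policy), evaluation computes the mean cost λ of the cycle each vertex reaches in the policy graph plus a bias value h, and improvement switches any edge that lexicographically lowers (λ, h). The function name and the exact tie-breaking details are illustrative assumptions, not taken from the paper; exact rational arithmetic is used to avoid floating-point comparisons.

```python
from fractions import Fraction

def howard_min_mean_cycle(n, edges):
    """Sketch of Howard's policy iteration for the minimum mean-cost cycle
    of a directed graph in which every vertex has at least one outgoing
    edge.  `edges` is a list of (v, u, cost) triples with 0 <= v, u < n.
    Returns (minimum cycle mean as a Fraction, iterations performed)."""
    out = [[] for _ in range(n)]
    for v, u, c in edges:
        out[v].append((u, Fraction(c)))
    policy = [out[v][0] for v in range(n)]      # arbitrary initial policy

    iterations = 0
    while True:                                  # Howard's algorithm terminates
        iterations += 1
        # --- evaluation: gain lam[v] = mean of the cycle v reaches in the
        #     policy graph; bias h[v], anchored to 0 on one cycle vertex.
        lam = [None] * n
        h = [Fraction(0)] * n
        for s in range(n):
            if lam[s] is not None:
                continue
            path, pos, v = [], {}, s
            while lam[v] is None and v not in pos:
                pos[v] = len(path)
                path.append(v)
                v = policy[v][0]
            if lam[v] is None:                   # walk closed a new cycle at v
                cyc = path[pos[v]:]
                mean = Fraction(sum(policy[w][1] for w in cyc), len(cyc))
                for w in cyc:
                    lam[w] = mean
                h[v] = Fraction(0)               # anchor the bias on the cycle
                for w in reversed(cyc[1:]):      # h[w] = c(w,pi(w)) - mean + h[pi(w)]
                    u, c = policy[w]
                    h[w] = c - mean + h[u]
                tail = path[:pos[v]]
            else:                                # walk hit an evaluated vertex
                tail = path
            for w in reversed(tail):             # propagate back along the tail
                u, c = policy[w]
                lam[w] = lam[u]
                h[w] = c - lam[u] + h[u]
        # --- improvement: switch to any edge that lexicographically
        #     decreases (gain, bias); stop at a fixed point.
        improved = False
        for v in range(n):
            bu, bc = policy[v]
            best = (lam[bu], bc - lam[bu] + h[bu])
            for u, c in out[v]:
                key = (lam[u], c - lam[u] + h[u])
                if key < best:
                    best = key
                    policy[v] = (u, c)
                    improved = True
        if not improved:
            return min(lam), iterations
```

On a small example with two cycles of means 2 and 1 joined by a connecting edge, `howard_min_mean_cycle(4, [(0,1,2),(0,2,10),(1,0,2),(2,3,1),(3,2,1)])` returns mean 1 after two iterations; the lower-bound construction of the paper exhibits families of graphs forcing Ω(n²) such iterations.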
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Hansen, T.D., Zwick, U. (2010). Lower Bounds for Howard’s Algorithm for Finding Minimum Mean-Cost Cycles. In: Cheong, O., Chwa, KY., Park, K. (eds) Algorithms and Computation. ISAAC 2010. Lecture Notes in Computer Science, vol 6506. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17517-6_37
Print ISBN: 978-3-642-17516-9
Online ISBN: 978-3-642-17517-6