
On Canonical Forms for Zero-Sum Stochastic Mean Payoff Games

Published in Dynamic Games and Applications.

Abstract

We consider two-person zero-sum undiscounted mean payoff stochastic games and obtain sufficient conditions for the existence of a saddle point in uniformly optimal stationary strategies. Namely, these conditions enable us to bring the game, by applying potential transformations, to a canonical form in which locally optimal strategies are globally optimal, and hence the value for every initial position and the optimal strategies of both players can be obtained by playing the local game at each state. We show that these conditions hold for the class of additive transition (AT) games, that is, the special case in which the transitions at each state can be decomposed into two parts, each controlled completely by one of the two players. An important special case of AT-games is formed by the so-called BWR-games, which are played by two players on a directed graph with positions of three types: Black, White, and Random. We give an independent proof of the existence of a canonical form in such games, and use this result to derive the existence of a canonical form (and hence of a saddle point in uniformly optimal stationary strategies) in a wide class of games that includes stochastic games with perfect information (PI), switching controller (SC) games, and additive rewards, additive transition (ARAT) games. Unlike the proof for AT-games, our proof for the BWR-case does not rely on the existence of a saddle point in stationary strategies. We also derive some algorithmic consequences from our reductions to BWR-games, in terms of solving PI- and ARAT-games in sub-exponential time.


Notes

  1. Shapley’s original stochastic games were assumed to have positive stopping probabilities, i.e., at each state v, \(\sum_{u\in V} p_{k\ell}^{vu} <1\), and with probability \(1-\sum_{u\in V} p_{k\ell}^{vu}\), the game stops at state v if actions k and ℓ are selected by the players.

  2. Note that a BW-game on an arbitrary digraph G=(V_B∪V_W,E) can be reduced to a BW-game on a bipartite graph \(G'=(V_{B}'\cup V_{W}', E')\) (where the values are halved) by splitting every v∈V_B into two nodes \(v'\in V_{W}'\) and \(v''\in V_{B}'\) with an additional arc (v′,v″)∈E′, and subdividing each arc (v,u)∈E with v∈V_W by a new node \(v_{u}\in V_{B}'\); all new arcs have reward 0.
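This reduction is easy to implement. The following sketch (our own illustration, with an ad-hoc graph representation and helper names that are not from the paper) splits Black positions and subdivides White-controlled arcs as described, so that every arc of the resulting digraph joins a White node to a Black node:

```python
def to_bipartite(nodes, arcs):
    """Reduce a BW-game graph to a bipartite one.

    nodes: dict mapping node -> 'B' (Black) or 'W' (White)
    arcs:  list of (tail, head, reward) triples
    Returns (new_nodes, new_arcs) in the same representation.
    """
    new_nodes, new_arcs = {}, []
    for v, owner in nodes.items():
        if owner == 'B':
            # split v in V_B into v' in V_W' and v'' in V_B', joined by a 0-arc
            new_nodes[(v, "in")] = 'W'
            new_nodes[(v, "out")] = 'B'
            new_arcs.append(((v, "in"), (v, "out"), 0.0))
        else:
            new_nodes[(v, "in")] = 'W'
    for v, u, r in arcs:
        if nodes[v] == 'W':
            # subdivide the White arc (v, u) with a new Black node v_u;
            # the original reward stays on the first half, 0 on the second
            mid = (v, u, "mid")
            new_nodes[mid] = 'B'
            new_arcs.append(((v, "in"), mid, r))
            new_arcs.append((mid, (u, "in"), 0.0))
        else:
            new_arcs.append(((v, "out"), (u, "in"), r))
    return new_nodes, new_arcs

# tiny demo: a 2-cycle with one Black and one White node
nodes = {1: 'B', 2: 'W'}
arcs = [(1, 2, 5.0), (2, 1, -3.0)]
bip_nodes, bip_arcs = to_bipartite(nodes, arcs)
# every arc now alternates between White and Black owners
assert all(bip_nodes[t] != bip_nodes[h] for t, h, _ in bip_arcs)
```

Since every cycle in the new graph is twice as long while carrying the same total reward, the mean payoff (and hence the value) is halved, as the note states.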

  3. We thank an anonymous reviewer for pointing out this connection to us.

  4. log denotes the logarithm to the base 2.

  5. This assumption is without loss of generality since one can add a loop to each terminal vertex.

References

  1. Andersson D, Miltersen PB (2009) The complexity of solving stochastic games on graphs. In: Proc 20th ISAAC. LNCS, vol 5878, pp 112–121

  2. Beffara E, Vorobyov S (2001) Adapting Gurvich-Karzanov-Khachiyan’s algorithm for parity games: implementation and experimentation. Technical Report 2001-020, Department of Information Technology, Uppsala University. Available at https://www.it.uu.se/research/reports/#2001

  3. Beffara E, Vorobyov S (2001) Is randomized Gurvich-Karzanov-Khachiyan’s algorithm for parity games polynomial? Technical Report 2001-025, Department of Information Technology, Uppsala University. Available at https://www.it.uu.se/research/reports/#2001

  4. Blackwell D (1962) Discrete dynamic programming. Ann Math Stat 33:719–726

  5. Boros E, Elbassioni K, Gurvich V, Makino K (2009) Every stochastic game with perfect information admits a canonical form. RRR-09-2009, RUTCOR, Rutgers University

  6. Boros E, Elbassioni K, Gurvich V, Makino K (2012) Discounted approximations of undiscounted stochastic games and Markov decision processes are already poor in the almost deterministic case. RRR-24-2012, RUTCOR, Rutgers University

  7. Boros E, Elbassioni K, Gurvich V, Makino K (2010) A pumping algorithm for ergodic stochastic mean payoff games with perfect information. In: Proc 14th IPCO. LNCS, vol 6080. Springer, Berlin, pp 341–354

  8. Boros E, Elbassioni K, Gurvich V, Makino K (2012) On canonical forms for two-person zero-sum limit average payoff stochastic games. RRR-15-2012, RUTCOR, Rutgers University

  9. Boros E, Elbassioni K, Gurvich V, Makino K (2012) A potential reduction algorithm for two-person zero-sum limiting average payoff stochastic games. RRR-13-2012, RUTCOR, Rutgers University

  10. Bewley T, Kohlberg E (1978) On stochastic games with stationary optimal strategies. Math Oper Res 3(2):104–125

  11. Chatterjee K, Henzinger TA (2008) Reduction of stochastic parity to stochastic mean-payoff games. Inf Process Lett 106(1):1–7

  12. Chatterjee K, Jurdziński M, Henzinger TA (2004) Quantitative stochastic parity games. In: Proc 15th SODA, pp 121–130

  13. Condon A (1992) The complexity of stochastic games. Inf Comput 96:203–224

  14. Ehrenfeucht A, Mycielski J (1979) Positional strategies for mean payoff games. Int J Game Theory 8:109–113

  15. Federgruen A (1980) Successive approximation methods in undiscounted stochastic games. Oper Res 1:794–810

  16. Filar JA (1981) Ordered field property for stochastic games when the player who controls transitions changes from state to state. J Optim Theory Appl 34(4):503–515

  17. Flesch J, Thuijsman F, Vrieze OJ (2007) Stochastic games with additive transitions. Eur J Oper Res 179(2):483–497

  18. Gallai T (1958) Maximum-minimum Sätze über Graphen. Acta Math Acad Sci Hung 9:395–434

  19. Gillette D (1957) Stochastic games with zero stop probabilities. In: Dresher M, Tucker AW, Wolfe P (eds) Contribution to the theory of games III. Annals of mathematics studies, vol 39. Princeton University Press, Princeton, pp 179–187

  20. Gurvich V, Karzanov A, Khachiyan L (1988) Cyclic games and an algorithm to find minimax cycle means in directed graphs. USSR Comput Math Math Phys 28:85–91

  21. Halman N (2007) Simple stochastic games, parity games, mean payoff games and discounted payoff games are all LP-type problems. Algorithmica 49(1):37–50

  22. Hardy GH, Littlewood JE (1931) Notes on the theory of series (XVI): two Tauberian theorems. J Lond Math Soc 6:281–286

  23. Hoffman AJ, Karp RM (1966) On nonterminating stochastic games. Manag Sci, Ser A 12(5):359–370

  24. Howard RA (1960) Dynamic programming and Markov processes. Technology Press and Wiley, New York

  25. Jurdziński M (1998) Deciding the winner in parity games is in UP ∩ co-UP. Inf Process Lett 68(3):119–124

  26. Jurdziński M, Paterson M, Zwick U (2006) A deterministic subexponential algorithm for solving parity games. In: Proc 17th SODA, pp 117–123

  27. Karp RM (1978) A characterization of the minimum cycle mean in a digraph. Discrete Math 23:309–311

  28. Kemeny JG, Snell JL (1963) Finite Markov chains. Springer, Berlin

  29. Korevaar J (2010) Tauberian theory: a century of developments. Grundlehren der mathematischen Wissenschaften. Springer, Berlin

  30. Kratsch D, McConnell RM, Mehlhorn K, Spinrad JP (2003) Certifying algorithms for recognizing interval graphs and permutation graphs. In: Proc 14th SODA. Society for Industrial and Applied Mathematics, Philadelphia, pp 158–167

  31. Krishnamurthy N, Parthasarathy T, Ravindran G (2010) Orderfield property of mixtures of stochastic games. Math Stat Probab 72(1):246–275

  32. Liggett TM, Lippman SA (1969) Stochastic games with perfect information and time-average payoff. SIAM Rev 4:604–607

  33. Mertens JF, Neyman A (1981) Stochastic games. Int J Game Theory 10:53–66

  34. Miltersen PB (2011) Discounted stochastic games poorly approximate undiscounted ones. Manuscript. Technical report

  35. Mine H, Osaki S (1970) Markovian decision process. Elsevier, New York

  36. Moulin H (1976) Extension of two person zero sum games. J Math Anal Appl 5(2):490–507

  37. Moulin H (1976) Prolongement des jeux à deux joueurs de somme nulle. Bull Soc Math Fr Mem 45

  38. Pisaruk NN (1999) Mean cost cyclical games. Math Oper Res 24(4):817–828

  39. Parthasarathy T, Raghavan TES (1981) An orderfield property for stochastic games when one player controls transition probabilities. J Optim Theory Appl 33:375–392. doi:10.1007/BF00935250

  40. Raghavan TES, Tijs SH, Vrieze OJ (1985) On stochastic games with additive reward and transition structure. J Optim Theory Appl 47:451–464. doi:10.1007/BF00942191

  41. Shapley LS (1953) Stochastic games. Proc Natl Acad Sci USA 39:1095–1100

  42. Shapley LS, Snow RN (1950) Basic solutions of discrete games. Ann Math Stud 24:27–35

  43. Sinha S (1989) A contribution to the theory of stochastic games. PhD thesis, Indian Statistical Institute, New Delhi, India

  44. Sznajder R, Filar JA (1992) Some comments on a theorem of Hardy and Littlewood. J Optim Theory Appl 75(1):201–208

  45. von Neumann J (1928) Zur Theorie der Gesellschaftsspiele. Math Ann 100:295–320

  46. Vrieze OJ (1980) Stochastic games with finite state and action spaces. PhD thesis, Centrum voor Wiskunde en Informatica, Amsterdam, The Netherlands

  47. Zwick U, Paterson M (1996) The complexity of mean payoff games on graphs. Theor Comput Sci 158(1–2):343–359


Acknowledgements

We thank the anonymous referees for careful reading and many helpful remarks. This research was partially supported by the Scientific Grant-in-Aid from the Ministry of Education, Science, Sports, and Culture of Japan. The first author also acknowledges the partial support of NSF Grant IIS-0803444 and NSF Grant CMMI-0856663.

Author information


Correspondence to Khaled Elbassioni.

Additional information

Part of this research was done at the Mathematisches Forschungsinstitut Oberwolfach during a stay within the Research in Pairs Program from March 7 to March 18, 2011.

Appendices

Appendix A: Related Results from Theory of Markov Chains

Given an n×n transition matrix P, the Cesàro partial sums \(\frac{1}{k+1} \sum_{i=0}^{k} P^{i}\) converge, as k→∞, to the limit Markov matrix Q such that:

  1. (i)

    PQ=QP=QQ=Q;

  2. (ii)

    \(\operatorname {rank}(I - P) + \operatorname {rank}Q = n\).

  3. (iii)

    For each n-vector c, the system Px=x, Qx=c has a unique solution.

  4. (iv)

    the matrix I−(P−Q) is nonsingular and

    $$ H(\delta ) = \sum_{i = 0}^\infty \delta ^i \bigl(P^i - Q\bigr) \rightarrow H = \bigl(I - (P - Q)\bigr)^{-1} - Q \quad \mbox{as } \delta \rightarrow1^-. $$
    (62)
  5. (v)
    $$H(\delta ) Q = Q H(\delta ) = H Q = Q H = 0 \quad \mbox{and} \quad (I - P) H = H (I - P) = I -Q. $$

Claim (iv) (which is used in Sect. 7.4) was proved in 1962 by Blackwell [4], while for the remaining four claims he cited the textbook on finite Markov chains by Kemeny and Snell [28] (which was published, in fact, one year later, in 1963).
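The five claims are easy to check numerically on a concrete chain. The following sketch (our own illustration, assuming NumPy; not from the paper) uses the 2-state periodic chain, for which the powers P^i do not converge but the Cesàro averages do:

```python
import numpy as np

# 2-state periodic chain: P^i alternates between I and P, yet the
# Cesaro averages converge to the limit matrix Q.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
n = P.shape[0]
I = np.eye(n)

# Cesaro average (1/(k+1)) * sum_{i=0}^{k} P^i; with k odd this is exactly Q.
k = 999
acc, Pi = np.zeros_like(P), I.copy()
for _ in range(k + 1):
    acc += Pi
    Pi = Pi @ P
Q = acc / (k + 1)                      # here Q = [[0.5, 0.5], [0.5, 0.5]]

# (i) PQ = QP = QQ = Q
assert np.allclose(P @ Q, Q) and np.allclose(Q @ P, Q) and np.allclose(Q @ Q, Q)
# (ii) rank(I - P) + rank(Q) = n
assert np.linalg.matrix_rank(I - P) + np.linalg.matrix_rank(Q) == n
# (iv) I - (P - Q) is nonsingular, giving the deviation matrix H
H = np.linalg.inv(I - (P - Q)) - Q
# (v) HQ = QH = 0 and (I - P)H = H(I - P) = I - Q
assert np.allclose(H @ Q, 0) and np.allclose(Q @ H, 0)
assert np.allclose((I - P) @ H, I - Q) and np.allclose(H @ (I - P), I - Q)
```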

Appendix B: Proof of Lemma 3

Let us first fix the strategy \(\bar{\beta}\) of Black, and compute the uniformly best response by White by solving a controlled Markov chain problem (see, e.g., [35]). It is well-known that this can be done by solving a linear program whose dual LP provides us with a potential vector y∈ℝV such that

$$ g^v\geq \mathrm {Val}\bigl(A^v(y)\bigr) $$
(63)

holds for all states vV. Let us next fix \(\bar{\alpha}\) and compute similarly the best response of Black, providing analogously a potential vector z∈ℝV satisfying

$$ g^v\leq \mathrm {Val}\bigl(A^v(z)\bigr) $$
(64)

for all states vV. Since adding a constant to a potential vector does not change the potential transformation and the value matrices, we can assume w.l.o.g. that

$$ y\leq z. $$
(65)

Let us define a matrix valued mapping B v(d) for all states vV and vectors d∈ℝV by

$$B^v(d) = A^v(d)-d^vJ_{|K^v|\times|L^v|}. $$

Then (65) implies that B v(z)≤B v(y) (componentwise). Since the value function of matrix games is monotone, and since shifting the payoff matrix by a constant shifts the value of the game by the same constant, we conclude from (63) and (64) that

$$ g^v-z^v \leq \mathrm {Val}\bigl(B^v(z) \bigr)\leq \mathrm {Val}\bigl(B^v(y)\bigr)\leq g^v-y^v, $$
(66)

for all states v∈V. Note that if g−z≤g−d≤g−y, then we have B v(z)≤B v(d)≤B v(y) for all v∈V, and hence, by the above-cited properties of the value function and by (66),

$$ \mathrm {Val}\bigl(B^v(z)\bigr)\leq \mathrm {Val}\bigl(B^v(d) \bigr) \leq \mathrm {Val}\bigl(B^v(y)\bigr) \quad \text{for all states } v\in V. $$
(67)

Since the mapping F:g−d↦Val(B(d)) (where Val(B(d))=(Val(B v(d)) : v∈V)) is Lipschitz-continuous, and since by properties (67) and (66) it maps the compact box [g−z,g−y] into a subset of itself, we can conclude by Brouwer’s theorem that F has a fixed point; that is, there exists a potential vector x∈[y,z] (i.e., g−x∈[g−z,g−y]) for which g−x=F(g−x)=Val(B(x)). This implies that g=Val(A(x)), completing our proof. □
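The proof above rests on two elementary properties of the value function of matrix games: monotonicity in the payoff entries, and equivariance under adding a constant to all entries. A small sketch (our own, assuming `scipy.optimize.linprog`; the helper name `matrix_game_value` is ours) illustrating both on a 2×2 example:

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value of the zero-sum matrix game A (row player maximizes), via the
    standard LP: shift A positive, then minimize sum(x) s.t. B^T x >= 1."""
    A = np.asarray(A, dtype=float)
    shift = A.min() - 1.0
    B = A - shift                       # all entries >= 1 > 0
    m, n = B.shape
    res = linprog(c=np.ones(m), A_ub=-B.T, b_ub=-np.ones(n),
                  bounds=[(0, None)] * m, method="highs")
    return 1.0 / res.fun + shift

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])            # matching pennies, value 0

v0 = matrix_game_value(A)
v3 = matrix_game_value(A + 3.0)        # constant shift moves Val by the constant
v_up = matrix_game_value(A + np.array([[1.0, 0.0], [0.0, 0.0]]))

assert abs(v0) < 1e-6 and abs(v3 - 3.0) < 1e-6
assert v_up >= v0 - 1e-6               # monotonicity: A <= A' implies Val <= Val'
```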

Appendix C: Proof of the Implication \(\mathrm{(B2)}\Rightarrow\mathrm{(A1)}\)

Let g,x,y∈ℝV be the vectors satisfying condition (B2). Then there exist strategies \(\bar{\alpha}\) and \(\bar{\beta}\) such that, for all states v∈V, the following hold: (1) \(\bar{\alpha}^{v} G^{v}(g)\beta\ge g^{v}\) and \(\bar{\alpha}^{v} A^{v}(x)\beta \ge g^{v}\) for all β∈Δ(L v), and (2) \(\alpha^{v} G^{v}(g)\bar{\beta}^{v}\le g^{v}\) and \(\alpha^{v} A^{v}(y)\bar{\beta}^{v}\le g^{v}\) for all α∈Δ(K v).

Fix a starting position v 0=w. It is enough to show that White can guarantee at least g w, while Black can guarantee at most g w. We only show the former statement, since the latter can be shown similarly. At time i, we let White play his/her locally optimal (stationary) strategy \(\bar{\alpha}^{v}\) whenever (s)he is at position v i =v, while Black chooses an arbitrary, not necessarily stationary, strategy, which may depend on the history of the play leading to v i =v. For each state v∈V and time i, we denote by β v,i∈Δ(L v) the induced Markovian strategy of Black, i.e., the mixed action of Black at position v at time i, averaged over the histories leading there.

Consider a play w=v 0,v 1,v 2,… (where each v i is a random variable). By (7) and the fact that potential transformations do not change the Cesàro sum (Sect. 2.4), it is enough to show that \(\mathbb {E}[b_{i}(x)]\ge g^{w}\) for all i. Note that

$$ \mathbb {E}\bigl[b_i(x)\bigr] = \sum_{v\in V} \Pr [v_i=v]\cdot\bar{\alpha}^{v} A^{v}(x)\beta^{v,i} \ge \sum_{v\in V} g^v\cdot \Pr [v_i=v]. $$
(68)

We prove by induction on i=0,1,2,… that ∑ v∈V g v⋅Pr[v i =v]≥g w, which will imply the lemma by (68). Indeed, the statement is trivially true for i=0. For any i, using \(\bar{\alpha}^{u} G^{u}(g)\beta\ge g^{u}\) for all β∈Δ(L u), we have

$$\sum_{v\in V} g^v\cdot \Pr [v_{i+1}=v] = \sum_{u\in V} \Pr [v_i=u]\cdot\bar{\alpha}^{u} G^{u}(g)\beta^{u,i} \ge \sum_{u\in V} g^u\cdot \Pr [v_i=u], $$

and the latter is at least g w by the induction hypothesis. □

Appendix D: Some Examples

Example 2

Vrieze [46, Chap. 8] gave an example, see Fig. 2, of a stochastic game which has values and uniformly optimal stationary strategies but no canonical form. We can see that condition (A2) is violated. In this game we have V={1,2,3}; states 2 and 3 are absorbing with |K 2|=|K 3|=|L 2|=|L 3|=1, while in state 1 we have |K 1|=|L 1|=3. The reward matrix of state 1 is shown in Fig. 2, together with the transition probabilities, which are all zero or one.

Fig. 2

In this game Γ, we have |V|=3 states, |K 1|=|L 1|=3, |K 2|=|L 2|=1, and |K 3|=|L 3|=1. The numbers in the state matrices are the local rewards. All transition probabilities are zero or one, and arcs in the picture indicate the probability-1 transitions. The thick arc in the picture indicates that, for pairs of strategies from the top-left 2×2 block of state 1, the game remains in state 1 with probability 1. This game has values and uniformly optimal stationary strategies, but it does not have a canonical form

This game has values, g=(0,−1,1), and unique uniformly optimal stationary strategies, namely \(\alpha^{1}=(\frac{1}{2},\frac{1}{2},0)\) and \(\beta^{1}=(\frac{1}{2},\frac{1}{2},0)\), and the trivial strategies in states 2 and 3. We have

$$G^1 = \left ( \begin{array}{c@{\quad }c@{\quad }c}0&0&1\\0&0&-1\\1&-1&0 \end{array} \right ). $$

For a potential vector x∈ℝV we can assume w.l.o.g. that x 1=0, and thus we have

$$A^1(x) = \left ( \begin{array}{c@{\quad }c@{\quad }c}1&-1&-1-x_3\\-1&1&-1-x_2\\1-x_3&1-x_2&0 \end{array} \right ). $$

Here, \(\bar {K}^{1}=\{(\frac{1}{2},\frac{1}{2},0),(1,0,0)\}\), \(\bar {L}^{1}=\{(\frac{1}{2},\frac{1}{2},0),(0,1,0)\}\), and only the first vectors are optimal in the matrix game with payoffs A 1(x) (for any potential transformation); thus α 1 and β 1 given above are the unique optimal strategies. For a canonical form with some potential vector x∈ℝV (x 1=0), we would need the inequalities α 1 A 1(x)≥0 and A 1(x)β 1≤0, implying that \(-1-\frac{x_{2}+x_{3}}{2}\geq0\) and \(1-\frac{x_{2}+x_{3}}{2}\leq0\), a contradiction. Consequently, this example does not have a canonical form.
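The contradiction can also be checked numerically. The following sketch (our own, assuming NumPy) verifies that α 1=β 1=(1/2,1/2,0) form a saddle point of G 1 with value 0, and that no choice of x 2+x 3 satisfies both canonical-form inequalities:

```python
import numpy as np

# G1 and A1(x) are the matrices from Example 2 (with x1 = 0).
G1 = np.array([[0.0, 0.0, 1.0],
               [0.0, 0.0, -1.0],
               [1.0, -1.0, 0.0]])
alpha = np.array([0.5, 0.5, 0.0])
beta = np.array([0.5, 0.5, 0.0])

# alpha guarantees >= 0 against every column and beta guarantees <= 0 against
# every row, so (alpha, beta) is a saddle point of G1 with value 0.
assert np.all(alpha @ G1 >= -1e-12) and np.all(G1 @ beta <= 1e-12)

def A1(x2, x3):
    return np.array([[1.0, -1.0, -1.0 - x3],
                     [-1.0, 1.0, -1.0 - x2],
                     [1.0 - x3, 1.0 - x2, 0.0]])

# The canonical-form conditions alpha @ A1(x) >= 0 and A1(x) @ beta <= 0 would
# force -1 - s/2 >= 0 and 1 - s/2 <= 0 for s = x2 + x3, which is impossible:
# the third components fail for every s.
for s in (-4.0, 0.0, 4.0):
    M = A1(s / 2, s / 2)
    assert not (np.all(alpha @ M >= -1e-12) and np.all(M @ beta <= 1e-12))
```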

Example 3

Raghavan, Tijs, and Vrieze [40] gave an example, see Fig. 3, of an AT-game with rational input data in which the optimal values and strategies are irrational. This example is ergodic, with states V={1,2} and values \(g^{1}=g^{2}=-(6-\sqrt{30})^{2}\). The vector \(x=(0,22-4\sqrt{30})\) is a potential transformation providing the canonical form for this example. We have K 1=K 2=L 1=L 2={1,2}, and the strategies \(\alpha ^{1}=\beta^{1}=(\frac{-4+\sqrt{30}}{2},\frac{6-\sqrt{30}}{2})\) and \(\alpha^{2}=\beta^{2}=(\frac{-9+2\sqrt{30}}{3},\frac{12-2\sqrt {30}}{3})\) are the uniformly optimal stationary strategies.

Fig. 3

This example has two states with |K 1|=|L 1|=|K 2|=|L 2|=2. In each cell of the figure, the reward is in the top-left corner, while the transition probabilities to states 1 and 2 (in this order) are in the bottom-right area. This is an ergodic AT-game with irrational values and optimal strategies

Appendix E: Summary of Implications

Figure 4 summarizes the implications between game properties considered in the paper.

Fig. 4

Summary of the results and considered game classes. Most implications are by Theorem 1. Example 2 shows that property (B2) does not imply (B3). Theorem 3 claims that properties (A2) and (B6) together imply (B4), which is equivalent to the existence of the canonical form (B1). Dotted lines indicate subclass relations


About this article

Cite this article

Boros, E., Elbassioni, K., Gurvich, V. et al. On Canonical Forms for Zero-Sum Stochastic Mean Payoff Games. Dyn Games Appl 3, 128–161 (2013). https://doi.org/10.1007/s13235-013-0075-x
