Abstract
We consider two-person zero-sum mean payoff undiscounted stochastic games and obtain sufficient conditions for the existence of a saddle point in uniformly optimal stationary strategies. Namely, these conditions enable us to bring the game, by applying potential transformations, to a canonical form in which locally optimal strategies are globally optimal, and hence the value for every initial position and the optimal strategies of both players can be obtained by playing the local game at each state. We show that these conditions hold for the class of additive transition (AT) games, that is, the special case in which the transitions at each state can be decomposed into two parts, each controlled completely by one of the two players. An important special case of AT-games is formed by the so-called BWR-games, which are played by two players on a directed graph with positions of three types: Black, White, and Random. We give an independent proof of the existence of a canonical form in such games, and use this result to derive the existence of a canonical form (and hence of a saddle point in uniformly optimal stationary strategies) in a wide class of games, which includes stochastic games with perfect information (PI), switching controller (SC) games, and additive rewards, additive transition (ARAT) games. Unlike the proof for AT-games, our proof for the BWR-case does not rely on the existence of a saddle point in stationary strategies. We also derive some algorithmic consequences from our reductions to BWR-games, allowing PI- and ARAT-games to be solved in sub-exponential time.
Notes
Shapley’s original stochastic games were assumed to have positive stopping probabilities, i.e., at each state v, \(\sum_{u\in V} p_{k\ell}^{vu} <1\), and with probability \(1-\sum_{u\in V} p_{k\ell}^{vu}\), the game stops at state v if actions k and ℓ are selected by the players.
Note that a BW-game on an arbitrary digraph G=(V_B∪V_W,E) can be reduced to a BW-game on a bipartite graph \(G'=(V_{B}'\cup V_{W}', E')\) (in which the values are halved) by splitting every v∈V_B into two nodes \(v'\in V_{W}'\) and \(v''\in V_{B}'\) joined by an additional arc (v′,v″)∈E′, and by subdividing each arc (v,u)∈E with v∈V_W by a new node \(v_{u}\in V_{B}'\); all new arcs have reward 0.
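This reduction is straightforward to implement. Below is a minimal sketch; the data layout (reward-labelled arc dictionaries and tagged node tuples) and the choice to keep the original reward on the first half of each subdivided arc are our own, not prescribed by the construction above:

```python
def reduce_to_bipartite(VB, VW, E):
    """Reduce a BW-game on an arbitrary digraph to a bipartite BW-game.

    VB, VW: sets of Black and White positions; E: dict (v, u) -> reward.
    Returns (VB2, VW2, E2). Every original node is entered through its
    White copy ('w', v); the values of the new game are half the originals.
    """
    VB2, VW2, E2 = set(), set(), {}
    for v in VW:
        VW2.add(('w', v))
    for v in VB:
        # Split v into a White copy and a Black copy, linked by a 0-reward arc.
        VW2.add(('w', v))
        VB2.add(('b', v))
        E2[(('w', v), ('b', v))] = 0
    for (v, u), r in E.items():
        if v in VB:
            E2[(('b', v), ('w', u))] = r   # Black arcs leave the Black copy
        else:
            mid = ('m', v, u)              # subdivision node v_u in V_B'
            VB2.add(mid)
            E2[(('w', v), mid)] = r        # reward kept on the first half
            E2[(mid, ('w', u))] = 0
    return VB2, VW2, E2

# A 2-cycle 1 -> 2 -> 1 with rewards 5 and 3: the cycle mean drops from
# (5 + 3) / 2 = 4 to 8 / 4 = 2, i.e., the value is halved.
VB2, VW2, E2 = reduce_to_bipartite({1}, {2}, {(1, 2): 5, (2, 1): 3})
assert all((a in VW2) != (b in VW2) for (a, b) in E2)  # arcs alternate sides
```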
We thank an anonymous reviewer for pointing out this connection to us.
log denotes the logarithm to the base 2.
This assumption is without loss of generality since one can add a loop to each terminal vertex.
References
Andersson D, Miltersen PB (2009) The complexity of solving stochastic games on graphs. In: Proc 20th ISAAC. LNCS, vol 5878, pp 112–121
Beffara E, Vorobyov S (2001) Adapting Gurvich-Karzanov-Khachiyan’s algorithm for parity games: Implementation and experimentation. Technical Report 2001-020, Department of Information Technology, Uppsala University, available at https://www.it.uu.se/research/reports/#2001
Beffara E, Vorobyov S (2001) Is randomized Gurvich-Karzanov-Khachiyan’s algorithm for parity games polynomial? Technical Report 2001-025, Department of Information Technology, Uppsala University, available at https://www.it.uu.se/research/reports/#2001
Blackwell D (1962) Discrete dynamic programming. Ann Math Stat 33:719–726
Boros E, Elbassioni K, Gurvich V, Makino K (2009) Every stochastic game with perfect information admits a canonical form. RRR-09-2009, RUTCOR, Rutgers University
Boros E, Elbassioni K, Gurvich V, Makino K (2012) Discounted approximations of undiscounted stochastic games and Markov decision processes are already poor in the almost deterministic case. RRR-24-2012, RUTCOR, Rutgers University
Boros E, Elbassioni K, Gurvich V, Makino K (2010) A pumping algorithm for ergodic stochastic mean payoff games with perfect information. In: Proc 14th IPCO. LNCS, vol 6080. Springer, Berlin, pp 341–354
Boros E, Elbassioni K, Gurvich V, Makino K (2012) On canonical forms for two-person zero-sum limit average payoff stochastic games. RRR-15-2012, RUTCOR, Rutgers University
Boros E, Elbassioni K, Gurvich V, Makino K (2012) A potential reduction algorithm for two-person zero-sum limiting average payoff stochastic games. RRR-13-2012, RUTCOR, Rutgers University
Bewley T, Kohlberg E (1978) On stochastic games with stationary optimal strategies. Math Oper Res 3(2):104–125
Chatterjee K, Henzinger TA (2008) Reduction of stochastic parity to stochastic mean-payoff games. Inf Process Lett 106(1):1–7
Chatterjee K, Jurdziński M, Henzinger TA (2004) Quantitative stochastic parity games. In: Proc 15th SODA, pp 121–130
Condon A (1992) The complexity of stochastic games. Inf Comput 96:203–224
Ehrenfeucht A, Mycielski J (1979) Positional strategies for mean payoff games. Int J Game Theory 8:109–113
Federgruen A (1980) Successive approximation methods in undiscounted stochastic games. Oper Res 1:794–810
Filar JA (1981) Ordered field property for stochastic games when the player who controls transitions changes from state to state. J Optim Theory Appl 34(4):503–515
Flesch J, Thuijsman F, Vrieze OJ (2007) Stochastic games with additive transitions. Eur J Oper Res 179(2):483–497
Gallai T (1958) Maximum-minimum Sätze über Graphen. Acta Math Acad Sci Hung 9:395–434
Gillette D (1957) Stochastic games with zero stop probabilities. In: Dresher M, Tucker AW, Wolfe P (eds) Contribution to the theory of games III. Annals of mathematics studies, vol 39. Princeton University Press, Princeton, pp 179–187
Gurvich V, Karzanov A, Khachiyan L (1988) Cyclic games and an algorithm to find minimax cycle means in directed graphs. USSR Comput Math Math Phys 28:85–91
Halman N (2007) Simple stochastic games, parity games, mean payoff games and discounted payoff games are all LP-type problems. Algorithmica 49(1):37–50
Hardy GH, Littlewood JE (1931) Notes on the theory of series (xvi): two Tauberian theorems. J Lond Math Soc 6:281–286
Hoffman AJ, Karp RM (1966) On nonterminating stochastic games. Manag Sci, Ser A 12(5):359–370
Howard RA (1960) Dynamic programming and Markov processes. Technology Press and Wiley, New York
Jurdziński M (1998) Deciding the winner in parity games is in UP ∩ co-UP. Inf Process Lett 68(3):119–124
Jurdziński M, Paterson M, Zwick U (2006) A deterministic subexponential algorithm for solving parity games. In: Proc 17th SODA, pp 117–123
Karp RM (1978) A characterization of the minimum cycle mean in a digraph. Discrete Math 23:309–311
Kemeny JG, Snell JL (1963) Finite Markov chains. Springer, Berlin
Korevaar J (2010) Tauberian theory: a century of developments. Grundlehren der mathematischen Wissenschaften. Springer, Berlin
Kratsch D, McConnell RM, Mehlhorn K, Spinrad JP (2003) Certifying algorithms for recognizing interval graphs and permutation graphs. In: SODA’03: proceedings of the fourteenth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, Philadelphia, pp 158–167
Krishnamurthy N, Parthasarathy T, Ravindran G (2010) Orderfield property of mixtures of stochastic games. Math Stat Probab 72(1):246–275
Liggett TM, Lippman SA (1969) Stochastic games with perfect information and time-average payoff. SIAM Rev 11(4):604–607
Mertens JF, Neyman A (1981) Stochastic games. Int J Game Theory 10:53–66
Miltersen PB (2011) Discounted stochastic games poorly approximate undiscounted ones. Manuscript. Technical report
Mine H, Osaki S (1970) Markovian decision process. Elsevier, New York
Moulin H (1976) Extension of two person zero sum games. J Math Anal Appl 5(2):490–507
Moulin H (1976) Prolongement des jeux à deux joueurs de somme nulle. Bull Soc Math Fr Mem 45
Pisaruk NN (1999) Mean cost cyclical games. Math Oper Res 24(4):817–828
Parthasarathy T, Raghavan TES (1981) An orderfield property for stochastic games when one player controls transition probabilities. J Optim Theory Appl 33:375–392. doi:10.1007/BF00935250
Raghavan TES, Tijs SH, Vrieze OJ (1985) On stochastic games with additive reward and transition structure. J Optim Theory Appl 47:451–464. doi:10.1007/BF00942191
Shapley LS (1953) Stochastic games. Proc Natl Acad Sci USA 39:1095–1100
Shapley LS, Snow RN (1950) Basic solutions of discrete games. Ann Math Stud 24:27–35
Sinha S (1989) A contribution to the theory of stochastic games. PhD thesis, Indian Statistical Institute, New Delhi, India
Sznajder R, Filar JA (1992) Some comments on a theorem of Hardy and Littlewood. J Optim Theory Appl 75(1):201–208
von Neumann J (1928) Zur Theorie der Gesellschaftsspiele. Math Ann 100:295–320
Vrieze OJ (1980) Stochastic games with finite state and action spaces. PhD thesis, Centrum voor Wiskunde en Informatica, Amsterdam, The Netherlands
Zwick U, Paterson M (1996) The complexity of mean payoff games on graphs. Theor Comput Sci 158(1–2):343–359
Acknowledgements
We thank the anonymous referees for careful reading and many helpful remarks. This research was partially supported by a Scientific Grant-in-Aid from the Ministry of Education, Science, Sports, and Culture of Japan. The first author also acknowledges the partial support of NSF Grants IIS-0803444 and CMMI-0856663.
Additional information
Part of this research was done at the Mathematisches Forschungsinstitut Oberwolfach during a stay within the Research in Pairs Program from March 7 to March 18, 2011.
Appendices
Appendix A: Related Results from the Theory of Markov Chains
Given an n×n transition matrix P, the Cesàro averages \(\frac{1}{k+1} \sum_{i=0}^{k} P^{i}\) converge, as k→∞, to the limit Markov matrix Q, which satisfies:

(i) PQ=QP=QQ=Q;

(ii) \(\operatorname {rank}(I - P) + \operatorname {rank} Q = n\);

(iii) for each n-vector c, the system Px=x, Qx=Qc has the unique solution x=Qc;

(iv) the matrix I−(P−Q) is nonsingular and
$$ H(\delta ) = \sum_{i = 0}^\infty \delta ^i \bigl(P^i - Q\bigr) \rightarrow H = \bigl(I - (P - Q)\bigr)^{-1} - Q \quad \mbox{as } \delta \rightarrow1^-; $$
(62)

(v) \(H(\delta ) Q = Q H(\delta ) = H Q = Q H = 0 \quad \mbox{and} \quad (I - P) H = H (I - P) = I -Q. \)
Claim (iv) (which is used in Sect. 7.4) was proved in 1962 by Blackwell [4], while for the remaining four claims he cited the textbook on finite Markov chains by Kemeny and Snell [28] (which, in fact, appeared one year later, in 1963).
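These identities are easy to verify numerically. The following sketch checks claims (i), (ii), (iv), and (v) with numpy on a small ergodic chain; the 2×2 transition matrix is an illustrative choice of ours, not taken from the paper:

```python
import numpy as np

# A small ergodic transition matrix (illustrative numbers).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
n = P.shape[0]
I = np.eye(n)

# Stationary distribution: pi (I - P) = 0 with sum(pi) = 1; adding the
# all-ones rank-one matrix makes the linear system nonsingular.
pi = np.ones(n) @ np.linalg.inv(I - P + np.ones((n, n)))

# In the ergodic case the limit matrix Q has every row equal to pi.
Q = np.tile(pi, (n, 1))

# Deviation matrix H from claim (iv).
H = np.linalg.inv(I - (P - Q)) - Q

assert np.allclose(P @ Q, Q) and np.allclose(Q @ P, Q) and np.allclose(Q @ Q, Q)  # (i)
assert np.linalg.matrix_rank(I - P) + np.linalg.matrix_rank(Q) == n               # (ii)
assert np.allclose(H @ Q, 0) and np.allclose(Q @ H, 0)                            # (v)
assert np.allclose((I - P) @ H, I - Q)                                            # (v)
```

For this chain pi = (2/3, 1/3), and the matrix I−(P−Q) has determinant 0.3, confirming the nonsingularity in claim (iv).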
Appendix B: Proof of Lemma 3
Let us first fix the strategy \(\bar{\beta}\) of Black, and compute the uniformly best response of White by solving a controlled Markov chain problem (see, e.g., [35]). It is well known that this can be done by solving a linear program, whose dual LP provides us with a potential vector y∈ℝ^V such that
holds for all states v∈V. Let us next fix \(\bar{\alpha}\) and compute similarly the best response of Black, obtaining analogously a potential vector z∈ℝ^V satisfying
for all states v∈V. Since adding a constant to a potential vector changes neither the potential transformation nor the value matrices, we can assume w.l.o.g. that
Let us define a matrix-valued mapping B^v(d) for all states v∈V and vectors d∈ℝ^V by
Then by (65) we have B^v(z)≤B^v(y) (componentwise), and since the value function of matrix games is monotone, we can conclude by (63) and (64), and by the fact that shifting the payoff matrix by a constant shifts the value of the game by the same constant, that
for all states v∈V. Note that if g−z≤g−d≤g−y, then B^v(z)≤B^v(d)≤B^v(y) for all v∈V, and hence, by the above-cited properties of the value function and by (66),
Since the mapping F:g−d↦Val(B(d)) (where Val(B(d))=(Val(B^v(d)) : v∈V)) is Lipschitz continuous, and since by (67) and (66) it maps the compact box [g−z,g−y] into itself, we can conclude by Brouwer's theorem that F has a fixed point, that is, there exists a potential vector x∈[y,z] (i.e., g−x∈[g−z,g−y]) for which g−x=F(g−x)=Val(B(x)). This implies g=Val(A(x)), completing our proof. □
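The proof uses only two properties of the matrix game value Val(B^v(d)): monotonicity in the payoff entries and the constant-shift rule. For concreteness, here is a minimal sketch of computing the value of a matrix game by the classical LP reduction; this is a standard construction, not code from the paper, and it assumes scipy is available:

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(B):
    """Value of the zero-sum matrix game with payoff matrix B (row maximizer).

    Classical reduction: shift B so every entry is positive; the value of
    the shifted game equals 1 / min{sum(y) : B^T y >= 1, y >= 0}.
    """
    B = np.asarray(B, dtype=float)
    shift = 1.0 - B.min()            # make every entry >= 1 > 0
    Bp = B + shift
    m, n = Bp.shape
    res = linprog(c=np.ones(m), A_ub=-Bp.T, b_ub=-np.ones(n),
                  bounds=[(0, None)] * m, method="highs")
    return 1.0 / res.x.sum() - shift

# Matching pennies has value 0, and the constant-shift rule used in the
# proof is visible: adding 1 to every entry moves the value from 0 to 1.
assert abs(matrix_game_value([[1, -1], [-1, 1]])) < 1e-6
assert abs(matrix_game_value([[2, 0], [0, 2]]) - 1.0) < 1e-6
```

Monotonicity, the other property the proof relies on, can be checked the same way: increasing any entry of B never decreases `matrix_game_value(B)`.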
Appendix C: Proof of the Implication \(\mathrm{(B2)}\Rightarrow\mathrm{(A1)}\)
Let g,x,y∈ℝ^V be vectors satisfying condition (B2). Then there exist strategies \(\bar{\alpha}\) and \(\bar{\beta}\) such that, for all states v∈V, the following hold: (1) \(\bar{\alpha}^{v} G^{v}(g)\beta\ge g^{v}\) and \(\bar{\alpha}^{v} A^{v}(x)\beta \ge g^{v}\) for all β∈Δ(L^v), and (2) \(\alpha^{v} G^{v}(g)\bar{\beta}^{v}\le g^{v}\) and \(\alpha^{v} A^{v}(y)\bar{\beta}^{v}\le g^{v}\) for all α∈Δ(K^v).
Fix a starting position v_0=w. It is enough to show that White can guarantee at least g^w while Black can guarantee at most g^w. We only show the former statement, since the latter can be shown similarly. At time i, we will let White play his/her locally optimal (stationary) strategy \(\bar{\alpha}^{v}\) whenever (s)he is at position v_i=v, while Black chooses an arbitrary, not necessarily stationary, strategy , where is the history of the play leading to v_i=v and is the set of all such histories. Let us note that and denote by \(\beta^{v,i}\in\Delta(L^{v})\) the Markovian strategy given by .
Consider a play w=v_0,v_1,v_2,… (where each v_i is a random variable). By (7) and the fact that potential transformations do not change the Cesàro sum (Sect. 2.4), it is enough to show that \(\mathbb {E}[b_{i}(x)]\ge g^{w}\) for all i. Note that
We prove by induction on i=0,1,2,… that \(\sum_{v} g^{v}\cdot\Pr[v_{i}=v]\ge g^{w}\), which will imply the lemma by (68). Indeed, the statement is trivially true for i=0. For any i, we have
and the latter is at least g^w by the induction hypothesis. □
Appendix D: Some Examples
Example 2
Vrieze [46, Chap. 8] gave an example (see Fig. 2) of a stochastic game that has values and uniformly optimal stationary strategies but no canonical form; we can see that condition (A2) is violated. In this game we have V={1,2,3}; states 2 and 3 are absorbing with |K^2|=|K^3|=|L^2|=|L^3|=1, while in state 1 we have |K^1|=|L^1|=3. The reward matrix of state 1 is shown in Fig. 2, together with the transition probabilities, which are all zero or one.
This game has values g=(0,−1,1) and unique uniformly optimal stationary strategies, namely \(\alpha^{1}=(\frac{1}{2},\frac{1}{2},0)\) and \(\beta^{1}=(\frac{1}{2},\frac{1}{2},0)\), together with the trivial strategies in states 2 and 3. We have
For a potential vector x∈ℝ^V we can assume w.l.o.g. that x_1=0, and thus we have
Here \(\bar {K}^{1}=\{(\frac{1}{2},\frac{1}{2},0),(1,0,0)\}\), \(\bar {L}^{1}=\{(\frac{1}{2},\frac{1}{2},0),(0,1,0)\}\), and only the first vectors are optimal in the matrix game with payoffs A^1(x) (for any potential transformation); thus α^1 and β^1 given above are the unique optimal strategies. A canonical form with some potential vector x∈ℝ^V (x_1=0) would require the inequalities α^1 A^1(x)≥0 and A^1(x)β^1≤0, implying \(-1-\frac{x_{2}+x_{3}}{2}\geq0\) and \(1-\frac{x_{2}+x_{3}}{2}\leq0\), a contradiction. Consequently, this example has no canonical form.
Example 3
Raghavan, Tijs, and Vrieze [40] gave an example (see Fig. 3) of an AT-game with rational input data in which the optimal values and strategies are irrational. This example is ergodic, with states V={1,2} and values \(g^{1}=g^{2}=-(6-\sqrt{30})^{2}\). The vector \(x=(0,22-4\sqrt{30})\) is a potential transformation providing the canonical form for this example. We have K^1=K^2=L^1=L^2={1,2}, and the strategies \(\alpha ^{1}=\beta^{1}=(\frac{-4+\sqrt{30}}{2},\frac{6-\sqrt{30}}{2})\) and \(\alpha^{2}=\beta^{2}=(\frac{-9+2\sqrt{30}}{3},\frac{12-2\sqrt {30}}{3})\) are the uniformly optimal stationary strategies.
Appendix E: Summary of Implications
Figure 4 summarizes the implications between game properties considered in the paper.
Cite this article
Boros, E., Elbassioni, K., Gurvich, V. et al. On Canonical Forms for Zero-Sum Stochastic Mean Payoff Games. Dyn Games Appl 3, 128–161 (2013). https://doi.org/10.1007/s13235-013-0075-x