Abstract
This paper introduces conditional Markov strategies in discrete-time discounted dynamic games with perfect monitoring. These are strategies in which players follow Markov policies after every history. Policies induced by conditional Markov equilibria can be supported by the threat of reverting to the policy that yields the smallest expected equilibrium payoff for the deviator. This leads to a set-valued fixed-point characterization of equilibrium payoff functions. The result can be used for computing equilibria and for showing the existence of equilibria in behavior strategies.

Notes
Sleet and Yeltekin (2003) have one example involving stochastic fluctuations, although they present the numerical method only for deterministic games.
There is a slight abuse of notation throughout the paper when we need to write \(y\) as \((y_i,y_{-i})\) or \(\sigma =(\sigma _i,\sigma _{-i})\). For example, instead of writing \(u_i((y_i,y_{-i}),x,w)\) we denote \(u_i(y_i,y_{-i},x,w)\). Respectively, \(u_i(\sigma _i(h),\sigma _{-i}(h),x(h),w)\) stands for \(u_i((\sigma _i(h),\sigma _{-i}(h)),x(h),w)\).
References
Abreu D (1986) Extremal equilibria of oligopolistic supergames. J Econ Theory 39(2):191–225
Abreu D (1988) On the theory of infinitely repeated games with discounting. Econometrica 56(2):383–396
Abreu D, Pearce D, Stacchetti E (1986) Optimal cartel equilibria with imperfect monitoring. J Econ Theory 39(1):251–269
Abreu D, Pearce D, Stacchetti E (1990) Toward a theory of discounted repeated games with imperfect monitoring. Econometrica 58(5):1041–1063
Aguirregabiria V, Mira P (2007) Sequential estimation of dynamic discrete games. Econometrica 75(1):1–53
Bajari P, Benkard L, Levin J (2007) Estimating dynamic models of imperfect competition. Econometrica 75(5):1331–1370
Berg K, Kitti M (2012) Equilibrium paths in discounted supergames. Working paper
Berg K, Kitti M (2013) Computing equilibria in discounted \(2\times 2\) supergames. Comput Econ 41(1):71–88
Berry S, Ostrovsky M, Pakes A (2007) Simple estimators for the parameters of discrete dynamic games. RAND J Econ 38(2):373–399
Bertsekas DP, Shreve SE (1996) Stochastic optimal control: the discrete time case. Athena Scientific, Belmont, Massachusetts
Cole HL, Kocherlakota N (2001) Dynamic games with hidden actions and hidden states. J Econ Theory 98(1):114–126
Cronshaw MB (1997) Algorithms for finding repeated game equilibria. Comput Econ 10(2):139–168
Cronshaw MB, Luenberger DG (1994) Strongly symmetric subgame perfect equilibria in infinitely repeated games with perfect monitoring. Games Econ Behav 6(2):220–237
Doraszelski U, Escobar J (2012) Restricted feedback in long term relationships. J Econ Theory 147(1):142–161
Doraszelski U, Pakes A (2007) A framework for applied dynamic analysis in IO. In: Armstrong M, Porter R (eds) Handbook of industrial organization, vol 3. North-Holland, Amsterdam, pp 1887–1966
Doraszelski U, Satterthwaite M (2010) Computable Markov-perfect industry dynamics. RAND J Econ 41(2):215–243
Duffie D, Geanakoplos J, Mas-Colell A, McLennan A (1994) Stationary Markov equilibria. Econometrica 62(4):745–781
Dutta PK, Radner R (2006) A game-theoretic approach to global warming. In: Kusuoka S, Yamazaki A (eds) Advances in mathematical economics, vol 8. Springer-Verlag, Tokyo, pp 135–153
Ely JC, Hörner J, Olszewski W (2005) Belief-free equilibria in repeated games. Econometrica 73(2):377–415
Ericson R, Pakes A (1995) Markov-perfect industry dynamics: a framework for empirical work. Rev Econ Stud 62:53–82
Fink AM (1964) Equilibrium in a stochastic n-person game. J Sci Hiroshima Univ Ser A-I 28:89–93
Fudenberg D, Levine D (1983) Subgame-perfect equilibria of finite- and infinite-horizon games. J Econ Theory 31(2):251–267
Judd K, Yeltekin Ş, Conklin J (2003) Computing supergame equilibria. Econometrica 71(4):1239–1254
Käenmäki A, Vilppolainen M (2010) Dimension and measures of sub-self-affine sets. Monatshefte für Mathematik 161(3):271–293
Kandori M (2011) Weakly belief-free equilibria in repeated games with private monitoring. Econometrica 79(3):877–892
Kitti M (2011) Conditionally stationary equilibria in discounted dynamic games. Dyn Games Appl 1(4):514–533
Maitra AP, Sudderth WD (2007) Subgame-perfect equilibria for stochastic games. Math Oper Res 32(3):711–722
Mertens JF, Parthasarathy T (1991) Nonzerosum stochastic games. In: Raghavan TES, Ferguson TS, Parthasarathy T, Vrieze OJ (eds) Stochastic games and related topics. Kluwer, Boston
Nash JF (1951) Non-cooperative games. Ann Math 54(2):286–295
Phelan C, Stacchetti E (2001) Sequential equilibria in a Ramsey tax model. Econometrica 69(6):1491–1518
Rockafellar RT, Wets RJ-B (1998) Variational analysis. Springer, Berlin
Shapley LS (1953) Stochastic games. Proc Natl Acad Sci USA 39(10):1095–1100
Sleet C, Yeltekin Ş (2003) On the computation of value correspondences. Working paper
Sleet C, Yeltekin Ş (2006) Optimal taxation with endogenously incomplete asset markets. J Econ Theory 127(1):36–73
Sleet C, Yeltekin Ş (2007) Recursive monetary policy games with incomplete information. J Econ Dyn Control 31(5):1557–1583
Solan E (1998) Discounted stochastic games. Math Oper Res 23(4):1010–1021
Whitt W (1980) Representation and approximation of noncooperative sequential games. SIAM J Control Optim 18(1):33–48
Acknowledgments
I thank two anonymous referees for their comments. Funding from the Academy of Finland is gratefully acknowledged.
Appendix: Auxiliary Proofs
Proof of Lemmas 2 and 3
The purpose is to first show that if we pick a sequence of equilibrium payoff functions, then there is a policy that gives the payoff function obtained in the limit of a convergent subsequence of the original sequence. First, note that \(\mathbb{E}[u_i(y,x,w)]\) can be replaced with a function \(\bar{u}_i(y,x)\) that is the expected value of \(u_i\) over \(w\); i.e., there is no loss of generality in assuming that \(u\) is a function of \(y\) and \(x\) only; see, e.g., Section 8 in Bertsekas and Shreve (1996) for more on this argument. Let \(U^k(x)\) denote the expected payoffs at stage \(k\) when the stage-\(k\) state is \(x\), for a given policy profile \(\mu^0,\mu^1,\ldots\), i.e., \(U^k(x)=\bar{u}(\mu^k(x),x)\). Moreover, for a given policy profile we can determine the corresponding state transition probabilities \(\text{Prob}(x^{k+1}|x^k,k)\). It follows that we can find the probabilities of states at each stage of the game conditional on the initial state, i.e., \(\text{Prob}(x|x^0,k)\). Then player \(i\)’s expected payoff in stage \(k\) is
\[ \sum_{x\in X}\text{Prob}(x|x^0,k)\,U_i^k(x). \]
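As an aside, the bookkeeping in this construction is easy to make concrete numerically. The following is a minimal Python sketch, under toy assumptions (a randomly generated policy sequence standing in for \((\mu^0,\mu^1,\ldots)\), a finite truncation horizon, and illustrative array names `trans`, `ubar`, `q` that are not the paper's notation), of how stage payoffs and state probabilities combine into discounted expected payoffs.

```python
import numpy as np

# Minimal sketch with illustrative names and toy primitives: expected stage
# payoffs and state probabilities under a nonstationary policy on a finite
# state set X = {0, ..., n_states - 1}.
n_states, horizon, delta = 3, 50, 0.9
rng = np.random.default_rng(0)

# trans[k][x, x'] = Prob(x^{k+1} = x' | x^k = x) induced by the rule mu^k;
# ubar[k][x] = expected stage payoff ubar_i(mu^k(x), x) of player i.
trans, ubar = [], []
for k in range(horizon):
    P = rng.random((n_states, n_states))
    trans.append(P / P.sum(axis=1, keepdims=True))  # row-stochastic matrix
    ubar.append(rng.random(n_states))

# q[k][x0, x] = Prob(x | x0, k): distribution of the stage-k state given x0.
q = [np.eye(n_states)]
for k in range(horizon - 1):
    q.append(q[-1] @ trans[k])

# Expected payoff of player i in stage k given x0, and its discounted sum.
stage = np.array([q[k] @ ubar[k] for k in range(horizon)])
v = sum(delta ** k * stage[k] for k in range(horizon))
print(v)  # truncated approximation of v_i(x0) for each initial state x0
```

Truncating the horizon is innocuous here because discounting makes the neglected tail geometrically small.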
If we pick a sequence of equilibrium payoff functions \(\{v^j\}_j\), then it has a convergent subsequence because \(X\) is finite and players’ payoffs are bounded. Let \(v\) be the limit. Recall that the functions \(v^j\), \(j\ge 0\), can be associated with finite-dimensional vectors because there are finitely many states; hence, convergence can be considered in the usual Euclidean metric. Then, by the usual diagonalization argument, it is possible to pick a convergent subsequence in which the terms \(U^{k}(x)\) and \(\text{Prob}(x|x^0,k)\) corresponding to the elements of the sequence \(\{v^j\}_j\) converge for all \(x,x^0\in X\) and \(k\ge 0\). Let \(\bar{U}^k(x)\) and \(q^k(x|x^0)\) denote the resulting limits. Note that the assumption of the finiteness of \(X\) is crucial for this step. We also obtain the expected payoffs \(v_i(x,k)\), \(x\in X\), \(k\ge 0\), \(i\in I\), in the limit, with \(v_i(x)=v_i(x,0)\), \(x\in X\), \(i\in I\). Moreover, \(v_i(x^0,k)\) satisfies (6) for \(\bar{U}_i^k(x)\) and \(q^k(x|x^0)\), \(x\in X\), \(k\ge 0\). By the compactness of payoffs, there are decision functions \(\mu^k\in M\), \(k\ge 0\), which lead to these payoffs and probabilities of states. Hence, we can construct a policy which yields the limit payoff \(v\).
Let us now show the result of Lemma 2. The above deduction holds in particular for a sequence of payoff functions \(\{v^j\}_j\) in which the component \(v_i^j(x)\), \(j\ge 0\), converges to \(v_i^-(V)(x)=\inf \{v_i: v\in V(x)\}\) for a given \(x\in X\). Consequently, there is a subsequence \(\{v^{j_k}\}_k\) that converges to \(\bar{v}^i\) with \(\bar{v}_i^i(x)=v_i^-(V)(x)\). Corresponding to \(\{v^{j_k}\}_k\) we can construct a sequence of policies \(\{\pi^{x,i,k}\}_k\) giving these payoffs. As observed previously, we can find a policy corresponding to the limit \(\bar{v}^i\). Consequently, we obtain \(\pi^{x,i}\in \Pi\) for all \(x\in X\) and \(i\in I\) such that
\[ v_i(\pi^{x,i})(x)=v_i^-(V)(x)=\inf\{v_i : v\in V(x)\}. \]
Let \(p^*\) be the penal code composed of these policies. This penal code is an equilibrium because it gives punishment payoffs that are no larger than any other equilibrium payoffs. More specifically, we first observe from Proposition 1 that \(\sigma(\pi^{x,i,k},p^*)\) is a conditional Markov equilibrium, i.e., the decision rules \(\mu^j(\pi^{x,i,k})\), \(j\ge 0\), are incentive compatible for \(v^{j+1}(\pi^{x,i,k})\) and \(v(p^*)\). By the compactness of payoffs and the finiteness of \(X\) it is possible to find a subsequence of policies \(\{\pi^{x,i,k_l}\}_l\) such that \(v^j(\pi^{x,i,k_l})(x^0)\) converges to \(v^j(\pi^{x,i})(x^0)\) for all \(j\ge 0\) and \(x,x^0\in X\). It follows that the limit is incentive compatible for \(v^{j+1}(\pi^{x,i})\) and \(v(p^*)\), \(j\ge 0\); note that the weak inequality in the incentive compatibility condition is preserved in the limit. Consequently, Proposition 1 implies that \(p^*\) is an equilibrium penal code. This proves Lemma 2.
Let us finally show Lemma 3. As argued previously, any sequence of payoff functions in \(V\) has a convergent subsequence, and corresponding to the limit payoff \(v\) of the subsequence we can find a policy \(\pi\) that yields \(v\) as its outcome. In the same way that \(p^*\) was shown to be an equilibrium penal code, it can be shown that \(\sigma(\pi,p^*)\) is an equilibrium. This implies \(v\in V\), i.e., Lemma 3.
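Both proofs revolve around simple strategies \(\sigma(\pi,p)\) in the sense of Abreu (1988): follow the initial policy sequence, and upon a unilateral deviation restart with the deviator's punishment policy. As a hedged illustration, here is a schematic Python sketch of how such a strategy maps a history to the current decision rule; the data layout (`pi[k][x]`, `penal_code[(x, i)]`) is a hypothetical encoding chosen for readability, not the paper's formalism.

```python
# Schematic sketch of a simple strategy sigma(pi, p): play the initial policy
# sequence pi; after a unilateral deviation by player i in state x, restart
# play with the punishment policy pi^{x,i} for that deviator. Policy
# sequences are conceptually infinite; here they are assumed long enough.

def simple_strategy(history, pi, penal_code):
    """history: list of (state, action_profile) pairs observed so far.
    pi: initial policy sequence, pi[k][x] -> prescribed action profile.
    penal_code: penal_code[(x, i)] -> punishment policy sequence for i."""
    current, start = pi, 0
    for k, (x, played) in enumerate(history):
        prescribed = current[k - start][x]
        deviators = [i for i, (a, b) in enumerate(zip(played, prescribed))
                     if a != b]
        if len(deviators) == 1:          # react to unilateral deviations only
            i = deviators[0]
            current, start = penal_code[(x, i)], k + 1  # restart punishment
    return current[len(history) - start]  # decision rule for the next stage
```

As is standard, simultaneous deviations by several players are ignored, and a deviation during a punishment phase simply restarts the punishment for the new deviator.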
Proof of Lemma 4
Let us begin by showing the first result, that \(B\) maps compact sets of payoff functions into compact sets. Since \(X\) is finite, compactness is in the sense of the usual topology defined by the Euclidean metric. By the finiteness of \(X\), payoff functions can be associated with finite-dimensional vectors. If we pick a sequence of vectors (payoff functions) in \(B(S)\), there is a convergent subsequence \(\{v^j\}_j\) because payoffs are bounded. Moreover, there is \(\mu\) corresponding to the limit \(v\) of this subsequence. The limit satisfies \(v(x)=T(\mu(x),x,v^{\prime})\) for all \(x\in X\) and some \(v^{\prime}\in S\). The payoff function \(v^{\prime}\) can be constructed by a diagonalization argument, i.e., by choosing a subsequence \(\{v^{j_k}\}_k\) with \(v^{j_k}(x)=T(\mu^{k}(x),x,\bar{v}^k)\), \(\bar{v}^k\in S\), \(x\in X\), \(k\ge 0\), such that \(v^{\prime}\) is obtained as the limit of \(\{\bar{v}^{k}\}_k\). Moreover, \(\mu\) is incentive compatible, i.e., it satisfies \(\mu\in IC(v,S)\). Hence, \(B(S)\) is compact.
The proof of the second result is straightforward: we construct a strategy profile \(\sigma \) corresponding to \(v^0\in B(S)\) for which \(U(\sigma ,x)=v^0(x)\) for all \(x\in X\), and then prove that it is an equilibrium.
Let us take \(v^0\in S\). Then \(v^0\in B(S)\), i.e., there are \(\mu^0\) and \(v^1\in S\) such that \(v^0(x)=T(\mu^0(x),x,v^1)\) for all \(x\in X\). We can repeat the same deduction for \(v^1\) and so on. This construction gives us \(\pi=(\mu^0,\mu^1,\ldots)\) and the corresponding continuation payoff functions \(v^0,v^1,\ldots\). Furthermore, we can construct \(\pi^{x,i}\) corresponding to \(v_i^-(S)(x)\). Observe that for each \(v_i^-(S)(x)\) there is a continuation payoff function \(v_i^x\in F_i\) such that \(v_i^x(x)=v_i^-(S)(x)\). The construction of \(\pi^{x,i}\) is similar to that of \(\pi\). As a result, we get a penal code \(p\). Consequently, we obtain a simple strategy \(\sigma(\pi,p)\), and by construction \(v^k(x)\) is the expected payoff that the players get when they follow this strategy starting from period \(k\) and state \(x\in X\). By the definition of \(B\) we have
\[ v^k(x)=T(\mu^k(x),x,v^{k+1}) \quad \text{for all } x\in X \text{ and } k\ge 0, \]
with each \(\mu^k\) incentive compatible.
Proposition 1 implies that \(\sigma (\pi ,p)\) is a conditional Markov equilibrium. Hence, it holds that \(v^0\in V\).
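To indicate how the fixed-point characterization lends itself to computation, the following toy Python sketch iterates an operator in the spirit of \(B\) on a finite grid of payoff vectors for a one-state game (a repeated prisoner's dilemma); the stage game, the grid, the discount factor, and the projection step are all arbitrary illustrative choices, and punishments use the worst continuation payoff in the current set for the deviator, mirroring the penal codes above.

```python
import itertools
import numpy as np

# Toy iteration of an operator in the spirit of B for a repeated prisoner's
# dilemma (a single state), over a finite grid of payoff vectors. All
# primitives (payoffs, grid, discount factor) are arbitrary illustrations.

delta = 0.7
u = {  # u[(a1, a2)] = (payoff to player 1, payoff to player 2); C=0, D=1
    (0, 0): (3.0, 3.0), (0, 1): (0.0, 4.0),
    (1, 0): (4.0, 0.0), (1, 1): (1.0, 1.0),
}

def best_dev(i, a):
    """Best one-shot deviation payoff of player i against a[-i]."""
    return max(u[(ai, a[1]) if i == 0 else (a[0], ai)][i] for ai in (0, 1))

grid = [np.array(w) for w in itertools.product(np.linspace(0.0, 4.0, 9),
                                               repeat=2)]

def B(S):
    """Payoffs decomposable as (1-delta)*u(a) + delta*w with w in S, where
    deviations are deterred by the worst continuation in S for the deviator."""
    worst = [min(w[i] for w in S) for i in (0, 1)]
    out = set()
    for a, w in itertools.product(u, S):
        v = (1 - delta) * np.array(u[a]) + delta * np.array(w)
        if all(v[i] >= (1 - delta) * best_dev(i, a) + delta * worst[i]
               for i in (0, 1)):
            # project onto the grid so the iteration stays on finite sets
            out.add(min((tuple(g) for g in grid),
                        key=lambda g: np.linalg.norm(np.array(g) - v)))
    return out

S = {tuple(g) for g in grid}  # start from a superset of the payoff set
for _ in range(50):
    S_next = B(S)
    if S_next == S:
        break
    S = S_next
print(sorted(S))  # grid approximation of the equilibrium payoff set
```

With several states the same loop runs over payoff functions \(v:X\rightarrow \mathbb{R}^n\) rather than payoff vectors, which is exactly where the finiteness of \(X\) is used.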
Proof of Lemma 5
Lemma 1 follows directly from the results for dynamic programming models; see, e.g., Section 9.4 in Bertsekas and Shreve (1996).
The results of Lemmas 2 and 3 follow similarly as in the case of pure strategies. Now \(U_i^k(x)\) in the proof is replaced with \(\sum_j \text{Prob}(y^j|k)\bar{u}_i(y^j,x)\). In the diagonalization argument we pick a convergent subsequence of payoff functions such that \(\{\text{Prob}(y^j|k)\}_k\) also converges for all \(j\). The result then follows.
The fact that \(B(S)\in C\) when \(S\in C\) follows by taking a convergent sequence of payoffs in \(B(S)\) and observing that the limit payoff function satisfies the incentive compatibility constraint and hence belongs to \(B(S)\). The self-generation result, Lemma 4, follows by the same deduction as for pure strategies. \(\square\)