Abstract
This paper introduces conditional Markov strategies in discrete-time discounted dynamic games with perfect monitoring. These are strategies in which players follow Markov policies after every history. Policies induced by conditional Markov equilibria can be supported by the threat of reverting to the policy that yields the smallest expected equilibrium payoff for the deviator. This leads to a set-valued fixed-point characterization of equilibrium payoff functions. The result can be used for computing equilibria and for showing the existence of equilibria in behavior strategies.

Notes
Sleet and Yeltekin (2003) have one example involving stochastic fluctuations, although they present the numerical method only for deterministic games.
There is a slight abuse of notation throughout the paper when we need to write \(y\) as \((y_i,y_{-i})\) or \(\sigma =(\sigma _i,\sigma _{-i})\). For example, instead of writing \(u_i((y_i,y_{-i}),x,w)\) we denote \(u_i(y_i,y_{-i},x,w)\). Respectively, \(u_i(\sigma _i(h),\sigma _{-i}(h),x(h),w)\) stands for \(u_i((\sigma _i(h),\sigma _{-i}(h)),x(h),w)\).
References
Abreu D (1986) Extremal equilibria of oligopolistic supergames. J Econ Theory 39(2):191–225
Abreu D (1988) On the theory of infinitely repeated games with discounting. Econometrica 56(2):383–396
Abreu D, Pearce D, Stacchetti E (1986) Optimal cartel equilibria with imperfect monitoring. J Econ Theory 39(1):251–269
Abreu D, Pearce D, Stacchetti E (1990) Toward a theory of discounted repeated games with imperfect monitoring. Econometrica 58(5):1041–1063
Aguirregabiria V, Mira P (2007) Sequential estimation of dynamic discrete games. Econometrica 75(1):1–53
Bajari P, Benkard L, Levin J (2007) Estimating dynamic models of imperfect competition. Econometrica 75(5):1331–1370
Berg K, Kitti M (2012) Equilibrium paths in discounted supergames. Working paper
Berg K, Kitti M (2013) Computing equilibria in discounted \(2\times 2\) supergames. Comput Econ 41(1):71–88
Berry S, Ostrovsky M, Pakes A (2007) Simple estimators for the parameters of discrete dynamic games. RAND J Econ 38(2):373–399
Bertsekas DP, Shreve SE (1996) Stochastic optimal control: the discrete time case. Athena Scientific, Belmont, Massachusetts
Cole HL, Kocherlakota N (2001) Dynamic games with hidden actions and hidden states. J Econ Theory 98(1):114–126
Cronshaw MB (1997) Algorithms for finding repeated game equilibria. Comput Econ 10(2):139–168
Cronshaw MB, Luenberger DG (1994) Strongly symmetric subgame perfect equilibria in infinitely repeated games with perfect monitoring. Games Econ Behav 6(2):220–237
Doraszelski U, Escobar J (2012) Restricted feedback in long term relationships. J Econ Theory 147(1):142–161
Doraszelski U, Pakes A (2007) A framework for applied dynamic analysis in IO. In: Armstrong M, Porter R (eds) Handbook of industrial organization, vol 3. North-Holland, Amsterdam, pp 1887–1966
Doraszelski U, Satterthwaite M (2010) Computable Markov-perfect industry dynamics. RAND J Econ 41(2):215–243
Duffie D, Geanakoplos J, Mas-Colell A, McLennan A (1994) Stationary Markov equilibria. Econometrica 62(4):745–781
Dutta PK, Radner R (2006) A game-theoretic approach to global warming. In: Kusuoka S, Yamazaki A (eds) Advances in mathematical economics, vol 8. Springer-Verlag, Tokyo, pp 135–153
Ely JC, Hörner J, Olszewski W (2005) Belief-free equilibria in repeated games. Econometrica 73(2):377–415
Ericson R, Pakes A (1995) Markov-perfect industry dynamics: a framework for empirical work. Rev Econ Stud 62:53–82
Fink AM (1964) Equilibrium in a stochastic n-person game. J Sci Hiroshima Univ Ser A-I 28:89–93
Fudenberg D, Levine D (1983) Subgame-perfect equilibria of finite- and infinite-horizon games. J Econ Theory 31(2):251–267
Judd K, Yeltekin Ş, Conklin J (2003) Computing supergame equilibria. Econometrica 71(4):1239–1254
Käenmäki A, Vilppolainen M (2010) Dimension and measures of sub-self-affine sets. Monatshefte für Mathematik 161(3):271–293
Kandori M (2011) Weakly belief-free equilibria in repeated games with private monitoring. Econometrica 79(3):877–892
Kitti M (2011) Conditionally stationary equilibria in discounted dynamic games. Dyn Games Appl 1(4):514–533
Maitra AP, Sudderth WD (2007) Subgame-perfect equilibria for stochastic games. Math Oper Res 32(3):711–722
Mertens JF, Parthasarathy T (1991) Nonzerosum stochastic games. In: Raghavan TES, Ferguson TS, Parthasarathy T, Vrieze OJ (eds) Stochastic games and related topics. Kluwer, Boston
Nash JF (1951) Non-cooperative games. Ann Math 54(2):286–295
Phelan C, Stacchetti E (2001) Sequential equilibria in a Ramsey tax model. Econometrica 69(6):1491–1518
Rockafellar RT, Wets RJ-B (1998) Variational analysis. Springer, Berlin
Shapley LS (1953) Stochastic games. Proc Natl Acad Sci USA 39(10):1095–1100
Sleet C, Yeltekin Ş (2003) On the computation of value correspondences. Working paper
Sleet C, Yeltekin Ş (2006) Optimal taxation with endogenously incomplete asset markets. J Econ Theory 127(1):36–73
Sleet C, Yeltekin Ş (2007) Recursive monetary policy games with incomplete information. J Econ Dyn Control 31(5):1557–1583
Solan E (1998) Discounted stochastic games. Math Oper Res 23(4):1010–1021
Whitt W (1980) Representation and approximation of noncooperative sequential games. SIAM J Control Optim 18(1):33–48
Acknowledgments
I thank two anonymous referees for their comments. Funding from the Academy of Finland is gratefully acknowledged.
Appendix: Auxiliary Proofs
Proof of Lemmas 2 and 3
The purpose is to first show that if we pick a sequence of equilibrium payoff functions, then there is a policy that gives the payoff function obtained in the limit of a convergent subsequence of the original sequence. First, note that \(\mathbb{E}[u_i(y,x,w)]\) can be replaced with a function \(\bar{u}_i(y,x)\) that is the expected value of \(u_i\) over \(w\); i.e., there is no loss of generality in assuming that \(u\) is a function of \(y\) and \(x\) only; see, e.g., Section 8 in Bertsekas and Shreve (1996) for more on this argument. Let \(U^k(x)\) denote the expected payoffs at stage \(k\) when the stage-\(k\) state is \(x\), for a given policy profile \(\mu^0,\mu^1,\ldots\), i.e., \(U^k(x)=\bar{u}(\mu^k(x),x)\). Moreover, for a given policy profile we can determine the corresponding state transition probabilities \(\text{Prob}(x^{k+1}|x^k,k)\). It follows that we can find the probabilities of states at each stage of the game conditional on the initial state, i.e., \(\text{Prob}(x|x^0,k)\). Then player \(i\)’s expected payoff in stage \(k\) is
\[ \sum_{x\in X}\text{Prob}(x|x^0,k)\,U_i^k(x). \]
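As an aside, the bookkeeping in this construction is easy to make concrete numerically. The following is a minimal Python sketch, under toy assumptions (a randomly generated policy sequence standing in for \((\mu^0,\mu^1,\ldots)\), a finite truncation horizon, and illustrative array names `trans`, `ubar`, `q` that are not the paper's notation), of how stage payoffs and state probabilities combine into discounted expected payoffs.

```python
import numpy as np

# Minimal sketch with illustrative names and toy primitives: expected stage
# payoffs and state probabilities under a nonstationary policy on a finite
# state set X = {0, ..., n_states - 1}.
n_states, horizon, delta = 3, 50, 0.9
rng = np.random.default_rng(0)

# trans[k][x, x'] = Prob(x^{k+1} = x' | x^k = x) induced by the rule mu^k;
# ubar[k][x] = expected stage payoff ubar_i(mu^k(x), x) of player i.
trans, ubar = [], []
for k in range(horizon):
    P = rng.random((n_states, n_states))
    trans.append(P / P.sum(axis=1, keepdims=True))  # row-stochastic matrix
    ubar.append(rng.random(n_states))

# q[k][x0, x] = Prob(x | x0, k): distribution of the stage-k state given x0.
q = [np.eye(n_states)]
for k in range(horizon - 1):
    q.append(q[-1] @ trans[k])

# Expected payoff of player i in stage k given x0, and its discounted sum.
stage = np.array([q[k] @ ubar[k] for k in range(horizon)])
v = sum(delta ** k * stage[k] for k in range(horizon))
print(v)  # truncated approximation of v_i(x0) for each initial state x0
```

Truncating the horizon is innocuous here because discounting makes the neglected tail geometrically small.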
If we pick a sequence of equilibrium payoff functions \(\{v^j\}_j\), then it has a convergent subsequence because \(X\) is finite and players’ payoffs are bounded. Let \(v\) be the limit. Recall that the functions \(v^j\), \(j\ge 0\), can be associated with finite-dimensional vectors because there are finitely many states; hence, convergence can be considered in the usual Euclidean metric. Then, by the usual diagonalization argument, it is possible to pick a convergent subsequence in which the terms \(U^{k}(x)\) and \(\text{Prob}(x|x^0,k)\) corresponding to the elements of the sequence \(\{v^j\}_j\) converge for all \(x,x^0\in X\) and \(k\ge 0\). Let \(\bar{U}^k(x)\) and \(q^k(x|x^0)\) denote the resulting limits. Note that the assumption of the finiteness of \(X\) is crucial for this step. We also obtain the expected payoffs \(v_i(x,k)\), \(x\in X\), \(k\ge 0\), \(i\in I\), in the limit, with \(v_i(x)=v_i(x,0)\), \(x\in X\), \(i\in I\). Moreover, \(v_i(x^0,k)\) satisfies (6) for \(\bar{U}_i^k(x)\) and \(q^k(x|x^0)\), \(x\in X\), \(k\ge 0\). By the compactness of payoffs, there are decision functions \(\mu^k\in M\), \(k\ge 0\), which lead to these payoffs and probabilities of states. Hence, we can construct a policy which yields the limit payoff \(v\).
Let us now show the result of Lemma 2. The above deduction holds in particular for a sequence of payoff functions \(\{v^j\}_j\) in which the component \(v_i^j(x)\), \(j\ge 0\), converges to \(v_i^-(V)(x)=\inf \{v_i: v\in V(x)\}\) for a given \(x\in X\). Consequently, there is a subsequence \(\{v^{j_k}\}_k\) that converges to \(\bar{v}^i\) with \(\bar{v}_i^i(x)=v_i^-(V)(x)\). Corresponding to \(\{v^{j_k}\}_k\) we can construct a sequence of policies \(\{\pi^{x,i,k}\}_k\) giving these payoffs. As observed previously, we can find a policy corresponding to the limit \(\bar{v}^i\). Consequently, we obtain \(\pi^{x,i}\in \Pi\) for all \(x\in X\) and \(i\in I\) such that
\[ v_i(\pi^{x,i})(x)=v_i^-(V)(x)=\inf\{v_i : v\in V(x)\}. \]
Let \(p^*\) be the penal code composed of these policies. This penal code is an equilibrium because it gives punishment payoffs that are no larger than any other equilibrium payoffs. More specifically, we first observe from Proposition 1 that \(\sigma(\pi^{x,i,k},p^*)\) is a conditional Markov equilibrium, i.e., the decision rules \(\mu^j(\pi^{x,i,k})\), \(j\ge 0\), are incentive compatible for \(v^{j+1}(\pi^{x,i,k})\) and \(v(p^*)\). By the compactness of payoffs and the finiteness of \(X\) it is possible to find a subsequence of policies \(\{\pi^{x,i,k_l}\}_l\) such that \(v^j(\pi^{x,i,k_l})(x^0)\) converges to \(v^j(\pi^{x,i})(x^0)\) for all \(j\ge 0\) and \(x,x^0\in X\). It follows that the limit is incentive compatible for \(v^{j+1}(\pi^{x,i})\) and \(v(p^*)\), \(j\ge 0\); note that the weak inequality in the incentive compatibility condition is preserved in the limit. Consequently, Proposition 1 implies that \(p^*\) is an equilibrium penal code. This proves Lemma 2.
Let us finally show Lemma 3. As argued previously, any sequence of payoff functions in \(V\) has a convergent subsequence, and corresponding to the limit payoff \(v\) of the subsequence we can find a policy \(\pi\) that yields \(v\) as its outcome. In the same way that \(p^*\) was shown to be an equilibrium penal code, it can be shown that \(\sigma(\pi,p^*)\) is an equilibrium. This implies \(v\in V\), i.e., Lemma 3.
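Both proofs revolve around simple strategies \(\sigma(\pi,p)\) in the sense of Abreu (1988): follow the initial policy sequence, and upon a unilateral deviation restart with the deviator's punishment policy. As a hedged illustration, here is a schematic Python sketch of how such a strategy maps a history to the current decision rule; the data layout (`pi[k][x]`, `penal_code[(x, i)]`) is a hypothetical encoding chosen for readability, not the paper's formalism.

```python
# Schematic sketch of a simple strategy sigma(pi, p): play the initial policy
# sequence pi; after a unilateral deviation by player i in state x, restart
# play with the punishment policy pi^{x,i} for that deviator. Policy
# sequences are conceptually infinite; here they are assumed long enough.

def simple_strategy(history, pi, penal_code):
    """history: list of (state, action_profile) pairs observed so far.
    pi: initial policy sequence, pi[k][x] -> prescribed action profile.
    penal_code: penal_code[(x, i)] -> punishment policy sequence for i."""
    current, start = pi, 0
    for k, (x, played) in enumerate(history):
        prescribed = current[k - start][x]
        deviators = [i for i, (a, b) in enumerate(zip(played, prescribed))
                     if a != b]
        if len(deviators) == 1:          # react to unilateral deviations only
            i = deviators[0]
            current, start = penal_code[(x, i)], k + 1  # restart punishment
    return current[len(history) - start]  # decision rule for the next stage
```

As is standard, simultaneous deviations by several players are ignored, and a deviation during a punishment phase simply restarts the punishment for the new deviator.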
Proof of Lemma 4
Let us begin by showing the first result, that \(B\) maps compact sets of payoff functions into compact sets. Since \(X\) is finite, compactness is in the sense of the usual topology defined by the Euclidean metric. By the finiteness of \(X\), payoff functions can be associated with finite-dimensional vectors. If we pick a sequence of vectors (payoff functions) in \(B(S)\), there is a convergent subsequence \(\{v^j\}_j\) because payoffs are bounded. Moreover, there is \(\mu\) corresponding to the limit \(v\) of this subsequence. The limit satisfies \(v(x)=T(\mu(x),x,v^{\prime})\) for all \(x\in X\) and some \(v^{\prime}\in S\). The payoff function \(v^{\prime}\) can be constructed by a diagonalization argument, i.e., by choosing a subsequence \(\{v^{j_k}\}_k\) with \(v^{j_k}(x)=T(\mu^{k}(x),x,\bar{v}^k)\), \(\bar{v}^k\in S\), \(x\in X\), \(k\ge 0\), such that \(v^{\prime}\) is obtained as the limit of \(\{\bar{v}^{k}\}_k\). Moreover, \(\mu\) is incentive compatible, i.e., it satisfies \(\mu\in IC(v,S)\). Hence, \(B(S)\) is compact.
The proof of the second result is straightforward: we construct a strategy profile \(\sigma \) corresponding to \(v^0\in B(S)\) for which \(U(\sigma ,x)=v^0(x)\) for all \(x\in X\), and then prove that it is an equilibrium.
Let us take \(v^0\in S\). Then \(v^0\in B(S)\), i.e., there are \(\mu^0\) and \(v^1\in S\) such that \(v^0(x)=T(\mu^0(x),x,v^1)\) for all \(x\in X\). We can repeat the same deduction for \(v^1\) and so on. This construction gives us \(\pi=(\mu^0,\mu^1,\ldots)\) and the corresponding continuation payoff functions \(v^0,v^1,\ldots\). Furthermore, we can construct \(\pi^{x,i}\) corresponding to \(v_i^-(S)(x)\). Observe that for each \(v_i^-(S)(x)\) there is a continuation payoff function \(v_i^x\in F_i\) such that \(v_i^x(x)=v_i^-(S)(x)\). The construction of \(\pi^{x,i}\) is similar to that of \(\pi\). As a result, we get a penal code \(p\). Consequently, we obtain a simple strategy \(\sigma(\pi,p)\), and by construction \(v^k(x)\) is the expected payoff that the players get when they follow this strategy starting from period \(k\) and state \(x\in X\). By the definition of \(B\) we have
\[ v^k(x)=T(\mu^k(x),x,v^{k+1}) \quad \text{for all } x\in X \text{ and } k\ge 0, \]
with each \(\mu^k\) incentive compatible.
Proposition 1 implies that \(\sigma (\pi ,p)\) is a conditional Markov equilibrium. Hence, it holds that \(v^0\in V\).
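To indicate how the fixed-point characterization lends itself to computation, the following toy Python sketch iterates an operator in the spirit of \(B\) on a finite grid of payoff vectors for a one-state game (a repeated prisoner's dilemma); the stage game, the grid, the discount factor, and the projection step are all arbitrary illustrative choices, and punishments use the worst continuation payoff in the current set for the deviator, mirroring the penal codes above.

```python
import itertools
import numpy as np

# Toy iteration of an operator in the spirit of B for a repeated prisoner's
# dilemma (a single state), over a finite grid of payoff vectors. All
# primitives (payoffs, grid, discount factor) are arbitrary illustrations.

delta = 0.7
u = {  # u[(a1, a2)] = (payoff to player 1, payoff to player 2); C=0, D=1
    (0, 0): (3.0, 3.0), (0, 1): (0.0, 4.0),
    (1, 0): (4.0, 0.0), (1, 1): (1.0, 1.0),
}

def best_dev(i, a):
    """Best one-shot deviation payoff of player i against a[-i]."""
    return max(u[(ai, a[1]) if i == 0 else (a[0], ai)][i] for ai in (0, 1))

grid = [np.array(w) for w in itertools.product(np.linspace(0.0, 4.0, 9),
                                               repeat=2)]

def B(S):
    """Payoffs decomposable as (1-delta)*u(a) + delta*w with w in S, where
    deviations are deterred by the worst continuation in S for the deviator."""
    worst = [min(w[i] for w in S) for i in (0, 1)]
    out = set()
    for a, w in itertools.product(u, S):
        v = (1 - delta) * np.array(u[a]) + delta * np.array(w)
        if all(v[i] >= (1 - delta) * best_dev(i, a) + delta * worst[i]
               for i in (0, 1)):
            # project onto the grid so the iteration stays on finite sets
            out.add(min((tuple(g) for g in grid),
                        key=lambda g: np.linalg.norm(np.array(g) - v)))
    return out

S = {tuple(g) for g in grid}  # start from a superset of the payoff set
for _ in range(50):
    S_next = B(S)
    if S_next == S:
        break
    S = S_next
print(sorted(S))  # grid approximation of the equilibrium payoff set
```

With several states the same loop runs over payoff functions \(v:X\rightarrow \mathbb{R}^n\) rather than payoff vectors, which is exactly where the finiteness of \(X\) is used.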
Proof of Lemma 5
Lemma 1 follows directly from the results for dynamic programming models; see, e.g., Section 9.4 in Bertsekas and Shreve (1996).
The results of Lemmas 2 and 3 follow similarly as in the case of pure strategies. Now \(U_i^k(x)\) in the proof is replaced with \(\sum_j \text{Prob}(y^j|k)\bar{u}_i(y^j,x)\). In the diagonalization argument we pick a convergent subsequence of payoff functions such that \(\{\text{Prob}(y^j|k)\}_k\) also converges for all \(j\). The result then follows.
The fact that \(B(S)\in C\) when \(S\in C\) follows by taking a convergent sequence of payoffs in \(B(S)\) and observing that the limit payoff function satisfies the incentive compatibility constraint and hence belongs to \(B(S)\). The self-generation result, Lemma 4, follows by the same deduction as for pure strategies. \(\square\)