Abstract
We study nonzero-sum continuous-time stochastic games, also known as continuous-time Markov games, of fixed duration. We concentrate on Markovian strategies. We show by way of example that equilibria need not exist in Markovian strategies, but they always exist in Markovian public-signal correlated strategies. To do so, we develop criteria for a strategy profile to be an equilibrium via differential inclusions, both directly and also by modeling continuous-time stochastic games as differential games and using the Hamilton–Jacobi–Bellman equations. We also give an interpretation of equilibria in mixed strategies in continuous time and show that approximate equilibria always exist.
Notes
Rieder [31] works with general state spaces and compact action spaces.
As a function of time and state.
When the other players keep their strategies fixed.
I thank an anonymous referee for pointing out this work in progress. However, our techniques, unlike theirs, emphasize also the appropriate convergence of strategies and not just of payoffs.
It is worth noting that in nonzero-sum differential games, the HJB equations are in general ill defined, in the sense that they do not possess unique solutions and the solutions do not depend in a continuous way on initial data; see the survey by Bressan [6]. The lack of uniqueness in our case is not surprising given the usual multiplicity of Nash equilibria; however, it is not known to what extent the solutions are “well behaved” in terms of dependence on initial data in our class of games, which, as we mentioned, can be viewed as a very special class of differential games.
A Markov transition matrix is a square matrix with nonnegative entries such that each row sums to unity.
Although the theorems there are stated for dynamics of the form \(\frac{dx}{dt} = f(t,x)\) when f is continuous in both parameters, it is remarked there that one can derive similar results with minor changes when f satisfies only measurability in the time coordinate.
We thank an anonymous referee for pointing this reference out. The main application in Krasovskii and Subbotin [22] is differential games of evasion and pursuit; since such games are zero-sum games, both their construction and their use of these approximations are different from ours.
By definition, it is multilinear over \(\varSigma(\mathcal {T})\).
The partitions need not be nested.
Where [0,T] has the Lebesgue σ-algebra, Ω has the Borel σ-algebra, and [0,T]×Ω has the induced product σ-algebra.
S is of full measure; σ can be defined arbitrarily outside of S.
\(\mathcal {Q}\) must cover Ω only up to a ν-null set; this will be useful later when we mention an example of such a partition that satisfies additional conditions.
We discretize the signal space because we will view a \(\mathcal {T}\times \mathcal {Q}\)-correlated strategy as a distribution over \(\mathcal {T}\times \mathcal {Q}\)-correlated pure strategies. If we did not discretize Ω, then the strategies for Player \(p\) in the time-discretized games would be distributions over mappings from \(\mathcal {T}\times\varOmega\) to \(I^{p}\), and the former does not possess a “reasonable” Borel structure; see Aumann [2]. Alternatively, we could have turned the space of mappings from Ω into a standard Borel space by identifying maps that agree ν-a.e. and using the Borel structure induced by the weak-* topology.
\(X\) is convex in a Euclidean space and hence has a natural Lebesgue measure.
See, e.g., [4].
It is not clear whether the boundedness of the transition rates in this case can be replaced with some integrability condition.
If we allow for r,μ to depend measurably on time, then we require a Carathéodory type of condition: for each fixed point in time, r,μ are continuous in the actions, and for each fixed action profile, r,μ are measurable in time.
For the \(L^{2}\)-induced operator norm, \(\Vert A \cdot B\Vert _{L^{2}} \leq\Vert A\Vert _{L^{2}} \cdot \Vert B\Vert _{L^{2}}\), and all norms on a Euclidean space are equivalent.
\(v|_{j}\) induces the Markovian strategy \(\hat{v}|_{j}\) in \([0,t_{m_{j}}]\).
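The submultiplicativity of the \(L^{2}\)-induced operator norm noted above, together with the equivalence of norms in finite dimensions, can be sanity-checked numerically; a minimal sketch (illustrative only, not part of the original text) using numpy:

```python
import numpy as np

# The L2-induced operator (spectral) norm is the largest singular value;
# numpy exposes it as the ord=2 matrix norm.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))

spec = lambda M: np.linalg.norm(M, ord=2)

# Submultiplicativity: ||A B|| <= ||A|| * ||B||.
assert spec(A @ B) <= spec(A) * spec(B) + 1e-10

# Norm equivalence in finite dimensions: the entrywise max-norm and the
# spectral norm bound each other up to dimension-dependent constants.
n = A.shape[0]
assert np.abs(A).max() <= spec(A) <= n * np.abs(A).max()
```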
References
Aubin JP (2009) Viability theory. Birkhäuser, Boston
Aumann RJ (1961) Borel structures for function spaces. Ill J Math 5:614–630
Barron EN, Evans LC, Jensen R (1984) Viscosity solutions of Isaacs’ equation and differential games with Lipschitz controls. J Differ Equ 53:213–233
Bertsekas D (2005) Dynamic programming and optimal control, vol 1. Athena Scientific, Belmont
Buckdahn R, Li J, Quincampoix M (2012) Value function of differential games without Isaacs conditions. An approach with non-anticipative mixed strategies. Preprint
Bressan A (2011) Noncooperative differential games. Milan J Math 79:357–427
Castaing C, Valadier M (1977) Convex analysis and measurable multifunctions. Lecture notes in mathematics, vol 580. Springer, New York
Coddington E, Levinson N (1972) Theory of ordinary differential equations. McGraw-Hill, New York
Crandall M, Lions P (1983) Viscosity solutions of Hamilton–Jacobi equations. Trans Am Math Soc 277:1–42
Deimling K (1992) Multivalued differential equations. Walter de Gruyter, Berlin
Frankowska H, Plaskacz S, Rzezuchowski T (1995) Measurable viability theorems and the Hamilton–Jacobi–Bellman equations. J Differ Equ 116:265–305
Friedman A (1971) Differential games. Pure and applied mathematics, vol 25. Wiley, New York
Guo X, Hernández-Lerma O (2003) Zero-sum games for continuous-time Markov chains with unbounded transitions and average payoff rates. J Appl Probab 40:327–345
Guo X, Hernández-Lerma O (2005) Nonzero-sum games for continuous-time Markov chains with unbounded discounted payoffs. J Appl Probab 42:303–320
Harsanyi JC, Selten R (1988) A general theory of equilibrium selection in games. MIT Press, Cambridge
Hellwig M, Leininger W (1988) Markov-perfect equilibrium in games of perfect information. Discussion paper A-183, University of Bonn
Himmelberg CJ (1975) Measurable relations. Fundam Math 87:53–72
Hirsch M, Smale S (1974) Differential equations, dynamical systems, and linear algebra. Academic Press, San Diego
Isaacs R (1965) Differential games: a mathematical theory with applications to warfare and pursuit, control and optimization. Wiley, New York
Judd KL (1985) The law of large numbers with a continuum of IID random variables. J Econ Theory 35:19–25
Kohlberg E, Mertens JF (1986) On the strategic stability of equilibria. Econometrica 54:1003–1037
Krasovskii NN, Subbotin AI (1988) Game theoretical control problems. Springer, New York
Kuratowski K, Ryll-Nardzewski C (1965) A general theorem on selectors. Bull Pol Acad Sci, Math 13:379–403
Levy Y (2012a) A discounted stochastic game with no stationary Nash equilibrium: the case of absolutely continuous transitions. DP #612, Center for the Study of Rationality, Hebrew University, Jerusalem
Levy Y (2012b) Continuous-time stochastic games of fixed duration. DP, Center for the Study of Rationality, Hebrew University, Jerusalem, to appear
Maskin E, Tirole J (2001) Markov perfect equilibrium. I. Observable actions. J Econ Theory 100:191–219
Miller B (1967) Finite state continuous-time Markov decision processes with applications to a class of optimization problems in queueing theory. Technical report 15, Stanford University
Miller B (1968) Finite state continuous time Markov decision processes with a finite planning horizon. SIAM J Control 6:266–280
Neyman A (2012) Continuous-time stochastic games. DP #616, Center for the Study of Rationality, Hebrew University, Jerusalem
Nowak AS, Raghavan TES (1992) Existence of stationary correlated equilibria with symmetric information for discounted stochastic games. Math Oper Res 17:519–526
Rieder U (1979) Equilibrium plans for non-zero-sum Markov games. In: Moeschlin O, Pallaschke D (eds) Game theory and related topics. North Holland, Amsterdam, pp 91–102
Souquière A (2013) Nash equilibrium payoffs in mixed strategies. In: Cardaliaguet P, Cressman R (eds) Advances in dynamic games, vol 13. Springer, New York
Shapley L (1953) Stochastic games. Proc Natl Acad Sci USA 39:1095–1100
Yushkevich AA (1980) Controlled jump Markov models. Theory Probab Appl 25:244–266
Zachrisson LE (1964) Markov games. In: Dresher M, Shapley LS, Tucker AW (eds) Advances in game theory. Princeton University Press, Princeton, pp 211–253
Acknowledgements
Research supported in part by Israel Science Foundation grants 1123/06 and 1596/10. Many thanks to A. Neyman and an anonymous referee for many useful comments.
Appendix: Proofs Omitted from Sect. 5
Proof of Lemma 2
Note first that \(\Vert A \cdot B\Vert_{\infty} \leq C_{\infty}\Vert A\Vert_{\infty} \cdot \Vert B\Vert_{\infty}\) for an appropriate \(C_{\infty} > 0\) and for any two \(|Z| \times |Z|\) matrices \(A, B\) (Footnote 19). Further note that \(\Vert P_{u}\Vert_{\infty} \leq 1\) for any Markovian strategy \(u\). Using these observations and (2.1) gives:
Therefore, an application of Gronwall’s inequality (see, e.g., Hirsch and Smale [18]) gives (5.6) for an appropriate \(M > 0\). To prove (5.7), note that (3.1) shows that for each player \(p \in \mathcal {P}\) and \(z \in Z\),
and so enlarging \(M\) appropriately completes the proof of (5.7). □
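For reference, the integral form of Gronwall’s inequality used in the proof above (a standard statement; see, e.g., Hirsch and Smale [18]) is:

```latex
\text{If } \varphi \colon [0,T] \to [0,\infty) \text{ is continuous and }
\varphi(t) \le C + K \int_0^t \varphi(s)\,ds \quad \text{for all } t \in [0,T],
\text{ with constants } C, K \ge 0, \text{ then } \varphi(t) \le C e^{Kt} \text{ on } [0,T].
```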
Proof of Lemma 3
For simplicity of notation, we will prove only (a); (b) is proved by an almost identical argument, albeit with more cumbersome notation. Let \(\mathcal {T}= \{0 = t_{0} < \cdots< t_{k} < T \}\), fix \(\tau\) satisfying \(\ell(\mathcal {T}) < \tau\leq\min[T,2\ell(\mathcal {T})]\), and let \(0 = m_{0} < m_{1} < \cdots < m_{n} = k+1\) (recall that \(t_{k+1} = T\)) be such that \(\tau< t_{m_{j}} - t_{m_{j-1}} < 2\tau\) for all \(1 \leq j < n\); such a selection is possible. Denote
Clearly \(D_{0} = 0\). We wish to bound \(D_{n}\). Fix \(1 \leq j < n\) and let \(\mathcal {T}|_{j} = \{ t_{0},\ldots,t_{m_{j} - 1} \}\) and \(\mathcal {T}|_{j+} = \{t_{m_{j}},\ldots,t_{m_{j+1} - 1}\}\); these can be viewed as restrictions of the partition to \([0,t_{m_{j}}]\) and to \([t_{m_{j}},t_{m_{j+1}}]\), respectively. We will denote by \(v|_{j}\), \(v|_{j+}\) generic elements of the sets \(S(\mathcal {T}|_{j}) \cong\prod_{p \in \mathcal {P}}(I^{p})^{Z \times \mathcal {T}|_{j}}\), \(S(\mathcal {T}|_{j+}) \cong\prod_{p \in \mathcal {P}}(I^{p})^{Z \times \mathcal {T}|_{j+}}\), respectively. Note that each \(w \in \varSigma(\mathcal {T})\) induces elements of \(\varSigma(\mathcal {T}|_{j}) \cong\prod_{p \in \mathcal {P}}(\Delta(I^{p}))^{Z \times \mathcal {T}|_{j}}\), \(\varSigma(\mathcal {T}|_{j+})\cong \prod_{p \in \mathcal {P}}(\Delta(I^{p}))^{Z \times \mathcal {T}|_{j+}}\) in the natural way, and this induced distribution is a product distribution on \(\varSigma(\mathcal {T}|_{j+1}) = \varSigma(\mathcal {T}|_{j}) \otimes\varSigma(\mathcal {T}|_{j+})\). Note that for \(t \leq t_{m_{j}}\), \(P_{\hat{v}|_{j}}(t)\) is well defined (Footnote 20), and for \(t_{m_{j}} \leq t < t_{m_{j+1}}\) and any \(z \in Z\), \(\hat{v}|_{j+}(z,t)\) is well defined; see (5.1). Also observe that by the definition of \(w\), for any \(t_{m_{j}} \leq p<q < t_{m_{j+1}}\), we have, for all \(z \in Z\),
where we recall that a pure strategy \(v \in S(\mathcal {T})\) induces a pure Markovian strategy by (5.1). Finally, we note that for any Markovian strategy ϕ, by (2.1),
where \(C_{\infty}\) is as in the proof of Lemma 2. Hence, letting \(m_{j} \leq p < m_{j+1}\) and denoting \(\tau_{0} = t_{m_{j}}\), \(\tau = t_{p} - \tau_{0}\), we have
Therefore,
By (10.2), the first two terms together are bounded by \(C_{\infty}(\Vert\mu\Vert_{\infty})^{2}\tau^{2}\). As for the remaining term, note that for \(0 \leq s \leq \tau\),
Using (10.1), we have
Hence, (10.3) and (10.5), together with \(\tau< t_{m_{j+1}} - t_{m_{j}} \leq2\ell(\mathcal {T})\) for all \(0 \leq j < n\), imply
with initial condition \(D_{0} = 0\). Denoting \(A = 1 + 2 \ell(\mathcal {T}) \cdot C_{\infty}\Vert \mu\Vert _{\infty}\geq1\) and \(B = 4 C_{\infty}(\Vert \mu\Vert _{\infty})^{2} (\ell(\mathcal {T}))^{2} > 0\), we see that \(D_{j+1} \leq A \cdot D_{j} + B\) with \(D_{0} = 0\), and therefore,
We have \(n \cdot\ell(\mathcal {T}) < T\) (recall that \(t_{m_{n}} = T\)), and therefore \(n < \frac{T}{\ell(\mathcal {T})}\). Hence,
Since \(A - 1 = 2\ell(\mathcal {T}) \cdot C_{\infty}\Vert \mu\Vert _{\infty}\), we have for some \(C' > 0\) independent of \(\ell(\mathcal {T})\),
and therefore,
this gives, denoting \(C = 8C_{\infty}(\Vert\mu\Vert_{\infty})^{2} C'\),
Finally, from (10.2) we have
which completes the proof of (5.8), with \(K = C + 4C_{\infty}\Vert\mu\Vert_{\infty}\). Now we prove (5.9). Fix \(0 \leq j < n\) and denote \(\tau= t_{m_{j+1}} - t_{m_{j}}\). Using (5.8) and techniques similar to those above, we have
Summing over \(j = 0,\ldots,n-1\) and recalling that \(n < \frac{T}{\ell(\mathcal {T})}\) and \(\tau\leq2\ell(\mathcal {T})\) gives
Enlarging K accordingly gives (5.9). □
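The recursion used in the proof above, \(D_{j+1} \leq A \cdot D_{j} + B\) with \(D_{0} = 0\) and \(A > 1\), unrolls to the geometric bound \(D_{n} \leq B\,\frac{A^{n}-1}{A-1}\); a quick numeric sanity check (illustrative constants, not values from the text):

```python
# Worst case of the recursion D_{j+1} <= A*D_j + B with D_0 = 0 is the
# equality recursion, whose solution is the geometric sum
# D_n = B * (A^n - 1) / (A - 1) when A > 1.
A, B, n = 1.05, 0.3, 20

D = 0.0
for _ in range(n):
    D = A * D + B  # equality at every step (the worst case)

closed_form = B * (A**n - 1.0) / (A - 1.0)
assert abs(D - closed_form) < 1e-9
```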
Proof of Lemma 4
First, we prove
for appropriate C>0. Given \(w^{p} \in\varSigma^{p}(\mathcal {T})\), set \(u = \hat{w}\); by parts (a) and (b) of Lemma 3,
Next we prove that for all \(z \in Z\),
for appropriate \(C > 0\). We will actually prove a bit more: we show that it suffices to take only pure Markovian strategies in the supremum on the right-hand side. Fix \(w^{p} \in\varSigma^{p}(\mathcal {T})\), \(w^{-p} \in \varSigma^{-p}(\mathcal {T})\). It suffices to show that there is a pure Markovian strategy \(u^{p}\) satisfying
for some \(C > 0\) that is independent of \(w^{p}\) or of \(\mathcal {T}\). For each interval \(J = [t_{j}, t_{j+1}]\) induced by \(\mathcal {T}\), divide \(J\) into subintervals (some of which may be degenerate) \(I_{1},\ldots,I_{|I^{p}|}\), with \(\ell(I_{a}) = \ell(J) \cdot w^{p}(t_{j})[a]\), where \(\ell(\cdot)\) denotes the length of an interval and \(a\) denotes the \(a\)th element of \(I^{p}\) in some ordering. Then define \(u^{p}(t)[a] = 1\) if \(t \in I_{a}\). Note then that, by the definition of \(u^{p}\) and denoting \(w = (w^{p}, w^{-p})\), we have for any interval \(J = [\tau_{0}, \tau_{1}]\) in the partition induced by \(\mathcal {T}\),
Note also that \(w^{p}\) is a \(\mathcal {T}\)-adapted version of \(u^{p}\). Combining (5.8) and (5.10) of Lemma 3 gives
for \(K\) as in Lemma 3. As a result, for each interval \(J = [\tau_{0}, \tau_{1}]\) induced by the partition \(\mathcal {T}\), we have, using (10.2) and techniques similar to those used in the proof of Lemma 3,
where the last inequality follows from (10.9) and (10.8). Denoting \(L = |Z| \cdot (2C_{\infty}\Vert\mu\Vert_{\infty} + 1) \cdot \Vert r\Vert_{\infty}\) and summing over all \(J\) induced by \(\mathcal {T}\) gives
where we have used the following elementary claim: for any \(T, D > 0\),
where \(\lceil\cdot\rceil\) denotes the ceiling function. Indeed, in our case, \(\ell(J) \leq\ell(\mathcal {T})\) for each \(J\) induced by \(\mathcal {T}\). □
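The time-sharing construction in the proof of Lemma 4 — splitting each interval \(J\) into subintervals \(I_{a}\) whose lengths are proportional to the mixed-strategy weights and playing the \(a\)th pure action throughout \(I_{a}\) — can be sketched as follows (a minimal illustration; the function names are not from the text):

```python
# Split J = [t0, t1] into subintervals I_1, ..., I_k with lengths
# proportional to the mixed-action weights w[a] (some may be degenerate).
def split_interval(t0, t1, weights):
    assert abs(sum(weights) - 1.0) < 1e-12, "weights must be a distribution"
    cuts, left = [], t0
    for w in weights:
        cuts.append((left, left + (t1 - t0) * w))
        left += (t1 - t0) * w
    return cuts

# The induced pure strategy plays action a throughout I_a.
def pure_action_at(t, cuts):
    for a, (lo, hi) in enumerate(cuts):
        if lo <= t < hi:
            return a
    return len(cuts) - 1  # convention at the right endpoint

cuts = split_interval(0.0, 1.0, [0.25, 0.0, 0.75])
# The second subinterval is degenerate, so action 1 is never played:
assert pure_action_at(0.1, cuts) == 0
assert pure_action_at(0.5, cuts) == 2
```

The pure strategy thus spends a fraction \(w^{p}(t_{j})[a]\) of each partition interval on action \(a\), which is what drives the averaging estimate (10.8).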
Levy, Y. Continuous-Time Stochastic Games of Fixed Duration. Dyn Games Appl 3, 279–312 (2013). https://doi.org/10.1007/s13235-012-0067-2