
Continuous-Time Stochastic Games of Fixed Duration

Published in: Dynamic Games and Applications

Abstract

We study nonzero-sum continuous-time stochastic games, also known as continuous-time Markov games, of fixed duration. We concentrate on Markovian strategies. We show by way of example that equilibria need not exist in Markovian strategies, but they always exist in Markovian public-signal correlated strategies. To do so, we develop criteria for a strategy profile to be an equilibrium via differential inclusions, both directly and also by modeling continuous-time stochastic games as differential games and using the Hamilton–Jacobi–Bellman equations. We also give an interpretation of equilibria in mixed strategies in continuous time and show that approximate equilibria always exist.


Notes

  1. Rieder [31] works with general state spaces and compact action spaces.

  2. As a function of time and state.

  3. When the other players keep their strategies fixed.

  4. I thank an anonymous referee for pointing out this work-in-progress. However, our techniques, unlike theirs, emphasize also the appropriate convergence of strategies and not just of payoffs.

  5. It is worth noting that in nonzero-sum differential games, the HJB equations are in general ill defined, in the sense that they do not possess unique solutions and the solutions do not depend in a continuous way on initial data; see the survey by Bressan [6]. The lack of uniqueness in our case is not surprising given the usual multiplicity of Nash equilibria; however, it is not known to what extent the solutions are “well behaved” in terms of dependence on initial data in our class of games, which, as we mentioned, can be viewed as a very special class of differential games.

  6. A Markov transition matrix is a square matrix with nonnegative entries such that each row sums to unity.

  7. Although the theorems there are stated for dynamics of the form \(\frac{dx}{dt} = f(t,x)\) when f is continuous in both parameters, it is remarked there that one can derive similar results with minor changes when f satisfies only measurability in the time coordinate.

  8. We thank an anonymous referee for pointing this reference out. The main application in Krasovskii and Subbotin [22] is differential games of evasion and pursuit; since such games are zero-sum games, both their construction and their use of these approximations are different from ours.

  9. By definition, it is multilinear over \(\varSigma(\mathcal {T})\).

  10. The partitions need not be nested.

  11. Where \([0,T]\) has the Lebesgue σ-algebra, Ω has the Borel σ-algebra, and \([0,T]\times\varOmega\) has the induced product σ-algebra.

  12. S is of full measure; σ can be defined arbitrarily outside of S.

  13. \(\mathcal {Q}\) must cover Ω only up to a ν-null set; this will be useful later when we mention an example of such a partition which satisfies additional conditions.

  14. The reason we discretize the signal space is because we will view a \(\mathcal {T}\times \mathcal {Q}\)-correlated strategy as a distribution over \(\mathcal {T}\times \mathcal {Q}\)-correlated pure strategies. If we did not discretize Ω, then the strategies for Player p in the time-discretized games would be distributions over mappings from \(\mathcal {T}\times\varOmega\) to I p, and the former does not possess a “reasonable” Borel structure; see Aumann [2]. Alternatively, we could have turned the space of mappings from Ω into a standard Borel space by identifying maps that agree ν-a.e. and using the Borel structure induced by the weak-* topology.

  15. X is convex in a Euclidean space and hence has a natural Lebesgue measure.

  16. See, e.g., [4].

  17. It is not clear whether the boundedness of the transition rates in this case can be replaced with some integrability condition.

  18. If we allow for r,μ to depend measurably on time, then we require a Carathéodory type of condition: for each fixed point in time, r,μ are continuous in the actions, and for each fixed action profile, r,μ are measurable in time.

  19. For the \(L^2\)-induced operator norm, \(\Vert A \cdot B\Vert _{L^{2}} \leq\Vert A\Vert _{L^{2}} \cdot \Vert B\Vert _{L^{2}}\), and all norms on a Euclidean space are equivalent.

  20. \(v|_{j}\) induces the Markovian strategy \(\hat {v}|_{j}\) on \([0,t_{m_{j}}]\).

References

  1. Aubin JP (2009) Viability theory. Birkhäuser, Boston

  2. Aumann RJ (1961) Borel structures for function spaces. Ill J Math 5:614–630

  3. Barron EN, Evans LC, Jensen R (1984) Viscosity solutions of Isaacs’ equation and differential games with Lipschitz controls. J Differ Equ 53:213–233

  4. Bertsekas D (2005) Dynamic programming and optimal control, vol 1. Athena Scientific, Belmont

  5. Buckdahn R, Li J, Quincampoix M (2012) Value function of differential games without Isaacs conditions: an approach with non-anticipative mixed strategies. Preprint

  6. Bressan A (2011) Noncooperative differential games. Milan J Math 79:357–427

  7. Castaing C, Valadier M (1977) Convex analysis and measurable multifunctions. Lecture notes in mathematics, vol 580. Springer, New York

  8. Coddington E, Levinson N (1972) Theory of ordinary differential equations. McGraw-Hill, New York

  9. Crandall M, Lions P (1983) Viscosity solutions of Hamilton–Jacobi equations. Trans Am Math Soc 277:1–42

  10. Deimling K (1992) Multivalued differential equations. Walter de Gruyter, Berlin

  11. Frankowska H, Plaskacz S, Rzezuchowski T (1995) Measurable viability theorems and the Hamilton–Jacobi–Bellman equations. J Differ Equ 116:265–305

  12. Friedman A (1971) Differential games. Pure and applied mathematics, vol 25. Wiley, New York

  13. Guo X, Hernández-Lerma O (2003) Zero-sum games for continuous-time Markov chains with unbounded transitions and average payoff rates. J Appl Probab 40:327–345

  14. Guo X, Hernández-Lerma O (2005) Nonzero-sum games for continuous-time Markov chains with unbounded discounted payoffs. J Appl Probab 42:303–320

  15. Harsanyi JC, Selten R (1988) A general theory of equilibrium selection in games. MIT Press, Cambridge

  16. Hellwig M, Leininger W (1988) Markov-perfect equilibrium in games of perfect information. Discussion paper A-183, University of Bonn

  17. Himmelberg CJ (1975) Measurable relations. Fundam Math 87:53–72

  18. Hirsch M, Smale S (1974) Differential equations, dynamical systems, and linear algebra. Academic Press, San Diego

  19. Isaacs R (1965) Differential games: a mathematical theory with applications to warfare and pursuit, control and optimization. Wiley, New York

  20. Judd KL (1985) The law of large numbers with a continuum of IID random variables. J Econ Theory 35:19–25

  21. Kohlberg E, Mertens JF (1986) On the strategic stability of equilibria. Econometrica 54:1003–1037

  22. Krasovskii NN, Subbotin AI (1988) Game-theoretical control problems. Springer, New York

  23. Kuratowski K, Ryll-Nardzewski C (1965) A general theorem on selectors. Bull Pol Acad Sci, Math 13:379–403

  24. Levy Y (2012a) A discounted stochastic game with no stationary Nash equilibrium: the case of absolutely continuous transitions. DP #612, Center for the Study of Rationality, Hebrew University, Jerusalem

  25. Levy Y (2012b) Continuous-time stochastic games of fixed duration. DP, Center for the Study of Rationality, Hebrew University, Jerusalem, to appear

  26. Maskin E, Tirole J (2001) Markov perfect equilibrium. I. Observable actions. J Econ Theory 100:191–219

  27. Miller B (1967) Finite state continuous-time Markov decision processes with applications to a class of optimization problems in queueing theory. Technical report 15, Stanford University

  28. Miller B (1968) Finite state continuous time Markov decision processes with a finite planning horizon. SIAM J Control 6:266–280

  29. Neyman A (2012) Continuous-time stochastic games. DP #616, Center for the Study of Rationality, Hebrew University, Jerusalem

  30. Nowak AS, Raghavan TES (1992) Existence of stationary correlated equilibria with symmetric information for discounted stochastic games. Math Oper Res 17:519–526

  31. Rieder U (1979) Equilibrium plans for non-zero-sum Markov games. In: Moeschlin O, Pallaschke D (eds) Game theory and related topics. North-Holland, Amsterdam, pp 91–102

  32. Souquière A (2013) Nash equilibrium payoffs in mixed strategies. In: Cardaliaguet P, Cressman R (eds) Advances in dynamic games, vol 13. Springer, New York

  33. Shapley L (1953) Stochastic games. Proc Natl Acad Sci USA 39:1095–1100

  34. Yushkevich AA (1980) Controlled jump Markov models. Theory Probab Appl 25:244–266

  35. Zachrisson LE (1964) Markov games. In: Dresher M, Shapley LS, Tucker AW (eds) Advances in game theory. Princeton University Press, Princeton, pp 211–253


Acknowledgements

Research supported in part by Israel Science Foundation grants 1123/06 and 1596/10. Many thanks to A. Neyman and an anonymous referee for many useful comments.


Correspondence to Yehuda Levy.

Appendix: Proofs Omitted from Sect. 5


Proof of Lemma 2

Note first that \(\Vert A \cdot B\Vert_\infty \leq C_\infty \Vert A\Vert_\infty \cdot \Vert B\Vert_\infty\) for an appropriate \(C_\infty > 0\) and for any two |Z|×|Z| matrices A, B (see footnote 19). Further note that \(\Vert P_u\Vert_\infty \leq 1\) for any Markovian strategy u. Using these observations and (2.1) gives:

Therefore, an application of Gronwall’s inequality (see, e.g., Hirsch and Smale [18]) gives (5.6) for an appropriate M>0. To prove (5.7), note that (3.1) shows that for each player \(p \in \mathcal {P}\) and \(z \in Z\),

and so enlarging M appropriately completes the proof of (5.7). □
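
The two observations driving this proof — submultiplicativity of the norm and \(\Vert P_u\Vert_\infty \leq 1\) for transition matrices — are easy to sanity-check numerically. Below is a minimal sketch, not from the paper: it assumes dynamics of the form \(\frac{d}{dt}P_u(t) = P_u(t)\,\mu\) as in (2.1), with a constant two-state generator, takes \(\Vert\cdot\Vert_\infty\) to be the entrywise max norm, and uses \(C_\infty = |Z|\) as the submultiplicativity constant; all names and constants are illustrative.

```python
# Numerical sanity check of the Lipschitz-type bound (10.2) for a two-state
# continuous-time Markov chain with a constant transition-rate matrix mu.
# Assumptions (illustrative, not from the paper): P solves dP/dt = P*mu with
# P(0) = I; ||.||_inf is the entrywise max norm; C_inf = |Z| is an admissible
# submultiplicativity constant for that norm.

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def norm_inf(A):
    return max(abs(x) for row in A for x in row)

mu = [[-2.0, 2.0], [1.0, -1.0]]   # generator: each row sums to 0
Z = len(mu)
dt, T = 1e-4, 1.0
P = [[1.0, 0.0], [0.0, 1.0]]      # P(0) = identity
history = [(0.0, [row[:] for row in P])]
t = 0.0
while t < T - 1e-12:              # forward Euler integration of dP/dt = P*mu
    dP = mat_mul(P, mu)
    P = [[P[i][j] + dt * dP[i][j] for j in range(Z)] for i in range(Z)]
    t += dt
    history.append((t, [row[:] for row in P]))

# Check ||P(t) - P(s)||_inf <= C_inf * ||mu||_inf * |t - s| on sampled pairs.
C_inf = Z
bound_ok = True
samples = history[::1000]
for (s, Ps) in samples:
    for (t2, Pt) in samples:
        diff = norm_inf([[Pt[i][j] - Ps[i][j] for j in range(Z)]
                         for i in range(Z)])
        if diff > C_inf * norm_inf(mu) * abs(t2 - s) + 1e-6:
            bound_ok = False
print(bound_ok)
```

Each row of P stays a probability vector throughout (rows of the generator sum to zero), which is the discrete counterpart of \(\Vert P_u\Vert_\infty \leq 1\) above.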

Proof of Lemma 3

For simplicity of notation, we will prove only (a); (b) is proved by an almost identical argument, albeit with more cumbersome notation. Let \(\mathcal {T}= \{0 = t_{0} < \cdots< t_{k} < T \}\), fix τ satisfying \(\ell(\mathcal {T}) < \tau\leq\min[T,2\ell(\mathcal {T})]\), and let \(0 = m_0 < m_1 < \cdots < m_n = k+1\) (recall that \(t_{k+1} = T\)) be such that \(\tau< t_{m_{j}} - t_{m_{j-1}} < 2\tau\) for all \(1 \leq j < n\); such a selection is possible. Denote

$$D_j := \sup_{t \in\{ t_0, \dots, t_{m_j} \}} \bigl\Vert P_u(t) - P_w(t)\bigr\Vert _\infty. $$

Clearly \(D_0 = 0\). We wish to bound \(D_n\). Fix \(1 \leq j < n\) and let \(\mathcal {T}|_{j} = \{ t_{0},\ldots,t_{m_{j} - 1} \}\) and \(\mathcal {T}|_{j+} = \{t_{m_{j}},\ldots,t_{m_{j+1} - 1}\}\); these can be viewed as restrictions of the partition to \([0,t_{m_{j}}]\) and to \([t_{m_{j}},t_{m_{j+1}}]\), respectively. We will denote by \(v|_{j}\), \(v|_{j+}\) generic elements of the sets \(S(\mathcal {T}|_{j}) \cong\prod_{p \in \mathcal {P}}(I^{p})^{Z \times \mathcal {T}|_{j}}\) and \(S(\mathcal {T}|_{j+}) \cong\prod_{p \in \mathcal {P}}(I^{p})^{Z \times \mathcal {T}|_{j+}}\), respectively. Note that each \(w \in \varSigma(\mathcal {T})\) induces elements of \(\varSigma(\mathcal {T}|_{j}) \cong\prod_{p \in \mathcal {P}}(\Delta(I^{p}))^{Z \times \mathcal {T}|_{j}}\) and \(\varSigma(\mathcal {T}|_{j+})\cong \prod_{p \in \mathcal {P}}(\Delta(I^{p}))^{Z \times \mathcal {T}|_{j+}}\) in the natural way, and this induced distribution is a product distribution on \(\varSigma(\mathcal {T}|_{j+1}) = \varSigma(\mathcal {T}|_{j}) \otimes\varSigma(\mathcal {T}|_{j+})\). Note that for \(t \leq t_{m_{j}}\), \(P_{\hat{v}|_{j}}(t)\) is well defined (see footnote 20), and for \(t_{m_{j}} \leq t < t_{m_{j+1}}\) and any \(z \in Z\), \(\hat{v}|_{j+}(z,t)\) is well defined; see (5.1). Also observe that by the definition of w, for any \(m_j \leq p < q < m_{j+1}\), we have, for all \(z \in Z\),

$$ \int _{t_p}^{t_q} \mu\bigl(u(z,s)\bigr)\,ds = \sum _{v|_{j+} \in S(\mathcal {T}|_{j+})} \int_{t_p}^{t_q} w[v|_{j+}]\cdot\mu\bigl(\hat{v}|_{j+}(z,s)\bigr)\,ds, $$
(10.1)

where we recall that a pure strategy \(v \in S(\mathcal {T})\) induces a pure Markovian strategy by (5.1). Finally, we note that for any Markovian strategy ϕ, by (2.1),

$$ \bigl\Vert P_\phi(t) - P_\phi (s)\bigr\Vert _\infty\leq \int_s^t \bigl\Vert P_\phi(r)\mu\bigl(\phi(r)\bigr)\bigr\Vert _\infty \,dr \leq C_\infty \Vert \mu\Vert _\infty|t-s|, $$
(10.2)

where C is as in the proof of Lemma 2. Hence, letting m j p<m j+1 and denoting \(\tau_{0} = t_{m_{j}}\), τ=t p τ 0, we have

Therefore,

(10.3)
(10.4)

By (10.2), the first two terms together are bounded by \(C_\infty(\Vert \mu\Vert_\infty)^2 \tau^2\). As for the second term, note that for \(0 \leq s \leq \tau\),

Using (10.1), we have

(10.5)

Hence, (10.3) and (10.5), together with \(\tau< t_{m_{j+1}} - t_{m_{j}} \leq2\ell(\mathcal {T})\) for all \(0 \leq j < n\), imply

$$D_{j+1} \leq D_j + 4 C_\infty\bigl(\Vert \mu\Vert _\infty\bigr)^2 \bigl(\ell(\mathcal {T})\bigr)^2 + 2 D_j \cdot \ell(\mathcal {T}) \cdot C_\infty\Vert \mu\Vert _\infty $$

with initial condition \(D_0 = 0\). Denoting \(A = 1 + 2 \ell(\mathcal {T}) \cdot C_{\infty}\Vert \mu\Vert _{\infty}\geq1\) and \(B = 4 C_{\infty}(\Vert \mu\Vert _{\infty})^{2} (\ell(\mathcal {T}))^{2} > 0\), we see that \(D_{j+1} \leq A \cdot D_j + B\) with \(D_0 = 0\), and therefore,

$$D_n \leq B \cdot\bigl(1 + A + \cdots+ A^{n-1}\bigr) = B \frac{A^{n} - 1}{A - 1}. $$
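
As a quick numeric sanity check (with illustrative constants, not from the paper), iterating the worst case of the recursion \(D_{j+1} \leq A \cdot D_j + B\) with \(D_0 = 0\) reproduces the geometric-sum bound exactly:

```python
# Iterate D_{j+1} = A*D_j + B from D_0 = 0 (the worst case of the inequality)
# and compare against the closed form B*(A^n - 1)/(A - 1).
A, B, n = 1.05, 0.02, 40   # illustrative constants, not from the paper
D = 0.0
for _ in range(n):
    D = A * D + B
closed_form = B * (A**n - 1.0) / (A - 1.0)
print(abs(D - closed_form) < 1e-9)
```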

We have \(n \cdot\ell(\mathcal {T}) < T\) (recall that \(t_{m_{n}} = T\)), and therefore \(n < \frac{T}{\ell(\mathcal {T})}\). Hence,

$$A^n = \bigl(1 + 2\ell(\mathcal {T}) \cdot C_\infty\Vert \mu\Vert _\infty\bigr)^{n} \leq\bigl(1 + 2\cdot\ell(\mathcal {T}) C_\infty\Vert \mu\Vert _\infty\bigr)^{\frac{T}{\ell(\mathcal {T})}} \leq e^{2 T C_\infty\Vert \mu\Vert _\infty}. $$

Since \(A - 1 = 2\ell(\mathcal {T}) \cdot C_{\infty}\Vert \mu\Vert _{\infty}\), we have for some C′>0 independent of \(\ell(\mathcal {T})\),

$$\ell(\mathcal {T}) \frac{A^{n} - 1}{A - 1} \leq C', $$

and therefore,

$$D_n \leq\frac{B}{\ell(\mathcal {T})}C' = 4 C_\infty\bigl(\Vert \mu\Vert _\infty\bigr)^2 C' \ell (\mathcal {T}) \leq8 C_\infty\bigl(\Vert \mu\Vert _\infty\bigr)^2 C' \ell( \mathcal {T}); $$

this gives, denoting \(C = 8C_\infty(\Vert \mu\Vert_\infty)^2 C'\),

$$ D_n \leq C\cdot\ell( \mathcal {T}). $$
(10.6)

Finally, from (10.2) we have

which completes the proof of (5.8), with \(K = C + 4C_\infty\Vert \mu\Vert_\infty\). Now we prove (5.9). Fix \(0 \leq j < n\) and denote \(\tau= t_{m_{j+1}} - t_{m_{j}}\). Using (5.8) and techniques similar to those above, we have

Summing over \(j = 0,\ldots,n-1\) and recalling that \(n < \frac{T}{\ell(\mathcal {T})}\) and \(\tau\leq2\ell(\mathcal {T})\) gives

Enlarging K accordingly gives (5.9). □

Proof of Lemma 4

First, we prove

$$\sup_{u^p \in \mathfrak {A}^p} \gamma^p_{u^p,w^{-p}}(z) \leq \sup_{w^p \in \varSigma^p(\mathcal {T})} \gamma^p_{w^p,w^{-p}}(z) + C \cdot\ell(\mathcal {T}) $$

for appropriate C>0. Given \(w^{p} \in\varSigma^{p}(\mathcal {T})\), set \(u = \hat{w}\); by parts (a) and (b) of Lemma 3,

Next we prove that for all \(z \in Z\),

$$\sup_{w^p \in\varSigma^p(\mathcal {T})} \gamma^p_{w^p,w^{-p}}(z) \leq \sup_{u^p \in \mathfrak {A}^p} \gamma^p_{u^p,w^{-p}}(z) + C \cdot\ell(\mathcal {T}) $$

for appropriate C>0. We will actually prove a bit more: we show that it suffices to take only pure Markovian strategies in the supremum on the right-hand side. Fix \(w^{p} \in\varSigma^{p}(\mathcal {T})\), \(w^{-p} \in \varSigma^{-p}(\mathcal {T})\). It suffices to show that there is a pure Markovian strategy \(u^p\) satisfying

$$ \bigl\Vert \gamma^p_{w^p,w^{-p}} - \gamma^p_{u^p,w^{-p}}\bigr\Vert _\infty \leq C \cdot \ell(\mathcal {T}) $$
(10.7)

for some C>0 that is independent of \(w^p\) and of \(\mathcal {T}\). For each interval \(J = [t_j, t_{j+1}]\) induced by \(\mathcal {T}\), divide J into subintervals (some of which may be degenerate) \(I_{1},\ldots,I_{|I^{p}|}\), with \(\ell(I_a) = \ell(J)\cdot w^p(t_j)[a]\), where \(\ell(\cdot)\) will be used to denote the length of an interval, and a denotes the ath element of \(I^p\) in some ordering. Then define \(u^p(t)[a] = 1\) if \(t \in I_a\). Note then that by the definition of \(u^p\) and denoting \(w = (w^p, w^{-p})\), we have for any interval \(J = [\tau_0, \tau_1]\) in the partition induced by \(\mathcal {T}\),

(10.8)
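
The subdivision defining the pure strategy can be sketched in a few lines. The mixed action and interval endpoints below are illustrative, and `purify`/`pure_action` are hypothetical helper names, not notation from the paper:

```python
# Sketch of the purification step: inside one partition interval J = [t0, t1],
# action a receives a subinterval I_a of length len(J) * w[a], and the pure
# strategy plays action a for t in I_a.

def purify(t0, t1, w):
    """Breakpoints so that action a occupies [cuts[a], cuts[a+1])."""
    length = t1 - t0
    cuts = [t0]
    for prob in w:
        cuts.append(cuts[-1] + length * prob)
    return cuts

def pure_action(t, cuts):
    """The action whose subinterval contains time t."""
    for a in range(len(cuts) - 1):
        if cuts[a] <= t < cuts[a + 1]:
            return a
    return len(cuts) - 2  # convention at the right endpoint

w = [0.5, 0.25, 0.25]        # mixed action over three pure actions
cuts = purify(2.0, 4.0, w)   # J = [2.0, 4.0]
occupation = [cuts[a + 1] - cuts[a] for a in range(len(w))]
print(occupation)  # → [1.0, 0.5, 0.5], i.e. len(J) * w[a] for each action
```

Time-averaging any quantity of the actions over J then recovers its mixed-action average up to an error of order \(\ell(J)\), which is the spirit of (10.8).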

Note also that \(w^p\) is a \(\mathcal {T}\)-adapted version of \(u^p\). Combining (5.8) and (5.10) of Lemma 3 gives

(10.9)

for K as in Lemma 3. As a result, for each interval \(J = [\tau_0, \tau_1]\) induced by the partition \(\mathcal {T}\), we have, using (10.2) and techniques similar to those used in the proof of Lemma 3,

where the last inequality follows from (10.9) and (10.8). Denoting \(L = |Z| \cdot (2C_\infty\Vert \mu\Vert_\infty + 1) \cdot \Vert r\Vert_\infty\) and summing over all intervals J induced by \(\mathcal {T}\) gives

where we have used the following elementary claim: For any T,D>0,

$$\max_{\{\sum_{i=1}^n a_i = T, \forall i, 0 \leq a_i \leq D\}} \sum_{i=1}^n a_i^2 \leq\biggl\lceil\frac{T}{D} \biggr\rceil \cdot D^2 \leq T\cdot D + D^2, $$

where ⌈⋅⌉ denotes the ceiling function. Indeed, in our case, \(\ell(J) \leq\ell(\mathcal {T})\) for each J induced by \(\mathcal {T}\). □
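
The elementary claim is easy to confirm numerically. The sketch below (with illustrative T, D and random partitions, not from the paper) checks both inequalities:

```python
# Randomized check of the elementary claim: if a_1 + ... + a_n = T with
# 0 <= a_i <= D, then sum(a_i^2) <= ceil(T/D) * D^2 <= T*D + D^2.
import math
import random

random.seed(0)
T, D = 5.0, 0.7        # illustrative values
ok = True
for _ in range(1000):
    # Build a random partition of total length T with pieces of length <= D.
    pieces, left = [], T
    while left > 1e-12:
        a = min(left, random.uniform(0.0, D))
        pieces.append(a)
        left -= a
    s = sum(a * a for a in pieces)
    if not (s <= math.ceil(T / D) * D * D + 1e-9 <= T * D + D * D + 1e-9):
        ok = False
print(ok)
```

The maximum of the sum of squares is attained by taking as many pieces of full length D as possible, which is exactly what the ⌈T/D⌉ term counts.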


Levy, Y. Continuous-Time Stochastic Games of Fixed Duration. Dyn Games Appl 3, 279–312 (2013). https://doi.org/10.1007/s13235-012-0067-2
