Abstract
We study nonzero-sum continuous-time stochastic games, also known as continuous-time Markov games, of fixed duration. We concentrate on Markovian strategies. We show by way of example that equilibria need not exist in Markovian strategies, but they always exist in Markovian public-signal correlated strategies. To do so, we develop criteria for a strategy profile to be an equilibrium via differential inclusions, both directly and also by modeling continuous-time stochastic games as differential games and using the Hamilton–Jacobi–Bellman equations. We also give an interpretation of equilibria in mixed strategies in continuous time and show that approximate equilibria always exist.
Notes
Rieder [31] works with general state spaces and compact action spaces.
As a function of time and state.
When the other players keep their strategies fixed.
I thank an anonymous referee for pointing out this work in progress. However, our techniques, unlike theirs, emphasize also the appropriate convergence of strategies and not just of payoffs.
It is worth noting that in nonzero-sum differential games, the HJB equations are in general ill defined, in the sense that they do not possess unique solutions and the solutions do not depend in a continuous way on initial data; see the survey by Bressan [6]. The lack of uniqueness in our case is not surprising given the usual multiplicity of Nash equilibria; however, it is not known to what extent the solutions are “well behaved” in terms of dependence on initial data in our class of games, which, as we mentioned, can be viewed as a very special class of differential games.
A Markov transition matrix is a square matrix with nonnegative entries such that each row sums to unity.
Although the theorems there are stated for dynamics of the form \(\frac{dx}{dt} = f(t,x)\) when f is continuous in both parameters, it is remarked there that one can derive similar results with minor changes when f satisfies only measurability in the time coordinate.
We thank an anonymous referee for pointing this reference out. The main application in Krasovskii and Subbotin [22] is differential games of evasion and pursuit; since such games are zero-sum games, both their construction and their use of these approximations are different from ours.
By definition, it is multilinear over \(\varSigma(\mathcal {T})\).
The partitions need not be nested.
Where [0,T] has the Lebesgue σ-algebra, Ω has the Borel σ-algebra, and [0,T]×Ω has the induced product σ-algebra.
S is of full measure; σ can be defined arbitrarily outside of S.
\(\mathcal {Q}\) must cover Ω only up to a ν-null set; this will be useful later when we mention an example of such a partition that satisfies additional conditions.
We discretize the signal space because we will view a \(\mathcal {T}\times \mathcal {Q}\)-correlated strategy as a distribution over \(\mathcal {T}\times \mathcal {Q}\)-correlated pure strategies. If we did not discretize Ω, then the strategies for Player \(p\) in the time-discretized games would be distributions over mappings from \(\mathcal {T}\times\varOmega\) to \(I^{p}\), and the former does not possess a “reasonable” Borel structure; see Aumann [2]. Alternatively, we could have turned the space of mappings from Ω into a standard Borel space by identifying maps that agree ν-a.e. and using the Borel structure induced by the weak-* topology.
\(X\) is convex in a Euclidean space and hence has a natural Lebesgue measure.
See, e.g., [4].
It is not clear whether the boundedness of the transition rates in this case can be replaced with some integrability condition.
If we allow for r,μ to depend measurably on time, then we require a Carathéodory type of condition: for each fixed point in time, r,μ are continuous in the actions, and for each fixed action profile, r,μ are measurable in time.
For the \(L^{2}\)-induced operator norm, \(\Vert A \cdot B\Vert _{L^{2}} \leq\Vert A\Vert _{L^{2}} \cdot \Vert B\Vert _{L^{2}}\), and all norms on a Euclidean space are equivalent.
\(v|_{j}\) induces the Markovian strategy \(\hat{v}|_{j}\) in \([0,t_{m_{j}}]\).
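The submultiplicativity of the \(L^{2}\)-induced operator norm noted above, together with the equivalence of norms in finite dimensions, can be sanity-checked numerically; a minimal sketch (illustrative only, not part of the original text) using numpy:

```python
import numpy as np

# The L2-induced operator (spectral) norm is the largest singular value;
# numpy exposes it as the ord=2 matrix norm.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))

spec = lambda M: np.linalg.norm(M, ord=2)

# Submultiplicativity: ||A B|| <= ||A|| * ||B||.
assert spec(A @ B) <= spec(A) * spec(B) + 1e-10

# Norm equivalence in finite dimensions: the entrywise max-norm and the
# spectral norm bound each other up to dimension-dependent constants.
n = A.shape[0]
assert np.abs(A).max() <= spec(A) <= n * np.abs(A).max()
```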
References
Aubin JP (2009) Viability theory. Birkhäuser, Boston
Aumann RJ (1961) Borel structures for function spaces. Ill J Math 5:614–630
Barron EN, Evans LC, Jensen R (1984) Viscosity solutions of Isaacs’ equation and differential games with Lipschitz controls. J Differ Equ 53:213–233
Bertsekas D (2005) Dynamic programming and optimal control, vol 1. Athena Scientific, Belmont
Buckdahn R, Li J, Quincampoix M (2012) Value function of differential games without Isaacs conditions. An approach with non-anticipative mixed strategies. Preprint
Bressan A (2011) Noncooperative differential games. Milan J Math 79:357–427
Castaing C, Valadier M (1977) Convex analysis and measurable multifunctions. Lecture notes in mathematics, vol 580. Springer, New York
Coddington E, Levinson N (1972) Theory of ordinary differential equations. McGraw-Hill, New York
Crandall M, Lions P (1983) Viscosity solutions of Hamilton–Jacobi equations. Trans Am Math Soc 277:1–42
Deimling K (1992) Multivalued differential equations. Walter de Gruyter, Berlin
Frankowska H, Plaskacz S, Rzezuchowski T (1995) Measurable viability theorems and the Hamilton–Jacobi–Bellman equations. J Differ Equ 116:265–305
Friedman A (1971) Differential games. Pure and applied mathematics, vol 25. Wiley, New York
Guo X, Hernández-Lerma O (2003) Zero-sum games for continuous-time Markov chains with unbounded transitions and average payoff rates. J Appl Probab 40:327–345
Guo X, Hernández-Lerma O (2005) Nonzero-sum games for continuous-time Markov chains with unbounded discounted payoffs. J Appl Probab 42:303–320
Harsanyi JC, Selten R (1988) A general theory of equilibrium selection in games. MIT Press, Cambridge
Hellwig M, Leininger W (1988) Markov-perfect equilibrium in games of perfect information. Discussion paper A-183, University of Bonn
Himmelberg CJ (1975) Measurable relations. Fundam Math 87:53–72
Hirsch M, Smale S (1974) Differential equations, dynamical systems, and linear algebra. Academic Press, San Diego
Isaacs R (1965) Differential games: a mathematical theory with applications to warfare and pursuit, control and optimization. Wiley, New York
Judd KL (1985) The law of large numbers with a continuum of IID random variables. J Econ Theory 35:19–25
Kohlberg E, Mertens JF (1986) On the strategic stability of equilibria. Econometrica 54:1003–1037
Krasovskii NN, Subbotin AI (1988) Game theoretical control problems. Springer, New York
Kuratowski K, Ryll-Nardzewski C (1965) A general theorem on selectors. Bull Pol Acad Sci, Math 13:379–403
Levy Y (2012a) A discounted stochastic game with no stationary Nash equilibrium: the case of absolutely continuous transitions. DP #612, Center for the Study of Rationality, Hebrew University, Jerusalem
Levy Y (2012b) Continuous-time stochastic games of fixed duration. DP, Center for the Study of Rationality, Hebrew University, Jerusalem, to appear
Maskin E, Tirole J (2001) Markov perfect equilibrium. I. Observable actions. J Econ Theory 100:191–219
Miller B (1967) Finite state continuous-time Markov decision processes with applications to a class of optimization problems in queueing theory. Technical report 15, Stanford University
Miller B (1968) Finite state continuous time Markov decision processes with a finite planning horizon. SIAM J Control 6:266–280
Neyman A (2012) Continuous-time stochastic games. DP #616, Center for the Study of Rationality, Hebrew University, Jerusalem
Nowak AS, Raghavan TES (1992) Existence of stationary correlated equilibria with symmetric information for discounted stochastic games. Math Oper Res 17:519–526
Rieder U (1979) Equilibrium plans for non-zero-sum Markov games. In: Moeschlin O, Pallaschke D (eds) Game theory and related topics. North Holland, Amsterdam, pp 91–102
Souquière A (2013) Nash equilibrium payoffs in mixed strategies. In: Cardaliaguet P, Cressman R (eds) Advances in dynamic games, vol 13. Springer, New York
Shapley L (1953) Stochastic games. Proc Natl Acad Sci USA 39:1095–1100
Yushkevich AA (1980) Controlled jump Markov models. Theory Probab Appl 25:244–266
Zachrisson LE (1964) Markov games. In: Dresher M, Shapley LS, Tucker AW (eds) Advances in game theory. Princeton University Press, Princeton, pp 211–253
Acknowledgements
Research supported in part by Israel Science Foundation grants 1123/06 and 1596/10. Many thanks to A. Neyman and an anonymous referee for many useful comments.
Appendix: Proofs Omitted from Sect. 5
Proof of Lemma 2
Note first that \(\Vert A \cdot B\Vert_{\infty} \leq C_{\infty}\Vert A\Vert_{\infty} \cdot \Vert B\Vert_{\infty}\) for an appropriate \(C_{\infty} > 0\) and for any two \(|Z| \times |Z|\) matrices \(A, B\) (Footnote 19). Further note that \(\Vert P_{u}\Vert_{\infty} \leq 1\) for any Markovian strategy \(u\). Using these observations and (2.1) gives:
Therefore, an application of Gronwall’s inequality (see, e.g., Hirsch and Smale [18]) gives (5.6) for an appropriate \(M > 0\). To prove (5.7), note that (3.1) shows that for each player \(p \in \mathcal {P}\) and \(z \in Z\),
and so enlarging \(M\) appropriately completes the proof of (5.7). □
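For reference, the integral form of Gronwall’s inequality used in the proof above (a standard statement; see, e.g., Hirsch and Smale [18]) is:

```latex
\text{If } \varphi \colon [0,T] \to [0,\infty) \text{ is continuous and }
\varphi(t) \le C + K \int_0^t \varphi(s)\,ds \quad \text{for all } t \in [0,T],
\text{ with constants } C, K \ge 0, \text{ then } \varphi(t) \le C e^{Kt} \text{ on } [0,T].
```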
Proof of Lemma 3
For simplicity of notation, we will prove only (a); (b) is proved by an almost identical argument, albeit with more cumbersome notation. Let \(\mathcal {T}= \{0 = t_{0} < \cdots< t_{k} < T \}\), fix \(\tau\) satisfying \(\ell(\mathcal {T}) < \tau\leq\min[T,2\ell(\mathcal {T})]\), and let \(0 = m_{0} < m_{1} < \cdots < m_{n} = k+1\) (recall that \(t_{k+1} = T\)) be such that \(\tau< t_{m_{j}} - t_{m_{j-1}} < 2\tau\) for all \(1 \leq j < n\); such a selection is possible. Denote
Clearly \(D_{0} = 0\). We wish to bound \(D_{n}\). Fix \(1 \leq j < n\) and let \(\mathcal {T}|_{j} = \{ t_{0},\ldots,t_{m_{j} - 1} \}\) and \(\mathcal {T}|_{j+} = \{t_{m_{j}},\ldots,t_{m_{j+1} - 1}\}\); these can be viewed as restrictions of the partition to \([0,t_{m_{j}}]\) and to \([t_{m_{j}},t_{m_{j+1}}]\), respectively. We will denote by \(v|_{j}\), \(v|_{j+}\) generic elements of the sets \(S(\mathcal {T}|_{j}) \cong\prod_{p \in \mathcal {P}}(I^{p})^{Z \times \mathcal {T}|_{j}}\), \(S(\mathcal {T}|_{j+}) \cong\prod_{p \in \mathcal {P}}(I^{p})^{Z \times \mathcal {T}|_{j+}}\), respectively. Note that each \(w \in \varSigma(\mathcal {T})\) induces elements of \(\varSigma(\mathcal {T}|_{j}) \cong\prod_{p \in \mathcal {P}}(\Delta(I^{p}))^{Z \times \mathcal {T}|_{j}}\), \(\varSigma(\mathcal {T}|_{j+})\cong \prod_{p \in \mathcal {P}}(\Delta(I^{p}))^{Z \times \mathcal {T}|_{j+}}\) in the natural way, and this induced distribution is a product distribution on \(\varSigma(\mathcal {T}|_{j+1}) = \varSigma(\mathcal {T}|_{j}) \otimes\varSigma(\mathcal {T}|_{j+})\). Note that for \(t \leq t_{m_{j}}\), \(P_{\hat{v}|_{j}}(t)\) is well defined (Footnote 20), and for \(t_{m_{j}} \leq t < t_{m_{j+1}}\) and any \(z \in Z\), \(\hat{v}|_{j+}(z,t)\) is well defined; see (5.1). Also observe that by the definition of \(w\), for any \(t_{m_{j}} \leq p<q < t_{m_{j+1}}\), we have, for all \(z \in Z\),
where we recall that a pure strategy \(v \in S(\mathcal {T})\) induces a pure Markovian strategy by (5.1). Finally, we note that for any Markovian strategy ϕ, by (2.1),
where \(C_{\infty}\) is as in the proof of Lemma 2. Hence, letting \(m_{j} \leq p < m_{j+1}\) and denoting \(\tau_{0} = t_{m_{j}}\), \(\tau = t_{p} - \tau_{0}\), we have
Therefore,
By (10.2), the first two terms together are bounded by \(C_{\infty}(\Vert\mu\Vert_{\infty})^{2}\tau^{2}\). As for the remaining term, note that for \(0 \leq s \leq \tau\),
Using (10.1), we have
Hence, (10.3) and (10.5), together with \(\tau< t_{m_{j+1}} - t_{m_{j}} \leq2\ell(\mathcal {T})\) for all \(0 \leq j < n\), imply
with initial condition \(D_{0} = 0\). Denoting \(A = 1 + 2 \ell(\mathcal {T}) \cdot C_{\infty}\Vert \mu\Vert _{\infty}\geq1\) and \(B = 4 C_{\infty}(\Vert \mu\Vert _{\infty})^{2} (\ell(\mathcal {T}))^{2} > 0\), we see that \(D_{j+1} \leq A \cdot D_{j} + B\) with \(D_{0} = 0\), and therefore,
We have \(n \cdot\ell(\mathcal {T}) < T\) (recall that \(t_{m_{n}} = T\)), and therefore \(n < \frac{T}{\ell(\mathcal {T})}\). Hence,
Since \(A - 1 = 2\ell(\mathcal {T}) \cdot C_{\infty}\Vert \mu\Vert _{\infty}\), we have for some \(C' > 0\) independent of \(\ell(\mathcal {T})\),
and therefore,
this gives, denoting \(C = 8C_{\infty}(\Vert\mu\Vert_{\infty})^{2} C'\),
Finally, from (10.2) we have
which completes the proof of (5.8), with \(K = C + 4C_{\infty}\Vert\mu\Vert_{\infty}\). Now we prove (5.9). Fix \(0 \leq j < n\) and denote \(\tau= t_{m_{j+1}} - t_{m_{j}}\). Using (5.8) and techniques similar to those above, we have
Summing over \(j = 0,\ldots,n-1\) and recalling that \(n < \frac{T}{\ell(\mathcal {T})}\) and \(\tau\leq2\ell(\mathcal {T})\) gives
Enlarging K accordingly gives (5.9). □
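The recursion used in the proof above, \(D_{j+1} \leq A \cdot D_{j} + B\) with \(D_{0} = 0\) and \(A > 1\), unrolls to the geometric bound \(D_{n} \leq B\,\frac{A^{n}-1}{A-1}\); a quick numeric sanity check (illustrative constants, not values from the text):

```python
# Worst case of the recursion D_{j+1} <= A*D_j + B with D_0 = 0 is the
# equality recursion, whose solution is the geometric sum
# D_n = B * (A^n - 1) / (A - 1) when A > 1.
A, B, n = 1.05, 0.3, 20

D = 0.0
for _ in range(n):
    D = A * D + B  # equality at every step (the worst case)

closed_form = B * (A**n - 1.0) / (A - 1.0)
assert abs(D - closed_form) < 1e-9
```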
Proof of Lemma 4
First, we prove
for appropriate C>0. Given \(w^{p} \in\varSigma^{p}(\mathcal {T})\), set \(u = \hat{w}\); by parts (a) and (b) of Lemma 3,
Next we prove that for all \(z \in Z\),
for appropriate \(C > 0\). We will actually prove a bit more: we show that it suffices to take only pure Markovian strategies in the supremum on the right-hand side. Fix \(w^{p} \in\varSigma^{p}(\mathcal {T})\), \(w^{-p} \in \varSigma^{-p}(\mathcal {T})\). It suffices to show that there is a pure Markovian strategy \(u^{p}\) satisfying
for some \(C > 0\) that is independent of \(w^{p}\) or of \(\mathcal {T}\). For each interval \(J = [t_{j}, t_{j+1}]\) induced by \(\mathcal {T}\), divide \(J\) into subintervals (some of which may be degenerate) \(I_{1},\ldots,I_{|I^{p}|}\), with \(\ell(I_{a}) = \ell(J) \cdot w^{p}(t_{j})[a]\), where \(\ell(\cdot)\) denotes the length of an interval and \(a\) denotes the \(a\)th element of \(I^{p}\) in some ordering. Then define \(u^{p}(t)[a] = 1\) if \(t \in I_{a}\). Note then that, by the definition of \(u^{p}\) and denoting \(w = (w^{p}, w^{-p})\), we have for any interval \(J = [\tau_{0}, \tau_{1}]\) in the partition induced by \(\mathcal {T}\),
Note also that \(w^{p}\) is a \(\mathcal {T}\)-adapted version of \(u^{p}\). Combining (5.8) and (5.10) of Lemma 3 gives
for \(K\) as in Lemma 3. As a result, for each interval \(J = [\tau_{0}, \tau_{1}]\) induced by the partition \(\mathcal {T}\), we have, using (10.2) and techniques similar to those used in the proof of Lemma 3,
where the last inequality follows from (10.9) and (10.8). Denoting \(L = |Z| \cdot (2C_{\infty}\Vert\mu\Vert_{\infty} + 1) \cdot \Vert r\Vert_{\infty}\) and summing over all \(J\) induced by \(\mathcal {T}\) gives
where we have used the following elementary claim: for any \(T, D > 0\),
where \(\lceil\cdot\rceil\) denotes the ceiling function. Indeed, in our case, \(\ell(J) \leq\ell(\mathcal {T})\) for each \(J\) induced by \(\mathcal {T}\). □
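The time-sharing construction in the proof of Lemma 4 — splitting each interval \(J\) into subintervals \(I_{a}\) whose lengths are proportional to the mixed-strategy weights and playing the \(a\)th pure action throughout \(I_{a}\) — can be sketched as follows (a minimal illustration; the function names are not from the text):

```python
# Split J = [t0, t1] into subintervals I_1, ..., I_k with lengths
# proportional to the mixed-action weights w[a] (some may be degenerate).
def split_interval(t0, t1, weights):
    assert abs(sum(weights) - 1.0) < 1e-12, "weights must be a distribution"
    cuts, left = [], t0
    for w in weights:
        cuts.append((left, left + (t1 - t0) * w))
        left += (t1 - t0) * w
    return cuts

# The induced pure strategy plays action a throughout I_a.
def pure_action_at(t, cuts):
    for a, (lo, hi) in enumerate(cuts):
        if lo <= t < hi:
            return a
    return len(cuts) - 1  # convention at the right endpoint

cuts = split_interval(0.0, 1.0, [0.25, 0.0, 0.75])
# The second subinterval is degenerate, so action 1 is never played:
assert pure_action_at(0.1, cuts) == 0
assert pure_action_at(0.5, cuts) == 2
```

The pure strategy thus spends a fraction \(w^{p}(t_{j})[a]\) of each partition interval on action \(a\), which is what drives the averaging estimate (10.8).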
Levy, Y. Continuous-Time Stochastic Games of Fixed Duration. Dyn Games Appl 3, 279–312 (2013). https://doi.org/10.1007/s13235-012-0067-2