Abstract
A general model for zero-sum stochastic games with asymmetric information is considered. In this model, each player’s information at each time can be divided into a common information part and a private information part. Under certain conditions on the evolution of the common and private information, a dynamic programming characterization of the value of the game (if it exists) is presented. If the value of the zero-sum game does not exist, then the dynamic program provides bounds on the upper and lower values of the game.
Change history
16 September 2020
A Correction to this paper has been published: https://doi.org/10.1007/s13235-020-00366-9
Notes
Note that we do not impose Assumption 2 of [26].
For example, see the delayed sharing information structure in Sect. 2.2.
Our notation is different from [23, Section IV.3].
We refer the reader to [23, Chapter III] for a detailed discussion on the universal belief space.
Note that the belief \({{\mathbb {P}}}^{({\tilde{\chi }}^1,{\tilde{\chi }}^2)}[x_t, p_t^{1:2} \mid c_t,\gamma _{1:t-1}^{1:2}] = {{\mathbb {P}}}^{({\tilde{\chi }}^1,{\tilde{\chi }}^2)}[x_t, p_t^{1:2} \mid c_t,\gamma _{1:t}^{1:2}]\) because \(\gamma _t^i = {\tilde{\chi }}_t^i( c_t,\gamma _{1:t-1}^{1:2})\), \(i=1,2\).
References
Alpcan T, Başar T (2010) Network security: a decision and game-theoretic approach. Cambridge University Press, Cambridge
Amin S, Litrico X, Sastry S, Bayen AM (2012) Cyber security of water SCADA systems, part I: analysis and experimentation of stealthy deception attacks. IEEE Trans Control Syst Technol 21(5):1963–1970
Amin S, Schwartz GA, Cardenas AA, Sastry SS (2015) Game-theoretic models of electricity theft detection in smart utility networks: providing new capabilities with advanced metering infrastructure. IEEE Control Syst Mag 35(1):66–81
Aumann RJ, Maschler M, Stearns RE (1995) Repeated games with incomplete information. MIT Press, Cambridge
Başar T (1981) On the saddle-point solution of a class of stochastic differential games. J Optim Theory Appl 33(4):539–556
Başar T, Olsder GJ (1999) Dynamic noncooperative game theory, vol 23. SIAM, Philadelphia
Bondi E, Oh H, Xu H, Fang F, Dilkina B, Tambe M (2019) Using game theory in real time in the real world: A conservation case study. In: Proceedings of the 18th international conference on autonomous agents and multiagent systems, pp. 2336–2338. International Foundation for Autonomous Agents and Multiagent Systems
Fang F, Nguyen TH, Pickles R, Lam WY, Clements GR, An B, Singh A, Schwedock BC, Tambe M, Lemieux A (2017) PAWS: a deployed game-theoretic application to combat poaching. AI Mag 38(1):23–36
Fang F, Stone P, Tambe M (2015) When security games go green: Designing defender strategies to prevent poaching and illegal fishing. In: Twenty-fourth international joint conference on artificial intelligence
Filar J, Vrieze K (2012) Competitive Markov decision processes. Springer, Berlin
Fudenberg D, Tirole J (1991) Game theory. MIT Press, Cambridge
Gensbittel F, Oliu-Barton M, Venel X (2014) Existence of the uniform value in zero-sum repeated games with a more informed controller. J Dyn Games 1(3):411–445
Gensbittel F, Renault J (2015) The value of Markov chain games with incomplete information on both sides. Math Oper Res 40(4):820–841
Hansen EA (1998) Solving POMDPs by searching in policy space. In: Proceedings of the fourteenth conference on uncertainty in artificial intelligence, pp 211–219
Haywood O Jr (1954) Military decision and game theory. J Oper Res Soc Am 2(4):365–385
Hernández-Lerma O, Lasserre JB (2012) Discrete-time Markov control processes: basic optimality criteria, vol 30. Springer, Berlin
Kartik D, Nayyar A (2019) Stochastic zero-sum games with asymmetric information. In: 58th IEEE conference on decision and control. IEEE
Kartik D, Nayyar A (2019) Zero-sum stochastic games with asymmetric information. arXiv preprint arXiv:1909.01445
Li L, Shamma J (2014) LP formulation of asymmetric zero-sum stochastic games. In: 53rd IEEE conference on decision and control, pp. 1930–1935. IEEE
Li X, Venel X (2016) Recursive games: uniform value, Tauberian theorem and the Mertens conjecture. Int J Game Theory 45(1–2):155–189
Loch J, Singh SP (1998) Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes. In: Proceedings of the fifteenth international conference on machine learning, pp 323–331
Maschler M, Solan E, Zamir S (2013) Game theory. Cambridge University Press. https://doi.org/10.1017/CBO9780511794216
Mertens JF, Sorin S, Zamir S (2015) Repeated games, vol 55. Cambridge University Press, Cambridge
Morrow JD (1994) Game theory for political scientists. Princeton University Press, Princeton
Nayyar A, Gupta A (2017) Information structures and values in zero-sum stochastic games. In: American control conference (ACC), 2017, pp. 3658–3663. IEEE
Nayyar A, Gupta A, Langbort C, Başar T (2014) Common information based Markov perfect equilibria for stochastic games with asymmetric information: finite games. IEEE Trans Autom Control 59(3):555–570
Nayyar A, Mahajan A, Teneketzis D (2010) Optimal control strategies in delayed sharing information structures. IEEE Trans Autom Control 56(7):1606–1620
Nayyar A, Mahajan A, Teneketzis D (2013) Decentralized stochastic control with partial history sharing: a common information approach. IEEE Trans Autom Control 58(7):1644–1658
Osborne MJ, Rubinstein A (1994) A course in game theory. MIT Press, Cambridge
Ouyang Y, Tavafoghi H, Teneketzis D (2017) Dynamic games with asymmetric information: common information based perfect Bayesian equilibria and sequential decomposition. IEEE Trans Autom Control 62(1):222–237
Ponssard JP, Sorin S (1980) The LP formulation of finite zero-sum games with incomplete information. Int J Game Theory 9(2):99–105
Renault J (2006) The value of Markov chain games with lack of information on one side. Math Oper Res 31(3):490–512
Renault J (2012) The value of repeated games with an informed controller. Math Oper Res 37(1):154–179
Rosenberg D (1998) Duality and Markovian strategies. Int J Game Theory 27(4):577–597
Rosenberg D, Solan E, Vieille N (2004) Stochastic games with a single controller and incomplete information. SIAM J Control Optim 43(1):86–110
Rudin W (1964) Principles of mathematical analysis, vol 3. McGraw-Hill, New York
Sandell NR Jr (1974) Control of Finite-State, Finite-Memory Stochastic Systems, Massachusetts Institute of Technology, PhD Thesis. https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19740018646.pdf
Shelar D, Amin S (2017) Security assessment of electricity distribution networks under DER node compromises. IEEE Trans Control Netw Syst 4(1):23–36. https://doi.org/10.1109/TCNS.2016.2598427
Teneketzis D (2006) On the structure of optimal real-time encoders and decoders in noisy communication. IEEE Trans Inf Theory 52(9):4017–4035
Vasal D, Sinha A, Anastasopoulos A (2019) A systematic process for evaluating structured perfect Bayesian equilibria in dynamic games with asymmetric information. IEEE Trans Autom Control 64(1):78–93
Washburn A, Wood K (1995) Two-person zero-sum games for network interdiction. Oper Res 43(2):243–251
Wu M, Amin S (2018) Securing infrastructure facilities: When does proactive defense help? Dyn Games Appl 9:1–42
Zheng J, Castañón DA (2013) Decomposition techniques for Markov zero-sum games with nested information. In: 52nd IEEE conference on decision and control, pp. 574–581. IEEE
Zhu Q, Başar T (2015) Game-theoretic methods for robustness, security, and resilience of cyber-physical control systems: games-in-games principle for optimal cross-layer resilient control systems. IEEE Control Syst Mag 35(1):46–65
A preliminary version of this paper appeared in the proceedings of the 58th Conference on Decision and Control (CDC), 2019 [17].
Appendices
Proof of Lemma 1
It was shown in [26] that there exist bijective mappings \({\mathcal {M}}^i: {\mathcal {G}}^i \rightarrow {\mathcal {H}}^i\), \(i =1,2,\) such that for every \(g^1 \in {\mathcal {G}}^1\) and \(g^2 \in {\mathcal {G}}^2\), we have
Therefore, for any strategy \(g^1 \in {\mathcal {G}}^1\), we have
Consequently,
This implies that \(S^u({\mathscr {G}}) = S^u({\mathscr {G}}_v)\). We can similarly prove that \(S^l({\mathscr {G}}) = S^l({\mathscr {G}}_v)\).
Remark 6
We can also show that a strategy profile \((g^1,g^2)\) is a Nash equilibrium in game \({\mathscr {G}}\) if and only if \(({\mathcal {M}}^1(g^1),{\mathcal {M}}^2(g^2))\) is a Nash equilibrium in game \({\mathscr {G}}_v\).
Proof of Lemma 2
Let us consider the evolution of the virtual game \({\mathscr {G}}_v\) under the strategy profile \((\chi ^1,\chi ^2)\) and the expanded virtual game \({\mathscr {G}}_e\) under the strategy profile \(({\tilde{\chi }}^1,{\tilde{\chi }}^2)\). Let the primitive variables and the randomization variables \(K_t^i\) in both games be identical. The variables such as the state, action and information variables in the expanded game \({\mathscr {G}}_e\) are distinguished from those in the virtual game \({\mathscr {G}}_v\) by means of a tilde. For instance, \(X_t\) is the state in game \({\mathscr {G}}_v\) and \({\tilde{X}}_t\) is the state in game \({\mathscr {G}}_e\).
We will prove by induction that the system evolution in both these games is identical over the entire horizon. This is clearly true at the end of time \(t=1\) because the state, observations and the common and private information variables are identical in both games. Moreover, since \({\chi }^i = \varrho ^i({{\tilde{\chi }}}^1,{{\tilde{\chi }}}^2)\), \(i=1,2\), the strategies \(\chi ^i_1\) and \({\tilde{\chi }}^i_1\) are identical by definition (see Definition 2). Thus, the prescriptions and actions at \(t=1\) are also identical.
For induction, assume that the system evolution in both games is identical until the end of time t. Then,
Using Eqs. (2), (4) and (3), we can similarly argue that \({Y}_{t+1}^i = \tilde{{Y}}_{t+1}^i\), \({P}_{t+1}^i = \tilde{{P}}_{t+1}^i\) and \({C}_{t+1} = \tilde{{C}}_{t+1}\). Since \({\chi }^i = \varrho ^i({{\tilde{\chi }}}^1,{{\tilde{\chi }}}^2)\), we also have
Here, equality (a) follows from the construction of the mapping \(\varrho ^i\) (see Definition 2) and equality (b) follows from the fact that \(C_{t+1} = {\tilde{C}}_{t+1}\). Further,
Thus, by induction, the hypothesis is true for every \(1\le t \le T\). This proves that the virtual and expanded games have identical dynamics under strategy profiles \(({\chi }^1,\chi ^2)\) and \((\tilde{{\chi }}^1,{\tilde{\chi }}^2)\).
Since the virtual and expanded games have the same cost structure, having identical dynamics ensures that strategy profiles \(({\chi }^1,\chi ^2)\) and \((\tilde{{\chi }}^1,{\tilde{\chi }}^2)\) have the same expected cost in games \({\mathscr {G}}_v\) and \({\mathscr {G}}_e\), respectively. Therefore, \({\mathcal {J}}({\chi }^1,\chi ^2) = {{\mathcal {J}}}(\tilde{{\chi }}^1,{\tilde{\chi }}^2)\).
Proof of Theorem 1
For any strategy \(\chi ^1 \in {\mathcal {H}}^1\), we have
because \({\mathcal {H}}^2 \subseteq {\tilde{{\mathcal {H}}}}^2\). Further,
where the first equality is due to Lemma 2, the second equality is because \(\varrho ^1(\chi ^1,{\tilde{\chi }}^2) = \chi ^1\) and the last inequality is due to the fact that \(\varrho ^2(\chi ^1,{\tilde{\chi }}^2) \in {{\mathcal {H}}}^2\) for any \({\tilde{\chi }}^2 \in {\tilde{{\mathcal {H}}}}^2\).
Combining (47) and (50), we obtain that
Now,
where inequality (53) is true since \({\mathcal {H}}^1 \subseteq {\tilde{{\mathcal {H}}}}^1\) and the equality in (54) follows from (51). Therefore, \(S^u({\mathscr {G}}_e) \le S^u({\mathscr {G}}_v)\). We can use similar arguments to show that \(S^l({\mathscr {G}}_v) \le S^l({\mathscr {G}}_e).\)
Proof of Lemma 3
We begin by defining the following transformations for each time t. Recall that \({{\mathcal {S}}}_t\) is the set of all possible common information beliefs at time t and \({{\mathcal {B}}}_t^i\) is the prescription space for virtual player i at time t.
Definition 5
(i) Let \(P_t^j: {{\mathcal {S}}}_t \times {{\mathcal {B}}}_t^1 \times {{\mathcal {B}}}_t^2 \rightarrow \varDelta ({{\mathcal {Z}}}_{t+1} \times {{\mathcal {X}}}_{t+1} \times {\mathcal {P}}_{t+1}^1 \times {\mathcal {P}}_{t+1}^2)\) be defined as
$$\begin{aligned}&P_t^j(\pi _t,\gamma _t^{1:2}; z_{t+1}, x_{t+1},p_{t+1}^{1:2}) \end{aligned}$$
(56)
$$\begin{aligned}&\quad {:=} \sum _{{x}_t,{p}^{1:2}_t,{u}_t^{1:2}}\pi _t({x}_t,{p}_t^{1:2})\gamma _t^1({p}_t^1;u_t^1)\gamma _t^2( {p}_t^2; u_t^2){{\mathbb {P}}}[{x}_{t+1}, {p}_{t+1}^{1:2},{z}_{t+1} \mid {x}_t,{p}_t^{1:2},{u}_t^{1:2}]. \end{aligned}$$
(57)

We will use \(P_t^j(\pi _t,\gamma _t^{1:2})\) as a shorthand for the probability distribution \(P_t^j(\pi _t,\gamma _t^{1:2}; \cdot , \cdot , \cdot )\). The distribution \(P_t^j(\pi _t,\gamma _t^{1:2})\) can be viewed as a joint distribution over the variables \(Z_{t+1},X_{t+1}, P_{t+1}^{1:2}\) if the distribution on \(X_t, P^{1:2}_t\) is \(\pi _t\) and prescriptions \(\gamma ^{1:2}_t\) are chosen by the virtual players at time t.
(ii) Let \(P_t^m: {{\mathcal {S}}}_t \times {{\mathcal {B}}}_t^1 \times {{\mathcal {B}}}_t^2 \rightarrow \varDelta ({{\mathcal {Z}}}_{t+1})\) be defined as
$$\begin{aligned} P_t^m(\pi _t,\gamma _t^{1:2}; z_{t+1}) = \sum _{x_{t+1},p_{t+1}^{1:2}} P_t^j(\pi _t,\gamma _t^{1:2}; z_{t+1}, x_{t+1},p_{t+1}^{1:2}). \end{aligned}$$
(58)

The distribution \(P_t^m(\pi _t,\gamma _t^{1:2})\) is the marginal distribution of the variable \(Z_{t+1}\) obtained from the joint distribution \(P_t^j(\pi _t,\gamma _t^{1:2})\) defined above.
(iii) Let \(F_t: {{\mathcal {S}}}_t \times {{\mathcal {B}}}_t^1 \times {{\mathcal {B}}}_t^2 \times {{\mathcal {Z}}}_{t+1} \rightarrow {{\mathcal {S}}}_{t+1}\) be defined as
$$\begin{aligned} F_t(\pi _t,\gamma _t^{1:2},z_{t+1})= {\left\{ \begin{array}{ll} \frac{P_t^j(\pi _t,\gamma _t^{1:2};z_{t+1},\cdot ,\cdot )}{P_t^m(\pi _t,\gamma _t^{1:2};z_{t+1})} &{}\text {if } P_t^m(\pi _t,\gamma _t^{1:2};z_{t+1}) > 0\\ G_t(\pi _t,\gamma _t^{1:2},z_{t+1}) &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(59)

where \(G_t: {{\mathcal {S}}}_t \times {{\mathcal {B}}}_t^1 \times {{\mathcal {B}}}_t^2 \times {{\mathcal {Z}}}_{t+1} \rightarrow {{\mathcal {S}}}_{t+1}\) can be an arbitrary measurable mapping. One such mapping maps every element \((\pi _t,\gamma _t^{1:2},z_{t+1})\) to the uniform distribution over the finite space \({{\mathcal {X}}}_{t+1} \times {\mathcal {P}}_{t+1}^1 \times {\mathcal {P}}_{t+1}^2\).
Let the collection of transformations \(F_t\) that can be constructed using the method described in Definition 5 be denoted by \({{\mathscr {B}}}\). Note that the transformations \(P_t^j, P_t^m\) and \(F_t\) do not depend on the strategy profile \(({\tilde{\chi }}^1, {\tilde{\chi }}^2)\) because the term \({{\mathbb {P}}}[{x}_{t+1}, {p}_{t+1}^{1:2},{z}_{t+1} \mid {x}_t,{p}_t^{1:2},{u}_t^{1:2}]\) in (57) depends only on the system dynamics (see Eqs. (12–16)) and not on the strategy profile \(({\tilde{\chi }}^1,{\tilde{\chi }}^2)\).
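When all the spaces involved are finite, the transformations in Definition 5 are direct to compute. The following minimal sketch is our own illustration, not notation from the paper: the function name `cib_update`, the array shapes and the kernel layout are assumptions. It implements \(P_t^j\), \(P_t^m\) and \(F_t\), using the uniform distribution as the fallback mapping \(G_t\).

```python
import numpy as np

def cib_update(pi_t, gamma1, gamma2, kernel):
    """One step of the common information belief update of Definition 5.

    Hypothetical array layout (all spaces finite):
      pi_t   : belief on (x, p1, p2),         shape (X, P1, P2)
      gamma1 : prescription gamma^1(p1; u1),  shape (P1, U1), rows sum to 1
      gamma2 : prescription gamma^2(p2; u2),  shape (P2, U2), rows sum to 1
      kernel : P[x', p1', p2', z | x, p1, p2, u1, u2],
               shape (X, P1, P2, U1, U2, X, P1, P2, Z)

    Returns (F, marg): F[z] = F_t(pi_t, gamma, z) with shape (Z, X, P1, P2),
    and marg[z] = P_t^m(pi_t, gamma; z).
    """
    # Joint distribution P_t^j over (z, x', p1', p2') -- Eq. (57):
    joint = np.einsum('ijk,jl,km,ijklmabcz->zabc', pi_t, gamma1, gamma2, kernel)
    # Marginal P_t^m of Z_{t+1} -- Eq. (58):
    marg = joint.sum(axis=(1, 2, 3))
    # Conditional F_t -- Eq. (59), with the uniform distribution as G_t:
    F = np.empty_like(joint)
    uniform = np.full(joint.shape[1:], 1.0 / joint[0].size)
    for z in range(joint.shape[0]):
        F[z] = joint[z] / marg[z] if marg[z] > 0 else uniform
    return F, marg
```

Because `joint` is computed by a multilinear contraction, \(P_t^j\) (and hence \(P_t^m\)) is automatically linear in `pi_t`; this is exactly the property used to extend these maps to \(\bar{{{\mathcal {S}}}}_t\) in “Appendix 5.”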
Consider a strategy profile \(({\tilde{\chi }}^1, {\tilde{\chi }}^2)\). Note that the number of possible realizations of common information and prescription history under \(({\tilde{\chi }}^1, {\tilde{\chi }}^2)\) is finite. Let \(c_{t+1},\gamma _{1:t}^{1:2}\) be a realization of the common information and prescription history at time \(t+1\) with nonzero probability of occurrence under \(({\tilde{\chi }}^1, {\tilde{\chi }}^2)\). For this realization of virtual players’ information, the common information-based belief on the state and private information at time \(t+1\) is given by
Notice that expression (60) is well defined; that is, the denominator is nonzero because of our assumption that the realization \(c_{t+1},\gamma _{1:t}^{1:2}\) has nonzero probability of occurrence. Let us consider the numerator in expression (60). For convenience, we will denote it by \({{\mathbb {P}}}^{({\tilde{\chi }}^1,{\tilde{\chi }}^2)}[{x}_{t+1}, {p}_{t+1}^{1:2},{z}_{t+1} \mid {c}_{t},\gamma _{1:t}^{1:2}]\). We have
where \(\pi _t\) is the common information belief on \(X_t, P_t^1,P_t^2\) at time t given the realization \(c_t,\gamma _{1:t-1}^{1:2}\) (see Footnote 5) and \(P_t^j\) is as defined in Definition 5. The equality in (62) is due to the structure of the system dynamics in game \({\mathscr {G}}_e\) described by Eqs. (12–16). Similarly, the denominator in (60) satisfies
where \(P_t^m\) is as defined in Definition 5. Thus, from Eq. (60), we have
where \(F_t\) is as defined in Definition 5. Since relation (65) holds for every realization \(c_{t+1}, \gamma _{1:t}^{1:2}\) that has nonzero probability of occurrence under \(({\tilde{\chi }}^1,{\tilde{\chi }}^2)\), we can conclude that the common information belief \(\Pi _t\) evolves almost surely as
under the strategy profile \(({\tilde{\chi }}^1,{\tilde{\chi }}^2)\).
The expected cost at time t can be expressed as follows
where the function \({\tilde{c}}_t\) is as defined in Eq. (22). Therefore, the total cost can be expressed as
Some Continuity Results
In this section, we will state and prove some technical results that will be useful for proving Lemma 4.
Let \({{\mathcal {S}}}_t\) denote the set of all probability distributions over the finite set \({{\mathcal {X}}}_{t} \times {{\mathcal {P}}}_{t}^1 \times {{\mathcal {P}}}_{t}^2\). Thus, \({{\mathcal {S}}}_t\) is the set of all possible common information-based beliefs at time t. Define
The functions \({\tilde{c}}_t\) in (22), \(P_t^j\) in (56), \(P_t^m\) in (58) and \(F_t\) in (65) were defined for any \(\pi _t \in {{\mathcal {S}}}_t\). We will extend the domain of the argument \(\pi _t\) in these functions to \(\bar{{{\mathcal {S}}}}_t\) as follows. For any \(\gamma _t^i \in {{\mathcal {B}}}_t^i, i = 1,2\), \(z_{t+1} \in {{\mathcal {Z}}}_{t+1}\), \(0 \le \alpha \le 1\) and \(\pi _t \in {{{\mathcal {S}}}}_t\), let
(i) \({\tilde{c}}_t(\alpha \pi _t,\gamma _t^1,\gamma _t^2) {:=} \alpha {\tilde{c}}_t(\pi _t,\gamma _t^1,\gamma _t^2)\)
(ii) \(P^j_t(\alpha \pi _t,\gamma _t^{1:2}) {:=} \alpha P^j_t(\pi _t,\gamma _t^{1:2})\)
(iii) \(P^m_t(\alpha \pi _t,\gamma _t^{1:2}) {:=} \alpha P^m_t(\pi _t,\gamma _t^{1:2})\)
(iv) \( F_t(\alpha \pi _t,\gamma _t^{1:2},z_{t+1}) {:=} {\left\{ \begin{array}{ll} F_t(\pi _t,\gamma _t^{1:2},z_{t+1}) &{}\text {if } \alpha > 0\\ \textit{\textbf{0}} &{} \text {if } \alpha = 0, \end{array}\right. }\)
where \(\textit{\textbf{0}}\) is a zero vector of size \(|{{\mathcal {X}}}_{t} \times {{\mathcal {P}}}_{t}^1 \times {{\mathcal {P}}}_{t}^2|\).
Having extended the domain of the above functions, we can also extend the domain of the argument \(\pi _t\) in the functions \(w_t^u(\cdot ), w_t^l(\cdot ), V_t^u(\cdot ), V_t^l(\cdot )\) defined in the dynamic programs of Section 4.2. First, for any \(0 \le \alpha \le 1\) and \(\pi _{T+1} \in {{{\mathcal {S}}}}_{T+1}\), define \(V^u_{T+1}(\alpha \pi _{T+1}) {:=}0\). We can then define the following functions for every \(t \le T\) in a backward-inductive manner: For any \(\gamma _t^i \in {{\mathcal {B}}}_t^i, i = 1,2\), \(0 \le \alpha \le 1\) and \(\pi _t \in {{{\mathcal {S}}}}_t\), let
Note that when \(\alpha =1\), the above definition of \(w^u_t\) is equal to the definition of \(w^u_t\) in Eq. (26) of the dynamic program. We can similarly extend \(w^l_t\) and \(V^l_t\). These extended value functions satisfy the following homogeneity property. A similar result was shown in [19, Lemma III.1] for a special case of our model.
Lemma 5
For any constant \(0 \le \alpha \le 1\) and any \(\pi _t \in \bar{{{\mathcal {S}}}}_t\), we have \(\alpha V_t^u(\pi _t) = V_t^u(\alpha \pi _t)\) and \(\alpha V_t^l(\pi _t) = V_t^l(\alpha \pi _t)\).
Proof
The proof can be easily obtained from the above definitions of the extended functions.
The following lemmas will be used in “Appendix 6” to establish some useful properties of the extended functions.
Lemma 6
Let \(V: \bar{{{\mathcal {S}}}}_{t+1} \rightarrow {{\mathbb {R}}}\) be a continuous function satisfying \(V(\alpha \pi ) = \alpha V(\pi )\) for every \(0 \le \alpha \le 1\) and \(\pi \in \bar{{{\mathcal {S}}}}_{t+1}\). Define
For a fixed \(\gamma _t^1,\gamma _t^2\), \(V'(\cdot ,\gamma _t^1,\gamma _t^2)\) is a function from \(\bar{{{\mathcal {S}}}}_{t}\) to \({{\mathbb {R}}}\). Then, the family of functions
is equicontinuous. Similarly, the following families of functions
are equicontinuous in their respective arguments.
Proof
A continuous function is bounded and uniformly continuous over a compact domain (see Theorem 4.19 in [36]). Therefore, V is bounded and uniformly continuous over \(\bar{{{\mathcal {S}}}}_{t+1}\).
Using the fact that \(V(\alpha \pi ) = \alpha V(\pi )\) and the definition of \(F_t\) in Definition 5, the function \(V'\) can be written as
Recall that \(P_t^j\) is trilinear in \(\pi _t,\gamma _t^1\) and \(\gamma _t^2\) with bounded coefficients for a fixed value of \(z_{t+1}\) (see (56)). Therefore, for each \({z}_{t+1}\), \(\{P^j_t(\cdot , \gamma ^1_t,\gamma ^2_t,{z}_{t+1})\}\) is an equicontinuous family of functions in the argument \(\pi _t\), where \(P^j_t(\pi _t, \gamma ^1_t,\gamma ^2_t,{z}_{t+1})\) is a shorthand notation for the measure \(P^j_t(\pi _t, \gamma ^1_t,\gamma ^2_t,{z}_{t+1},\cdot ,\cdot )\) over the space \({{\mathcal {X}}}_{t+1} \times {\mathcal {P}}_{t+1}^1 \times {\mathcal {P}}_{t+1}^2\).
Also, since V is uniformly continuous, the family \(\left\{ V\left( {{P}_t^j(\cdot ,\gamma _t^{1:2},{z}_{t+1})}\right) \right\} \) is equicontinuous in \(\pi _t\) for each \(z_{t+1}\). This is because composition with a uniformly continuous function preserves equicontinuity. Therefore, the family of functions \({{\mathscr {F}}}_1\) is equicontinuous in \(\pi _t\). We can use similar arguments to prove equicontinuity of the other two families. \(\square \)
Lemma 7
Let \(w: {{\mathcal {B}}}^1_t \times {{\mathcal {B}}}^2_t \rightarrow {{\mathbb {R}}}\) be a function such that (i) the family of functions \(\{w(\cdot ,\gamma ^2): \gamma ^2 \in {{\mathcal {B}}}^2_t\}\) is equicontinuous in the first argument and (ii) the family of functions \(\{w(\gamma ^1, \cdot ): \gamma ^1 \in {{\mathcal {B}}}^1_t\}\) is equicontinuous in the second argument. Then, \(\sup _{\gamma ^2}w(\gamma ^1,\gamma ^2)\) is a continuous function of \(\gamma ^1\) and, similarly, \(\inf _{\gamma ^1}w(\gamma ^1,\gamma ^2)\) is a continuous function of \(\gamma ^2\).
Proof
Let \(\epsilon > 0\). For a given \(\gamma ^1\), there exists a \(\delta > 0\) such that
Let \({\bar{\gamma }}^2\) be a prescription such that
Note that the existence of \({\bar{\gamma }}^2\) is guaranteed because of continuity of \(w(\gamma ^1, \cdot )\) in the second argument and compactness of \({{\mathcal {B}}}^2_t\). Pick any \(\gamma '^1\) satisfying \(||\gamma ^1 - \gamma '^1|| \le \delta \). Let \({\bar{\gamma }}'^2\) be a prescription such that
Using (77), we have
Equations (78)–(81) imply that \(\sup _{\gamma ^2}w(\gamma ^1,\gamma ^2)\) is a continuous function of \(\gamma ^1\). We can use a similar argument for showing continuity of \(\inf _{\gamma ^1}w(\gamma ^1,\gamma ^2)\) in \(\gamma ^2\). \(\square \)
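The envelope argument in Lemma 7 can be illustrated numerically with a toy example of our own (not from the paper): for \(w(\gamma ^1,\gamma ^2) = \gamma ^1\gamma ^2 - (\gamma ^2)^2\) on \([0,1]^2\), the family \(\{w(\cdot ,\gamma ^2)\}\) is equicontinuous, and the upper envelope \(\sup _{\gamma ^2} w(\gamma ^1,\gamma ^2) = (\gamma ^1)^2/4\) is indeed continuous in \(\gamma ^1\).

```python
import numpy as np

def sup_over_gamma2(gamma1, grid):
    """sup over gamma^2 of the toy kernel w(g1, g2) = g1*g2 - g2**2,
    computed on a dense grid standing in for the compact set B^2_t."""
    return float(np.max(gamma1 * grid - grid**2))

grid = np.linspace(0.0, 1.0, 1001)  # discretized compact prescription set
```

The supremum is attained at \(\gamma ^2 = \gamma ^1/2\), consistent with the attainment of the supremum used in the proof; small perturbations of \(\gamma ^1\) perturb the envelope \((\gamma ^1)^2/4\) continuously.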
Proof of Lemma 4
We first use the definitions of extensions of \(w^u_t,w^l_t,V^u_t,V^l_t\) in “Appendix 5” and Lemmas 5 and 6 to establish the following equicontinuity result.
Lemma 8
The families of functions
are all equicontinuous in their arguments for every \(t \le T\). A similar statement holds for \(w_t^l\).
Proof
We use a backward-induction argument for the proof. For induction, assume that \(V_{t+1}^u\) is a continuous function for some \(t \le T\). This is clearly true for \(t=T\). Using the continuity of \(V^u_{t+1}\), we will establish the statement of the lemma for time t and also prove the continuity of \(V^u_t\). This establishes the lemma for all \(t \le T\).
Equicontinuity of \(w^u_t\): Since \({\tilde{c}}_t(\pi _t,\gamma _t^1,\gamma _t^2)\) is linear in \(\pi _t\) with uniformly bounded coefficients for any given \(\gamma _t^{1:2}\) (see (22)), it is equicontinuous in the argument \(\pi _t\). In Lemma 5, we showed that the value functions \(V_t^u\) satisfy the condition \(V_t^u(\alpha \pi ) = \alpha V_t^u(\pi )\) for every \(0 \le \alpha \le 1\), \(\pi \in {{\mathcal {S}}}_{t}\). Further, due to our induction hypothesis, \(V_{t+1}^u\) is continuous. Thus, using Lemma 6, the second term of \(w_t^u\),
is also equicontinuous in \(\pi _t\). Hence, the family \({{\mathscr {F}}}_t^a\) is equicontinuous in \(\pi _t\).
Continuity of \(V^u_t\): Due to the equicontinuity of the family \({{\mathscr {F}}}_t^a\), we have the following. For any given \(\epsilon > 0\) and \(\pi _t \in \bar{{{\mathcal {S}}}}_t\), there exists a \(\delta > 0\) such that
for every \(\gamma _t^1, \gamma _t^2\) and \(\pi _t'\) satisfying \(||\pi _t - \pi '_t|| < \delta \). Therefore,
for every \(\pi _t'\) that satisfies \(||\pi _t - \pi '_t|| < \delta \). Similarly, \(V^u_t(\pi _t) \ge V^u_t(\pi '_t) -\epsilon \) for every \(\pi _t'\) that satisfies \(||\pi _t - \pi '_t|| < \delta \). Therefore, \(V^u_t(\pi _t)\) is continuous at \(\pi _t\).
Hence, by induction, we can say that the family \({{\mathscr {F}}}_t^a\) is equicontinuous in \(\pi _t\) for every \(t \le T\). We can use similar arguments to prove the equicontinuity of the other families. \(\square \)
The continuity of \(w^u_t\) established above implies that \(\sup _{\gamma _t^2}w_t^u(\pi _t,\gamma _t^1,\gamma _t^2)\) is achieved for every \(\pi _t, \gamma ^1_t\). Further, Lemma 8 implies that \(w_t^u\) and \(w_t^l\) satisfy the equicontinuity conditions of Lemma 7 for any given realization of the belief \(\pi _t\). Therefore, we can use Lemma 7 to argue that \(\sup _{\gamma _t^2}w_t^u(\pi _t,\gamma _t^1,\gamma _t^2)\) is continuous in \(\gamma _t^1\). Since \(\gamma _t^1\) lies in the compact space \({{\mathcal {B}}}_t^1\), a minmaximizer exists for the function \(w_t^u\). Further, we can use the measurable selection condition (see Condition 3.3.2 in [16]) to prove the existence of a measurable mapping \(\varXi _t^1(\pi _t)\) as defined in Lemma 4. A similar argument establishes the existence of a maxminimizer and a measurable mapping \(\varXi _t^2(\pi _t)\) as defined in Lemma 4. This concludes the proof of Lemma 4.
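When the compact prescription spaces are replaced by finite grids (a discretization we introduce purely for illustration; the paper works with the continuous spaces directly), the minmaximizer and maxminimizer of Lemma 4 can be found by enumeration:

```python
import numpy as np

def minmax(w):
    """Given w[i, j] = w_t^u(pi_t, gamma1_i, gamma2_j) on finite prescription
    grids, return the min-max value and a min-maximizing row index."""
    upper = w.max(axis=1)          # maximizer's best response to each gamma1_i
    i_star = int(upper.argmin())   # grid analogue of Xi_t^1(pi_t)
    return float(upper[i_star]), i_star

def maxmin(w):
    """Max-min value and a max-minimizing column index."""
    lower = w.min(axis=0)
    j_star = int(lower.argmax())
    return float(lower[j_star]), j_star
```

For any payoff array `w`, `maxmin(w)[0] <= minmax(w)[0]`, mirroring the relation between the lower and upper values throughout the paper.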
Proof of Theorem 2
Let us first define a distribution \({\tilde{\Pi }}_t\) over the space \({{\mathcal {X}}}_t \times {\mathcal {P}}_t^1 \times {\mathcal {P}}_t^2\) in the following manner. The distribution \({\tilde{\Pi }}_t\), given \(C_t,\varGamma _{1:t-1}^{1:2}\), is recursively obtained using the following relation
where \(F_\tau \) is as defined in Definition 5 in “Appendix 4.” We refer to this distribution as the strategy-independent common information belief (SI-CIB).
Let \({\tilde{\chi }}^1 \in \tilde{{\mathcal {H}}}^1\) be any strategy for virtual player 1 in game \({\mathscr {G}}_e\). Consider the problem of obtaining virtual player 2’s best response to the strategy \({\tilde{\chi }}^1\) with respect to the cost \({\mathcal {J}}({\tilde{\chi }}^1 ,{\tilde{\chi }}^2)\) defined in (18). This problem can be formulated as a Markov decision process (MDP) with common information and prescription history \(C_t,\varGamma _{1:t-1}^{1:2}\) as the state. The control action at time t in this MDP is \(\varGamma _t^2\), which is selected based on the information \(C_t,\varGamma _{1:t-1}^{1:2}\) using strategy \({\tilde{\chi }}^2 \in {\mathcal {H}}^2\). The evolution of the state \(C_t,\varGamma _{1:t-1}^{1:2}\) of this MDP is as follows
where
almost surely. Here, \(\varGamma _t^1 = {\tilde{\chi }}^1_t(C_t,\varGamma _{1:t-1}^{1:2})\) and the transformation \(P_t^m\) is as defined in Definition 5 in “Appendix 4.” Notice that due to Lemma 3, the common information belief \(\Pi _t\) associated with any strategy profile \({(\tilde{{\chi }}^1,\tilde{{\chi }}^2)}\) is equal to \({\tilde{\Pi }}_t\) almost surely. This results in the state evolution equation in (92). The objective of this MDP is to maximize, for a given \({\tilde{\chi }}^1\), the following cost
where \({\tilde{c}}_t\) is as defined in Eq. (22). Due to Lemma 3, the total expected cost defined above is equal to the cost \({{\mathcal {J}}}(\tilde{{\chi }}^1,\tilde{{\chi }}^2)\) defined in (18).
The MDP described above can be solved using the following dynamic program. For every realization of virtual players’ information \(c_{T+1},\gamma _{1:T}^{1:2}\), define
In a backward-inductive manner, for each time \(t \le T\) and each realization \(c_{t},\gamma _{1:t-1}^{1:2}\), define
where \(\gamma _t^1 = {\tilde{\chi }}^1_t(c_{t},\gamma _{1:t-1}^{1:2})\) and \({\tilde{\pi }}_t\) is the SI-CIB associated with the information \(c_{t},\gamma _{1:t-1}^{1:2}\). Note that the measurable selection condition (see Condition 3.3.2 in [16]) holds for the dynamic program described above. Thus, the value functions \(V^{{\tilde{\chi }}^1}_{t}(\cdot )\) are measurable and there exists a measurable best-response strategy for virtual player 2 that solves the dynamic program described above. Therefore, we have
Claim 1
For any strategy \({\tilde{\chi }}^1 \in \tilde{{\mathcal {H}}}^1\) and for any realization of virtual players’ information \(c_{t},\gamma _{1:t-1}^{1:2}\), we have
where \(V_t^u\) is as defined in (26) and \({\tilde{\pi }}_t\) is the SI-CIB belief associated with the instance \(c_{t},\gamma _{1:t-1}^{1:2}\). As a consequence, we have
Proof
The proof is by backward induction. Clearly, the claim is true at time \(t = T+1\). Assume that the claim is true for all times greater than t. Then, we have
The first equality follows from the definition in (94), and the inequality after that follows from the induction hypothesis. The last inequality is a consequence of the definition of the value function \(V_t^u\). This completes the induction argument. Further, using Claim 1 and the result in (95), we have
\(\square \)
We can therefore say that
Further, for the strategy \({\tilde{\chi }}^{1*}\) defined in Definition 4, inequalities (96) and (97) hold with equality. We can prove this using an inductive argument similar to the one used to prove Claim 1. Therefore, we have
Combining (98) and (99), we have
Thus, the inequality in (99) holds with equality, which leads to the result that the strategy \({\tilde{\chi }}^{1*}\) is a min–max strategy in game \({\mathscr {G}}_e\). A similar argument can be used to show that
and that the strategy \({\tilde{\chi }}^{2*}\) defined in Definition 4 is a max–min strategy in game \({\mathscr {G}}_e\).
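The one-step quantity appearing in both the upper-value recursion (26) and the best-response dynamic program (94) can be sketched for finite spaces as follows. This is a hedged illustration of ours, not the paper's implementation: the array shapes, the stage-cost array `cost` (indexed by \((x_t, p_t^1, p_t^2, u_t^1, u_t^2)\)) and the continuation-value oracle `V_next` are all hypothetical constructions.

```python
import numpy as np

def w_u(pi_t, gamma1, gamma2, kernel, cost, V_next):
    """One-step cost-to-go: expected stage cost under (pi_t, gamma^{1:2}) plus
    the expected continuation value at the updated belief F_t(pi_t, gamma, z).

    pi_t   : belief on (x, p1, p2), shape (X, P1, P2)
    gamma1 : shape (P1, U1); gamma2 : shape (P2, U2); rows sum to 1
    kernel : P[x', p1', p2', z | x, p1, p2, u1, u2],
             shape (X, P1, P2, U1, U2, X, P1, P2, Z)
    cost   : stage cost c_t(x, p1, p2, u1, u2)
    V_next : callable returning the continuation value at a belief array
    """
    # Expected instantaneous cost, i.e., c~_t(pi_t, gamma1, gamma2):
    stage = np.einsum('ijk,jl,km,ijklm->', pi_t, gamma1, gamma2, cost)
    # Joint P_t^j over (z, x', p1', p2') and marginal P_t^m of Z_{t+1}:
    joint = np.einsum('ijk,jl,km,ijklmabcz->zabc', pi_t, gamma1, gamma2, kernel)
    marg = joint.sum(axis=(1, 2, 3))
    # Continuation term, skipping zero-probability observations:
    cont = sum(marg[z] * V_next(joint[z] / marg[z])
               for z in range(joint.shape[0]) if marg[z] > 0)
    return float(stage + cont)
```

Backward induction would then set \(V_t^u(\pi _t) = \min _{\gamma _t^1}\max _{\gamma _t^2} w_t^u(\pi _t,\gamma _t^1,\gamma _t^2)\) (e.g., by enumeration over prescription grids), starting from \(V_{T+1}^u \equiv 0\); the best-response recursion (94) fixes \(\gamma _t^1\) and keeps only the maximization.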
Kartik, D., Nayyar, A. Upper and Lower Values in Zero-Sum Stochastic Games with Asymmetric Information. Dyn Games Appl 11, 363–388 (2021). https://doi.org/10.1007/s13235-020-00364-x