Abstract
A general model for zero-sum stochastic games with asymmetric information is considered. In this model, each player’s information at each time can be divided into a common information part and a private information part. Under certain conditions on the evolution of the common and private information, a dynamic programming characterization of the value of the game (if it exists) is presented. If the value of the zero-sum game does not exist, then the dynamic program provides bounds on the upper and lower values of the game.
Change history
16 September 2020
A Correction to this paper has been published: https://doi.org/10.1007/s13235-020-00366-9
Notes
Note that we do not impose Assumption 2 of [26].
For example, see the delayed sharing information structure in Sect. 2.2.
Our notation is different from [23, Section IV.3].
We refer the reader to [23, Chapter III] for a detailed discussion on the universal belief space.
Note that the belief \({{\mathbb {P}}}^{({\tilde{\chi }}^1,{\tilde{\chi }}^2)}[x_t, p_t^{1:2} \mid c_t,\gamma _{1:t-1}^{1:2}] = {{\mathbb {P}}}^{({\tilde{\chi }}^1,{\tilde{\chi }}^2)}[x_t, p_t^{1:2} \mid c_t,\gamma _{1:t}^{1:2}]\) because \(\gamma _t^i = {\tilde{\chi }}_t^i( c_t,\gamma _{1:t-1}^{1:2})\), \(i=1,2\).
References
Alpcan T, Başar T (2010) Network security: a decision and game-theoretic approach. Cambridge University Press, Cambridge
Amin S, Litrico X, Sastry S, Bayen AM (2012) Cyber security of water SCADA systems, part I: analysis and experimentation of stealthy deception attacks. IEEE Trans Control Syst Technol 21(5):1963–1970
Amin S, Schwartz GA, Cardenas AA, Sastry SS (2015) Game-theoretic models of electricity theft detection in smart utility networks: providing new capabilities with advanced metering infrastructure. IEEE Control Syst Mag 35(1):66–81
Aumann RJ, Maschler M, Stearns RE (1995) Repeated games with incomplete information. MIT Press, Cambridge
Başar T (1981) On the saddle-point solution of a class of stochastic differential games. J Optim Theory Appl 33(4):539–556
Başar T, Olsder GJ (1999) Dynamic noncooperative game theory, vol 23. SIAM, Philadelphia
Bondi E, Oh H, Xu H, Fang F, Dilkina B, Tambe M (2019) Using game theory in real time in the real world: A conservation case study. In: Proceedings of the 18th international conference on autonomous agents and multiagent systems, pp. 2336–2338. International Foundation for Autonomous Agents and Multiagent Systems
Fang F, Nguyen TH, Pickles R, Lam WY, Clements GR, An B, Singh A, Schwedock BC, Tambe M, Lemieux A (2017) PAWS: a deployed game-theoretic application to combat poaching. AI Mag 38(1):23–36
Fang F, Stone P, Tambe M (2015) When security games go green: Designing defender strategies to prevent poaching and illegal fishing. In: Twenty-fourth international joint conference on artificial intelligence
Filar J, Vrieze K (2012) Competitive Markov decision processes. Springer, Berlin
Fudenberg D, Tirole J (1991) Game theory. MIT Press, Cambridge
Gensbittel F, Oliu-Barton M, Venel X (2014) Existence of the uniform value in zero-sum repeated games with a more informed controller. J Dyn Games 1(3):411–445
Gensbittel F, Renault J (2015) The value of Markov chain games with incomplete information on both sides. Math Oper Res 40(4):820–841
Hansen EA (1998) Solving POMDPs by searching in policy space. In: Proceedings of the fourteenth conference on uncertainty in artificial intelligence, pp 211–219
Haywood O Jr (1954) Military decision and game theory. J Oper Res Soc Am 2(4):365–385
Hernández-Lerma O, Lasserre JB (2012) Discrete-time Markov control processes: basic optimality criteria, vol 30. Springer, Berlin
Kartik D, Nayyar A (2019) Stochastic zero-sum games with asymmetric information. In: 58th IEEE conference on decision and control. IEEE
Kartik D, Nayyar A (2019) Zero-sum stochastic games with asymmetric information. arXiv preprint arXiv:1909.01445
Li L, Shamma J (2014) LP formulation of asymmetric zero-sum stochastic games. In: 53rd IEEE conference on decision and control, pp. 1930–1935. IEEE
Li X, Venel X (2016) Recursive games: uniform value, Tauberian theorem and the Mertens conjecture. Int J Game Theory 45(1–2):155–189
Loch J, Singh SP (1998) Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes. In: Proceedings of the fifteenth international conference on machine learning, pp 323–331
Maschler M, Solan E, Zamir S (2013) Game theory. Cambridge University Press. https://doi.org/10.1017/CBO9780511794216
Mertens JF, Sorin S, Zamir S (2015) Repeated games, vol 55. Cambridge University Press, Cambridge
Morrow JD (1994) Game theory for political scientists. Princeton University Press, Princeton
Nayyar A, Gupta A (2017) Information structures and values in zero-sum stochastic games. In: American control conference (ACC), 2017, pp. 3658–3663. IEEE
Nayyar A, Gupta A, Langbort C, Başar T (2014) Common information based Markov perfect equilibria for stochastic games with asymmetric information: finite games. IEEE Trans Autom Control 59(3):555–570
Nayyar A, Mahajan A, Teneketzis D (2010) Optimal control strategies in delayed sharing information structures. IEEE Trans Autom Control 56(7):1606–1620
Nayyar A, Mahajan A, Teneketzis D (2013) Decentralized stochastic control with partial history sharing: a common information approach. IEEE Trans Autom Control 58(7):1644–1658
Osborne MJ, Rubinstein A (1994) A course in game theory. MIT Press, Cambridge
Ouyang Y, Tavafoghi H, Teneketzis D (2017) Dynamic games with asymmetric information: common information based perfect Bayesian equilibria and sequential decomposition. IEEE Trans Autom Control 62(1):222–237
Ponssard JP, Sorin S (1980) The LP formulation of finite zero-sum games with incomplete information. Int J Game Theory 9(2):99–105
Renault J (2006) The value of Markov chain games with lack of information on one side. Math Oper Res 31(3):490–512
Renault J (2012) The value of repeated games with an informed controller. Math Oper Res 37(1):154–179
Rosenberg D (1998) Duality and Markovian strategies. Int J Game Theory 27(4):577–597
Rosenberg D, Solan E, Vieille N (2004) Stochastic games with a single controller and incomplete information. SIAM J Control Optim 43(1):86–110
Rudin W (1964) Principles of mathematical analysis, vol 3. McGraw-Hill, New York
Sandell NR Jr (1974) Control of Finite-State, Finite-Memory Stochastic Systems, Massachusetts Institute of Technology, PhD Thesis. https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19740018646.pdf
Shelar D, Amin S (2017) Security assessment of electricity distribution networks under DER node compromises. IEEE Trans Control Netw Syst 4(1):23–36. https://doi.org/10.1109/TCNS.2016.2598427
Teneketzis D (2006) On the structure of optimal real-time encoders and decoders in noisy communication. IEEE Trans Inf Theory 52(9):4017–4035
Vasal D, Sinha A, Anastasopoulos A (2019) A systematic process for evaluating structured perfect Bayesian equilibria in dynamic games with asymmetric information. IEEE Trans Autom Control 64(1):78–93
Washburn A, Wood K (1995) Two-person zero-sum games for network interdiction. Oper Res 43(2):243–251
Wu M, Amin S (2018) Securing infrastructure facilities: When does proactive defense help? Dyn Games Appl 9:1–42
Zheng J, Castañón DA (2013) Decomposition techniques for Markov zero-sum games with nested information. In: 52nd IEEE conference on decision and control, pp. 574–581. IEEE
Zhu Q, Başar T (2015) Game-theoretic methods for robustness, security, and resilience of cyber-physical control systems: games-in-games principle for optimal cross-layer resilient control systems. IEEE Control Syst Mag 35(1):46–65
A preliminary version of this paper appeared in the proceedings of the 58th Conference on Decision and Control (CDC), 2019 [17].
Appendices
Proof of Lemma 1
It was shown in [26] that there exist bijective mappings \({\mathcal {M}}^i: {\mathcal {G}}^i \rightarrow {\mathcal {H}}^i\), \(i =1,2,\) such that for every \(g^1 \in {\mathcal {G}}^1\) and \(g^2 \in {\mathcal {G}}^2\), we have
Therefore, for any strategy \(g^1 \in {\mathcal {G}}^1\), we have
Consequently,
This implies that \(S^u({\mathscr {G}}) = S^u({\mathscr {G}}_v)\). We can similarly prove that \(S^l({\mathscr {G}}) = S^l({\mathscr {G}}_v)\).
Remark 6
We can also show that a strategy profile \((g^1,g^2)\) is a Nash equilibrium in game \({\mathscr {G}}\) if and only if \(({\mathcal {M}}^1(g^1),{\mathcal {M}}^2(g^2))\) is a Nash equilibrium in game \({\mathscr {G}}_v\).
Proof of Lemma 2
Let us consider the evolution of the virtual game \({\mathscr {G}}_v\) under the strategy profile \((\chi ^1,\chi ^2)\) and the expanded virtual game \({\mathscr {G}}_e\) under the strategy profile \(({\tilde{\chi }}^1,{\tilde{\chi }}^2)\). Let the primitive variables and the randomization variables \(K_t^i\) in both games be identical. The variables such as the state, action and information variables in the expanded game \({\mathscr {G}}_e\) are distinguished from those in the virtual game \({\mathscr {G}}_v\) by means of a tilde. For instance, \(X_t\) is the state in game \({\mathscr {G}}_v\) and \({\tilde{X}}_t\) is the state in game \({\mathscr {G}}_e\).
We will prove by induction that the system evolution in both these games is identical over the entire horizon. This is clearly true at the end of time \(t=1\) because the state, observations and the common and private information variables are identical in both games. Moreover, since \({\chi }^i = \varrho ^i({{\tilde{\chi }}}^1,{{\tilde{\chi }}}^2)\), \(i=1,2\), the strategies \(\chi ^i_1\) and \({\tilde{\chi }}^i_1\) are identical by definition (see Definition 2). Thus, the prescriptions and actions at \(t=1\) are also identical.
For induction, assume that the system evolution in both games is identical until the end of time t. Then,
Using Eqs. (2), (4) and (3), we can similarly argue that \({Y}_{t+1}^i = \tilde{{Y}}_{t+1}^i\), \({P}_{t+1}^i = \tilde{{P}}_{t+1}^i\) and \({C}_{t+1} = \tilde{{C}}_{t+1}\). Since \({\chi }^i = \varrho ^i({{\tilde{\chi }}}^1,{{\tilde{\chi }}}^2)\), we also have
Here, equality (a) follows from the construction of the mapping \(\varrho ^i\) (see Definition 2) and equality (b) follows from the fact that \(C_{t+1} = {\tilde{C}}_{t+1}\). Further,
Thus, by induction, the hypothesis is true for every \(1\le t \le T\). This proves that the virtual and expanded games have identical dynamics under strategy profiles \(({\chi }^1,\chi ^2)\) and \((\tilde{{\chi }}^1,{\tilde{\chi }}^2)\).
Since the virtual and expanded games have the same cost structure, having identical dynamics ensures that strategy profiles \(({\chi }^1,\chi ^2)\) and \((\tilde{{\chi }}^1,{\tilde{\chi }}^2)\) have the same expected cost in games \({\mathscr {G}}_v\) and \({\mathscr {G}}_e\), respectively. Therefore, \({\mathcal {J}}({\chi }^1,\chi ^2) = {{\mathcal {J}}}(\tilde{{\chi }}^1,{\tilde{\chi }}^2)\).
Proof of Theorem 1
For any strategy \(\chi ^1 \in {\mathcal {H}}^1\), we have
because \({\mathcal {H}}^2 \subseteq {\tilde{{\mathcal {H}}}}^2\). Further,
where the first equality is due to Lemma 2, the second equality is because \(\varrho ^1(\chi ^1,{\tilde{\chi }}^2) = \chi ^1\) and the last inequality is due to the fact that \(\varrho ^2(\chi ^1,{\tilde{\chi }}^2) \in {{\mathcal {H}}}^2\) for any \({\tilde{\chi }}^2 \in {\tilde{{\mathcal {H}}}}^2\).
Combining (47) and (50), we obtain that
Now,
where inequality (53) is true since \({\mathcal {H}}^1 \subseteq {\tilde{{\mathcal {H}}}}^1\) and the equality in (54) follows from (51). Therefore, \(S^u({\mathscr {G}}_e) \le S^u({\mathscr {G}}_v)\). We can use similar arguments to show that \(S^l({\mathscr {G}}_v) \le S^l({\mathscr {G}}_e).\)
Proof of Lemma 3
We begin by defining the following transformations for each time t. Recall that \({{\mathcal {S}}}_t\) is the set of all possible common information beliefs at time t and \({{\mathcal {B}}}_t^i\) is the prescription space for virtual player i at time t.
Definition 5
(i) Let \(P_t^j: {{\mathcal {S}}}_t \times {{\mathcal {B}}}_t^1 \times {{\mathcal {B}}}_t^2 \rightarrow \varDelta ({{\mathcal {Z}}}_{t+1} \times {{\mathcal {X}}}_{t+1} \times {\mathcal {P}}_{t+1}^1 \times {\mathcal {P}}_{t+1}^2)\) be defined as
$$\begin{aligned}&P_t^j(\pi _t,\gamma _t^{1:2}; z_{t+1}, x_{t+1},p_{t+1}^{1:2}) \end{aligned}$$
(56)
$$\begin{aligned}&\quad {:=} \sum _{{x}_t,{p}^{1:2}_t,{u}_t^{1:2}}\pi _t({x}_t,{p}_t^{1:2})\gamma _t^1({p}_t^1;u_t^1)\gamma _t^2( {p}_t^2; u_t^2){{\mathbb {P}}}[{x}_{t+1}, {p}_{t+1}^{1:2},{z}_{t+1} \mid {x}_t,{p}_t^{1:2},{u}_t^{1:2}]. \end{aligned}$$
(57)

We will use \(P_t^j(\pi _t,\gamma _t^{1:2})\) as a shorthand for the probability distribution \(P_t^j(\pi _t,\gamma _t^{1:2}; \cdot , \cdot , \cdot )\). The distribution \(P_t^j(\pi _t,\gamma _t^{1:2})\) can be viewed as a joint distribution over the variables \(Z_{t+1},X_{t+1}, P_{t+1}^{1:2}\) if the distribution on \(X_t, P^{1:2}_t\) is \(\pi _t\) and prescriptions \(\gamma ^{1:2}_t\) are chosen by the virtual players at time t.
(ii) Let \(P_t^m: {{\mathcal {S}}}_t \times {{\mathcal {B}}}_t^1 \times {{\mathcal {B}}}_t^2 \rightarrow \varDelta ({{\mathcal {Z}}}_{t+1})\) be defined as
$$\begin{aligned} P_t^m(\pi _t,\gamma _t^{1:2}; z_{t+1}) = \sum _{x_{t+1},p_{t+1}^{1:2}} P_t^j(\pi _t,\gamma _t^{1:2}; z_{t+1}, x_{t+1},p_{t+1}^{1:2}). \end{aligned}$$
(58)

The distribution \(P_t^m(\pi _t,\gamma _t^{1:2})\) is the marginal distribution of the variable \(Z_{t+1}\) obtained from the joint distribution \(P_t^j(\pi _t,\gamma _t^{1:2})\) defined above.
(iii) Let \(F_t: {{\mathcal {S}}}_t \times {{\mathcal {B}}}_t^1 \times {{\mathcal {B}}}_t^2 \times {{\mathcal {Z}}}_{t+1} \rightarrow {{\mathcal {S}}}_{t+1}\) be defined as
$$\begin{aligned} F_t(\pi _t,\gamma _t^{1:2},z_{t+1})= {\left\{ \begin{array}{ll} \frac{P_t^j(\pi _t,\gamma _t^{1:2};z_{t+1},\cdot ,\cdot )}{P_t^m(\pi _t,\gamma _t^{1:2};z_{t+1})} &{}\text {if } P_t^m(\pi _t,\gamma _t^{1:2};z_{t+1}) > 0\\ G_t(\pi _t,\gamma _t^{1:2},z_{t+1}) &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(59)

where \(G_t: {{\mathcal {S}}}_t \times {{\mathcal {B}}}_t^1 \times {{\mathcal {B}}}_t^2 \times {{\mathcal {Z}}}_{t+1} \rightarrow {{\mathcal {S}}}_{t+1}\) can be an arbitrary measurable mapping. One such mapping maps every element \((\pi _t,\gamma _t^{1:2},z_{t+1})\) to the uniform distribution over the finite space \({{\mathcal {X}}}_{t+1} \times {\mathcal {P}}_{t+1}^1 \times {\mathcal {P}}_{t+1}^2\).
Let the collection of transformations \(F_t\) that can be constructed using the method described in Definition 5 be denoted by \({{\mathscr {B}}}\). Note that the transformations \(P_t^j, P_t^m\) and \(F_t\) do not depend on the strategy profile \(({\tilde{\chi }}^1, {\tilde{\chi }}^2)\) because the term \({{\mathbb {P}}}[{x}_{t+1}, {p}_{t+1}^{1:2},{z}_{t+1} \mid {x}_t,{p}_t^{1:2},{u}_t^{1:2}]\) in (57) depends only on the system dynamics (see Eqs. (12–16)) and not on the strategy profile \(({\tilde{\chi }}^1,{\tilde{\chi }}^2)\).
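When all the spaces involved are finite, the transformations in Definition 5 are direct to compute. The following minimal sketch is our own illustration, not notation from the paper: the function name `cib_update`, the array shapes and the kernel layout are assumptions. It implements \(P_t^j\), \(P_t^m\) and \(F_t\), using the uniform distribution as the fallback mapping \(G_t\).

```python
import numpy as np

def cib_update(pi_t, gamma1, gamma2, kernel):
    """One step of the common information belief update of Definition 5.

    Hypothetical array layout (all spaces finite):
      pi_t   : belief on (x, p1, p2),         shape (X, P1, P2)
      gamma1 : prescription gamma^1(p1; u1),  shape (P1, U1), rows sum to 1
      gamma2 : prescription gamma^2(p2; u2),  shape (P2, U2), rows sum to 1
      kernel : P[x', p1', p2', z | x, p1, p2, u1, u2],
               shape (X, P1, P2, U1, U2, X, P1, P2, Z)

    Returns (F, marg): F[z] = F_t(pi_t, gamma, z) with shape (Z, X, P1, P2),
    and marg[z] = P_t^m(pi_t, gamma; z).
    """
    # Joint distribution P_t^j over (z, x', p1', p2') -- Eq. (57):
    joint = np.einsum('ijk,jl,km,ijklmabcz->zabc', pi_t, gamma1, gamma2, kernel)
    # Marginal P_t^m of Z_{t+1} -- Eq. (58):
    marg = joint.sum(axis=(1, 2, 3))
    # Conditional F_t -- Eq. (59), with the uniform distribution as G_t:
    F = np.empty_like(joint)
    uniform = np.full(joint.shape[1:], 1.0 / joint[0].size)
    for z in range(joint.shape[0]):
        F[z] = joint[z] / marg[z] if marg[z] > 0 else uniform
    return F, marg
```

Because `joint` is computed by a multilinear contraction, \(P_t^j\) (and hence \(P_t^m\)) is automatically linear in `pi_t`; this is exactly the property used to extend these maps to \(\bar{{{\mathcal {S}}}}_t\) in “Appendix 5.”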
Consider a strategy profile \(({\tilde{\chi }}^1, {\tilde{\chi }}^2)\). Note that the number of possible realizations of common information and prescription history under \(({\tilde{\chi }}^1, {\tilde{\chi }}^2)\) is finite. Let \(c_{t+1},\gamma _{1:t}^{1:2}\) be a realization of the common information and prescription history at time \(t+1\) with nonzero probability of occurrence under \(({\tilde{\chi }}^1, {\tilde{\chi }}^2)\). For this realization of virtual players’ information, the common information-based belief on the state and private information at time \(t+1\) is given by
Notice that expression (60) is well defined; that is, the denominator is nonzero because of our assumption that the realization \(c_{t+1},\gamma _{1:t}^{1:2}\) has nonzero probability of occurrence. Let us consider the numerator in expression (60). For convenience, we will denote it by \({{\mathbb {P}}}^{({\tilde{\chi }}^1,{\tilde{\chi }}^2)}[{x}_{t+1}, {p}_{t+1}^{1:2},{z}_{t+1} \mid {c}_{t},\gamma _{1:t}^{1:2}]\). We have
where \(\pi _t\) is the common information belief on \(X_t, P_t^1,P_t^2\) at time t given the realization \(c_t,\gamma _{1:t-1}^{1:2}\) (see Footnote 5) and \(P_t^j\) is as defined in Definition 5. The equality in (62) is due to the structure of the system dynamics in game \({\mathscr {G}}_e\) described by Eqs. (12–16). Similarly, the denominator in (60) satisfies
where \(P_t^m\) is as defined in Definition 5. Thus, from Eq. (60), we have
where \(F_t\) is as defined in Definition 5. Since relation (65) holds for every realization \(c_{t+1}, \gamma _{1:t}^{1:2}\) that has nonzero probability of occurrence under \(({\tilde{\chi }}^1,{\tilde{\chi }}^2)\), we can conclude that the common information belief \(\Pi _t\) evolves almost surely as
under the strategy profile \(({\tilde{\chi }}^1,{\tilde{\chi }}^2)\).
The expected cost at time t can be expressed as follows
where the function \({\tilde{c}}_t\) is as defined in Eq. (22). Therefore, the total cost can be expressed as
Some Continuity Results
In this section, we will state and prove some technical results that will be useful for proving Lemma 4.
Let \({{\mathcal {S}}}_t\) denote the set of all probability distributions over the finite set \({{\mathcal {X}}}_{t} \times {{\mathcal {P}}}_{t}^1 \times {{\mathcal {P}}}_{t}^2\). Thus, \({{\mathcal {S}}}_t\) is the set of all possible common information-based beliefs at time t. Define
The functions \({\tilde{c}}_t\) in (22), \(P_t^j\) in (56), \(P_t^m\) in (58) and \(F_t\) in (65) were defined for any \(\pi _t \in {{\mathcal {S}}}_t\). We will extend the domain of the argument \(\pi _t\) in these functions to \(\bar{{{\mathcal {S}}}}_t\) as follows. For any \(\gamma _t^i \in {{\mathcal {B}}}_t^i, i = 1,2\), \(z_{t+1} \in {{\mathcal {Z}}}_{t+1}\), \(0 \le \alpha \le 1\) and \(\pi _t \in {{{\mathcal {S}}}}_t\), let
(i) \({\tilde{c}}_t(\alpha \pi _t,\gamma _t^1,\gamma _t^2) {:=} \alpha {\tilde{c}}_t(\pi _t,\gamma _t^1,\gamma _t^2)\)
(ii) \(P^j_t(\alpha \pi _t,\gamma _t^{1:2}) {:=} \alpha P^j_t(\pi _t,\gamma _t^{1:2})\)
(iii) \(P^m_t(\alpha \pi _t,\gamma _t^{1:2}) {:=} \alpha P^m_t(\pi _t,\gamma _t^{1:2})\)
(iv) \( F_t(\alpha \pi _t,\gamma _t^{1:2},z_{t+1}) {:=} {\left\{ \begin{array}{ll} F_t(\pi _t,\gamma _t^{1:2},z_{t+1}) &{}\text {if } \alpha > 0\\ \textit{\textbf{0}} &{} \text {if } \alpha = 0, \end{array}\right. }\)
where \(\textit{\textbf{0}}\) is a zero vector of size \(|{{\mathcal {X}}}_{t} \times {{\mathcal {P}}}_{t}^1 \times {{\mathcal {P}}}_{t}^2|\).
Having extended the domain of the above functions, we can also extend the domain of the argument \(\pi _t\) in the functions \(w_t^u(\cdot ), w_t^l(\cdot ), V_t^u(\cdot ), V_t^l(\cdot )\) defined in the dynamic programs of Section 4.2. First, for any \(0 \le \alpha \le 1\) and \(\pi _{T+1} \in {{{\mathcal {S}}}}_{T+1}\), define \(V^u_{T+1}(\alpha \pi _{T+1}) {:=}0\). We can then define the following functions for every \(t \le T\) in a backward-inductive manner: For any \(\gamma _t^i \in {{\mathcal {B}}}_t^i, i = 1,2\), \(0 \le \alpha \le 1\) and \(\pi _t \in {{{\mathcal {S}}}}_t\), let
Note that when \(\alpha =1\), the above definition of \(w^u_t\) is equal to the definition of \(w^u_t\) in Eq. (26) of the dynamic program. We can similarly extend \(w^l_t\) and \(V^l_t\). These extended value functions satisfy the following homogeneity property. A similar result was shown in [19, Lemma III.1] for a special case of our model.
Lemma 5
For any constant \(0 \le \alpha \le 1\) and any \(\pi _t \in \bar{{{\mathcal {S}}}}_t\), we have \(\alpha V_t^u(\pi _t) = V_t^u(\alpha \pi _t)\) and \(\alpha V_t^l(\pi _t) = V_t^l(\alpha \pi _t)\).
Proof
The proof can be easily obtained from the above definitions of the extended functions.
The following lemmas will be used in “Appendix 6” to establish some useful properties of the extended functions.
Lemma 6
Let \(V: \bar{{{\mathcal {S}}}}_{t+1} \rightarrow {{\mathbb {R}}}\) be a continuous function satisfying \(V(\alpha \pi ) = \alpha V(\pi )\) for every \(0 \le \alpha \le 1\) and \(\pi \in \bar{{{\mathcal {S}}}}_{t+1}\). Define
For a fixed \(\gamma _t^1,\gamma _t^2\), \(V'(\cdot ,\gamma _t^1,\gamma _t^2)\) is a function from \(\bar{{{\mathcal {S}}}}_{t}\) to \({{\mathbb {R}}}\). Then, the family of functions
is equicontinuous. Similarly, the following families of functions
are equicontinuous in their respective arguments.
Proof
A continuous function is bounded and uniformly continuous over a compact domain (see Theorem 4.19 in [36]). Therefore, V is bounded and uniformly continuous over \(\bar{{{\mathcal {S}}}}_{t+1}\).
Using the fact that \(V(\alpha \pi ) = \alpha V(\pi )\) and the definition of \(F_t\) in Definition 5, the function \(V'\) can be written as
Recall that \(P_t^j\) is trilinear in \(\pi _t,\gamma _t^1\) and \(\gamma _t^2\) with bounded coefficients for a fixed value of \(z_{t+1}\) (see (56)). Therefore, for each \({z}_{t+1}\), \(\{P^j_t(\cdot , \gamma ^1_t,\gamma ^2_t,{z}_{t+1})\}\) is an equicontinuous family of functions in the argument \(\pi _t\), where \(P^j_t(\pi _t, \gamma ^1_t,\gamma ^2_t,{z}_{t+1})\) is a shorthand notation for the measure \(P^j_t(\pi _t, \gamma ^1_t,\gamma ^2_t,{z}_{t+1},\cdot ,\cdot )\) over the space \({{\mathcal {X}}}_{t+1} \times {\mathcal {P}}_{t+1}^1 \times {\mathcal {P}}_{t+1}^2\).
Also, since V is uniformly continuous, the family \(\left\{ V\left( {{P}_t^j(\cdot ,\gamma _t^{1:2},{z}_{t+1})}\right) \right\} \) is equicontinuous in \(\pi _t\) for each \(z_{t+1}\). This is because composition with a uniformly continuous function preserves equicontinuity. Therefore, the family of functions \({{\mathscr {F}}}_1\) is equicontinuous in \(\pi _t\). We can use similar arguments to prove equicontinuity of the other two families. \(\square \)
Lemma 7
Let \(w: {{\mathcal {B}}}^1_t \times {{\mathcal {B}}}^2_t \rightarrow {{\mathbb {R}}}\) be a function such that (i) the family of functions \(\{w(\cdot ,\gamma ^2): \gamma ^2 \in {{\mathcal {B}}}^2_t\}\) is equicontinuous in the first argument and (ii) the family of functions \(\{w(\gamma ^1, \cdot ): \gamma ^1 \in {{\mathcal {B}}}^1_t\}\) is equicontinuous in the second argument. Then, \(\sup _{\gamma ^2}w(\gamma ^1,\gamma ^2)\) is a continuous function of \(\gamma ^1\) and, similarly, \(\inf _{\gamma ^1}w(\gamma ^1,\gamma ^2)\) is a continuous function of \(\gamma ^2\).
Proof
Let \(\epsilon > 0\). For a given \(\gamma ^1\), there exists a \(\delta > 0\) such that
Let \({\bar{\gamma }}^2\) be a prescription such that
Note that the existence of \({\bar{\gamma }}^2\) is guaranteed because of continuity of \(w(\gamma ^1, \cdot )\) in the second argument and compactness of \({{\mathcal {B}}}^2_t\). Pick any \(\gamma '^1\) satisfying \(||\gamma ^1 - \gamma '^1|| \le \delta \). Let \({\bar{\gamma }}'^2\) be a prescription such that
Using (77), we have
Equations (78)–(81) imply that \(\sup _{\gamma ^2}w(\gamma ^1,\gamma ^2)\) is a continuous function of \(\gamma ^1\). We can use a similar argument for showing continuity of \(\inf _{\gamma ^1}w(\gamma ^1,\gamma ^2)\) in \(\gamma ^2\). \(\square \)
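The envelope argument in Lemma 7 can be illustrated numerically with a toy example of our own (not from the paper): for \(w(\gamma ^1,\gamma ^2) = \gamma ^1\gamma ^2 - (\gamma ^2)^2\) on \([0,1]^2\), the family \(\{w(\cdot ,\gamma ^2)\}\) is equicontinuous, and the upper envelope \(\sup _{\gamma ^2} w(\gamma ^1,\gamma ^2) = (\gamma ^1)^2/4\) is indeed continuous in \(\gamma ^1\).

```python
import numpy as np

def sup_over_gamma2(gamma1, grid):
    """sup over gamma^2 of the toy kernel w(g1, g2) = g1*g2 - g2**2,
    computed on a dense grid standing in for the compact set B^2_t."""
    return float(np.max(gamma1 * grid - grid**2))

grid = np.linspace(0.0, 1.0, 1001)  # discretized compact prescription set
```

The supremum is attained at \(\gamma ^2 = \gamma ^1/2\), consistent with the attainment of the supremum used in the proof; small perturbations of \(\gamma ^1\) perturb the envelope \((\gamma ^1)^2/4\) continuously.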
Proof of Lemma 4
We first use the definitions of extensions of \(w^u_t,w^l_t,V^u_t,V^l_t\) in “Appendix 5” and Lemmas 5 and 6 to establish the following equicontinuity result.
Lemma 8
The families of functions
are all equicontinuous in their arguments for every \(t \le T\). A similar statement holds for \(w_t^l\).
Proof
We use a backward-induction argument for the proof. For induction, assume that \(V_{t+1}^u\) is a continuous function for some \(t \le T\). This is clearly true for \(t=T\). Using the continuity of \(V^u_{t+1}\), we will establish the statement of the lemma for time t and also prove the continuity of \(V^u_t\). This establishes the lemma for all \(t \le T\).
Equicontinuity of \(w^u_t\): Since \({\tilde{c}}_t(\pi _t,\gamma _t^1,\gamma _t^2)\) is linear in \(\pi _t\) with uniformly bounded coefficients for any given \(\gamma _t^{1:2}\) (see (22)), it is equicontinuous in the argument \(\pi _t\). In Lemma 5, we showed that the value functions \(V_t^u\) satisfy the condition \(V_t^u(\alpha \pi ) = \alpha V_t^u(\pi )\) for every \(0 \le \alpha \le 1\), \(\pi \in {{\mathcal {S}}}_{t}\). Further, due to our induction hypothesis, \(V_{t+1}^u\) is continuous. Thus, using Lemma 6, the second term of \(w_t^u\),
is also equicontinuous in \(\pi _t\). Hence, the family \({{\mathscr {F}}}_t^a\) is equicontinuous in \(\pi _t\).
Continuity of \(V^u_t\): Due to the equicontinuity of the family \({{\mathscr {F}}}_t^a\), we have the following. For any given \(\epsilon > 0\) and \(\pi _t \in \bar{{{\mathcal {S}}}}_t\), there exists a \(\delta > 0\) such that
for every \(\gamma _t^1, \gamma _t^2\) and \(\pi _t'\) satisfying \(||\pi _t - \pi '_t|| < \delta \). Therefore,
for every \(\pi _t'\) that satisfies \(||\pi _t - \pi '_t|| < \delta \). Similarly, \(V^u_t(\pi _t) \ge V^u_t(\pi '_t) -\epsilon \) for every \(\pi _t'\) that satisfies \(||\pi _t - \pi '_t|| < \delta \). Therefore, \(V^u_t(\pi _t)\) is continuous at \(\pi _t\).
Hence, by induction, we can say that the family \({{\mathscr {F}}}_t^a\) is equicontinuous in \(\pi _t\) for every \(t \le T\). We can use similar arguments to prove the equicontinuity of the other families. \(\square \)
The continuity of \(w^u_t\) established above implies that \(\sup _{\gamma _t^2}w_t^u(\pi _t,\gamma _t^1,\gamma _t^2)\) is achieved for every \(\pi _t, \gamma ^1_t\). Further, Lemma 8 implies that \(w_t^u\) and \(w_t^l\) satisfy the equicontinuity conditions of Lemma 7 for any given realization of the belief \(\pi _t\). Therefore, we can use Lemma 7 to argue that \(\sup _{\gamma _t^2}w_t^u(\pi _t,\gamma _t^1,\gamma _t^2)\) is continuous in \(\gamma _t^1\). Since \(\gamma _t^1\) lies in the compact space \({{\mathcal {B}}}_t^1\), a minmaximizer exists for the function \(w_t^u\). Further, we can use the measurable selection condition (see Condition 3.3.2 in [16]) to prove the existence of a measurable mapping \(\varXi _t^1(\pi _t)\) as defined in Lemma 4. A similar argument establishes the existence of a maxminimizer and a measurable mapping \(\varXi _t^2(\pi _t)\) as defined in Lemma 4. This concludes the proof of Lemma 4.
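When the compact prescription spaces are replaced by finite grids (a discretization we introduce purely for illustration; the paper works with the continuous spaces directly), the minmaximizer and maxminimizer of Lemma 4 can be found by enumeration:

```python
import numpy as np

def minmax(w):
    """Given w[i, j] = w_t^u(pi_t, gamma1_i, gamma2_j) on finite prescription
    grids, return the min-max value and a min-maximizing row index."""
    upper = w.max(axis=1)          # maximizer's best response to each gamma1_i
    i_star = int(upper.argmin())   # grid analogue of Xi_t^1(pi_t)
    return float(upper[i_star]), i_star

def maxmin(w):
    """Max-min value and a max-minimizing column index."""
    lower = w.min(axis=0)
    j_star = int(lower.argmax())
    return float(lower[j_star]), j_star
```

For any payoff array `w`, `maxmin(w)[0] <= minmax(w)[0]`, mirroring the relation between the lower and upper values throughout the paper.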
Proof of Theorem 2
Let us first define a distribution \({\tilde{\Pi }}_t\) over the space \({{\mathcal {X}}}_t \times {\mathcal {P}}_t^1 \times {\mathcal {P}}_t^2\) in the following manner. The distribution \({\tilde{\Pi }}_t\), given \(C_t,\varGamma _{1:t-1}^{1:2}\), is recursively obtained using the following relation
where \(F_\tau \) is as defined in Definition 5 in “Appendix 4.” We refer to this distribution as the strategy-independent common information belief (SI-CIB).
Let \({\tilde{\chi }}^1 \in \tilde{{\mathcal {H}}}^1\) be any strategy for virtual player 1 in game \({\mathscr {G}}_e\). Consider the problem of obtaining virtual player 2’s best response to the strategy \({\tilde{\chi }}^1\) with respect to the cost \({\mathcal {J}}({\tilde{\chi }}^1 ,{\tilde{\chi }}^2)\) defined in (18). This problem can be formulated as a Markov decision process (MDP) with common information and prescription history \(C_t,\varGamma _{1:t-1}^{1:2}\) as the state. The control action at time t in this MDP is \(\varGamma _t^2\), which is selected based on the information \(C_t,\varGamma _{1:t-1}^{1:2}\) using strategy \({\tilde{\chi }}^2 \in {\mathcal {H}}^2\). The evolution of the state \(C_t,\varGamma _{1:t-1}^{1:2}\) of this MDP is as follows
where
almost surely. Here, \(\varGamma _t^1 = {\tilde{\chi }}^1_t(C_t,\varGamma _{1:t-1}^{1:2})\) and the transformation \(P_t^m\) is as defined in Definition 5 in “Appendix 4.” Notice that due to Lemma 3, the common information belief \(\Pi _t\) associated with any strategy profile \({(\tilde{{\chi }}^1,\tilde{{\chi }}^2)}\) is equal to \({\tilde{\Pi }}_t\) almost surely. This results in the state evolution equation in (92). The objective of this MDP is to maximize, for a given \({\tilde{\chi }}^1\), the following cost
where \({\tilde{c}}_t\) is as defined in Eq. (22). Due to Lemma 3, the total expected cost defined above is equal to the cost \({{\mathcal {J}}}(\tilde{{\chi }}^1,\tilde{{\chi }}^2)\) defined in (18).
The MDP described above can be solved using the following dynamic program. For every realization of virtual players’ information \(c_{T+1},\gamma _{1:T}^{1:2}\), define
In a backward-inductive manner, for each time \(t \le T\) and each realization \(c_{t},\gamma _{1:t-1}^{1:2}\), define
where \(\gamma _t^1 = {\tilde{\chi }}^1_t(c_{t},\gamma _{1:t-1}^{1:2})\) and \({\tilde{\pi }}_t\) is the SI-CIB associated with the information \(c_{t},\gamma _{1:t-1}^{1:2}\). Note that the measurable selection condition (see Condition 3.3.2 in [16]) holds for the dynamic program described above. Thus, the value functions \(V^{{\tilde{\chi }}^1}_{t}(\cdot )\) are measurable and there exists a measurable best-response strategy for virtual player 2 that solves the dynamic program described above. Therefore, we have
Claim 1
For any strategy \({\tilde{\chi }}^1 \in \tilde{{\mathcal {H}}}^1\) and for any realization of virtual players’ information \(c_{t},\gamma _{1:t-1}^{1:2}\), we have
where \(V_t^u\) is as defined in (26) and \({\tilde{\pi }}_t\) is the SI-CIB belief associated with the instance \(c_{t},\gamma _{1:t-1}^{1:2}\). As a consequence, we have
Proof
The proof is by backward induction. Clearly, the claim is true at time \(t = T+1\). Assume that the claim is true for all times greater than t. Then, we have
The first equality follows from the definition in (94), and the inequality after that follows from the induction hypothesis. The last inequality is a consequence of the definition of the value function \(V_t^u\). This completes the induction argument. Further, using Claim 1 and the result in (95), we have
\(\square \)
We can therefore say that
Further, for the strategy \({\tilde{\chi }}^{1*}\) defined in Definition 4, inequalities (96) and (97) hold with equality. We can prove this using an inductive argument similar to the one used to prove Claim 1. Therefore, we have
Combining (98) and (99), we have
Thus, the inequality in (99) holds with equality, which leads to the result that the strategy \({\tilde{\chi }}^{1*}\) is a min–max strategy in game \({\mathscr {G}}_e\). A similar argument can be used to show that
and that the strategy \({\tilde{\chi }}^{2*}\) defined in Definition 4 is a max–min strategy in game \({\mathscr {G}}_e\).
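The one-step quantity appearing in both the upper-value recursion (26) and the best-response dynamic program (94) can be sketched for finite spaces as follows. This is a hedged illustration of ours, not the paper's implementation: the array shapes, the stage-cost array `cost` (indexed by \((x_t, p_t^1, p_t^2, u_t^1, u_t^2)\)) and the continuation-value oracle `V_next` are all hypothetical constructions.

```python
import numpy as np

def w_u(pi_t, gamma1, gamma2, kernel, cost, V_next):
    """One-step cost-to-go: expected stage cost under (pi_t, gamma^{1:2}) plus
    the expected continuation value at the updated belief F_t(pi_t, gamma, z).

    pi_t   : belief on (x, p1, p2), shape (X, P1, P2)
    gamma1 : shape (P1, U1); gamma2 : shape (P2, U2); rows sum to 1
    kernel : P[x', p1', p2', z | x, p1, p2, u1, u2],
             shape (X, P1, P2, U1, U2, X, P1, P2, Z)
    cost   : stage cost c_t(x, p1, p2, u1, u2)
    V_next : callable returning the continuation value at a belief array
    """
    # Expected instantaneous cost, i.e., c~_t(pi_t, gamma1, gamma2):
    stage = np.einsum('ijk,jl,km,ijklm->', pi_t, gamma1, gamma2, cost)
    # Joint P_t^j over (z, x', p1', p2') and marginal P_t^m of Z_{t+1}:
    joint = np.einsum('ijk,jl,km,ijklmabcz->zabc', pi_t, gamma1, gamma2, kernel)
    marg = joint.sum(axis=(1, 2, 3))
    # Continuation term, skipping zero-probability observations:
    cont = sum(marg[z] * V_next(joint[z] / marg[z])
               for z in range(joint.shape[0]) if marg[z] > 0)
    return float(stage + cont)
```

Backward induction would then set \(V_t^u(\pi _t) = \min _{\gamma _t^1}\max _{\gamma _t^2} w_t^u(\pi _t,\gamma _t^1,\gamma _t^2)\) (e.g., by enumeration over prescription grids), starting from \(V_{T+1}^u \equiv 0\); the best-response recursion (94) fixes \(\gamma _t^1\) and keeps only the maximization.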
Kartik, D., Nayyar, A. Upper and Lower Values in Zero-Sum Stochastic Games with Asymmetric Information. Dyn Games Appl 11, 363–388 (2021). https://doi.org/10.1007/s13235-020-00364-x