Abstract
We analyze a class of stochastic dynamic games among teams with asymmetric information, where members of a team share their observations internally with a delay of d. Each team is associated with a controlled Markov chain, whose dynamics are coupled through the players’ actions. These games exhibit challenges in both theory and practice due to the presence of signaling and the increasing domain of information over time. We develop a general approach to characterize a subset of Nash equilibria where the agents can use a compressed version of their information, instead of the full information, to choose their actions. We identify two subclasses of strategies: sufficient private information-based (SPIB) strategies, which only compress private information, and compressed information-based (CIB) strategies, which compress both common and private information. We show that SPIB-strategy-based equilibria exist and the set of payoff profiles of such equilibria is the same as that of all Nash equilibria. On the other hand, we show that CIB-strategy-based equilibria may not exist. We develop a backward inductive sequential procedure, whose solution (if it exists) provides a CIB-strategy-based equilibrium. We identify some instances where we can guarantee the existence of a solution to the above procedure. Our results highlight the tension among compression of information, the ability of compression-based strategies to sustain all or some of the equilibrium payoff profiles, and backward inductive sequential computation of equilibria in stochastic dynamic games with asymmetric information.
Availability of Data and Material
Not applicable.
Notes
A team strategy is person-by-person optimal (PBPO) when each team member’s strategy is an optimal response given other team members’ strategy profile.
In contrast to signaling in teams, signaling in games is complicated by the fact that agents have diverging incentives.
Examples of such strategy dependencies appear in Ho [20] and in Nayyar and Teneketzis [40] for team problems with non-classical information structure. Since these strategy dependencies are solely due to the problem’s information structure, they also appear in dynamic games with non-classical information structure (see [21]).
We do not restrict the strategy types of \(g^{i}\) and \({\tilde{g}}^i\) in Definition 4. In particular, each of \(g^{i}\) and \({\tilde{g}}^i\) could be a coordination strategy or a team strategy.
The \((d-1)\)-step PRPs are the same as the partial functions defined in the second structural result in Nayyar et al. [41].
The compression of private information of coordinators in our model is closely related to Tavafoghi et al.’s [53] sufficient information approach. One can show that our sufficient private information \(S_{t}^i=({\mathbf {X}}_{t-d}^i, \varvec{\varPhi }_t^i)\) satisfies the definition of sufficient private information (Definition 4) in Tavafoghi et al. [53] (hence, we choose to use the same terminology).
Since \({\mathcal {X}}_t,{\mathcal {U}}_t,{\mathcal {Y}}_t\) are finite sets, one can assume that \({\mathbf {W}}_t^Y\) also takes finitely many values without loss of generality.
The compression of hidden information to sufficient hidden information is similar to the shedding of irrelevant information in Mahajan [29].
We claim that the value of this conditional probability is the same for g and \({\tilde{g}}\) whenever the conditional probability is well-defined under both g and \({\tilde{g}}\). However, whether or not the conditional probability is well-defined does depend on g. In the lemma, we always apply Claim 1 by multiplying \({\mathbb {P}}^g(x_{t-d+1:t}|\gamma _t, s_t^i, {\overline{h}}_{t}^{-i})\) by some other terms. Those terms will be 0 whenever \({\mathbb {P}}^g(x_{t-d+1:t}|\gamma _t, s_t^i, {\overline{h}}_{t}^{-i})\) is not well defined.
Note that \({\mathbb {P}}^{ g^{-i}}({\tilde{z}}_t|b_t, s_{t}^i)\) is different from \(\beta _t^i({\tilde{z}}_t|s_{t}^i)\). Since \(B_t\) is just a compression of the common information based on a predetermined update rule \(\psi \), which may or may not be consistent with the actually played strategy, \(B_t\) may not represent the true belief. \({\mathbb {P}}^{ g^{-i}}({\tilde{z}}_t|b_t, s_{t}^i)\) is the belief an agent infers from the event \(B_t=b_t, S_t^i=s_t^i\). The agent knows that \(b_t\) might not contain the true belief, but it is still useful in inferring the true state. \(\beta _t^i({\tilde{z}}_t|s_{t}^i)\) is a conditional distribution computed with \(b_t\), pretending that \(b_t\) contains the true belief.
References
Amin S, Litrico X, Sastry S, Bayen AM (2013) Cyber security of water SCADA systems—Part I: analysis and experimentation of stealthy deception attacks. IEEE Trans Control Syst Technol 21(5):1963–1970. https://doi.org/10.1109/tcst.2012.2211873
Amin S, Schwartz GA, Cárdenas AA, Sastry SS (2015) Game-theoretic models of electricity theft detection in smart utility networks: providing new capabilities with advanced metering infrastructure. IEEE Control Syst Mag 35(1):66–81. https://doi.org/10.1109/mcs.2014.2364711
Anantharam V, Borkar V (2007) Common randomness and distributed control: a counterexample. Syst Control Lett 56(7–8):568–572. https://doi.org/10.1016/j.sysconle.2007.03.010
Başar T, Olsder GJ (1999) Dynamic noncooperative game theory, vol 23. SIAM, Philadelphia
Bergemann D, Välimäki J (2006) Dynamic price competition. J Econ Theory 127(1):232–263. https://doi.org/10.1016/j.jet.2005.01.002
Bhattacharya S, Başar T (2012) Multi-layer hierarchical approach to double sided jamming games among teams of mobile agents. In: 2012 IEEE 51st IEEE conference on decision and control (CDC), IEEE, pp 5774–5779. https://doi.org/10.1109/cdc.2012.6426411
Cabral L (2011) Dynamic price competition with network effects. Rev Econ Stud 78(1):83–111. https://doi.org/10.1093/restud/rdq007
Cardaliaguet P, Rainer C, Rosenberg D, Vieille N (2016) Markov games with frequent actions and incomplete information-the limit case. Math Oper Res 41(1):49–71. https://doi.org/10.1287/moor.2015.0715
Colombino M, Smith RS, Summers TH (2017) Mutually quadratically invariant information structures in two-team stochastic dynamic games. IEEE Trans Autom Control 63(7):2256–2263. https://doi.org/10.1109/tac.2017.2772020
Cooper DJ, Kagel JH (2005) Are two heads better than one? Team versus individual play in signaling games. Am Econ Rev 95(3):477–509. https://doi.org/10.1257/0002828054201431
Cox CA, Stoddard B (2018) Strategic thinking in public goods games with teams. J Public Econ 161:31–43. https://doi.org/10.1016/j.jpubeco.2018.03.007
Doganoglu T (2003) Dynamic price competition with consumption externalities. NETNOMICS Econ Res Electron Netw 5(1):43–69. https://doi.org/10.1023/A:1024994117734
Farina G, Celli A, Gatti N, Sandholm T (2018) Ex-ante coordination and collusion in zero-sum multi-player extensive-form games. In: Conference on neural information processing systems (NIPS)
Filar J, Vrieze K (2012) Competitive Markov decision processes. Springer, New York
Gensbittel F, Renault J (2015) The value of Markov chain games with incomplete information on both sides. Math Oper Res 40(4):820–841. https://doi.org/10.1287/moor.2014.0697
Gupta A, Nayyar A, Langbort C, Başar T (2014) Common information based Markov perfect equilibria for linear-Gaussian games with asymmetric information. SIAM J Control Optim 52(5):3228–3260. https://doi.org/10.1137/140953514
Gupta A, Langbort C, Başar T (2016) Dynamic games with asymmetric information and resource constrained players with applications to security of cyberphysical systems. IEEE Trans Control Netw Syst 4(1):71–81. https://doi.org/10.1109/tcns.2016.2584183
Hancock PA, Nourbakhsh I, Stewart J (2019) On the future of transportation in an era of automated and autonomous vehicles. Proc Natl Acad Sci 116(16):7684–7691. https://doi.org/10.1073/pnas.1805770115
Harbert T (2014) Radio wrestlers fight it out at the DARPA Spectrum Challenge. https://spectrum.ieee.org/telecom/wireless/radio-wrestlers-fight-it-out-at-the-darpa-spectrum-challenge
Ho YC (1980) Team decision theory and information structures. Proc IEEE 68(6):644–654. https://doi.org/10.1109/proc.1980.11718
Kartik D, Nayyar A (2020) Upper and lower values in zero-sum stochastic games with asymmetric information. Dyn Games Appl 1–26. https://doi.org/10.1007/s13235-020-00364-x
Kartik D, Nayyar A, Mitra U (2021) Common information belief based dynamic programs for stochastic zero-sum games with competing teams. arXiv preprint arXiv:2102.05838
Kaspi Y, Merhav N (2010) Structure theorem for real-time variable-rate lossy source encoders and memory-limited decoders with side information. In: 2010 IEEE international symposium on information theory (ISIT), pp 86–90. https://doi.org/10.1109/isit.2010.5513283
Kuhn H (1953) Extensive games and the problem of information. In: Contributions to the theory of games (AM-28), volume II. Princeton University Press, pp 193–216. https://doi.org/10.1515/9781400881970-012
Kumar PR, Varaiya P (1986) Stochastic systems: estimation, identification and adaptive control. Prentice-Hall, Inc, Englewood Cliffs
Li L, Shamma J (2014) LP formulation of asymmetric zero-sum stochastic games. In: 2014 53rd IEEE conference on decision and control. IEEE, pp 1930–1935. https://doi.org/10.1109/cdc.2014.7039680
Li L, Langbort C, Shamma J (2019) An LP approach for solving two-player zero-sum repeated Bayesian games. IEEE Trans Autom Control 64(9):3716–3731. https://doi.org/10.1109/tac.2018.2885644
Mahajan A (2008) Sequential decomposition of sequential dynamic teams: Applications to real-time communication and networked control systems. PhD thesis, University of Michigan, Ann Arbor
Mahajan A (2013) Optimal decentralized control of coupled subsystems with control sharing. IEEE Trans Autom Control 58(9):2377–2382. https://doi.org/10.1109/cdc.2011.6160970
Mahajan A, Teneketzis D (2009) Optimal performance of networked control systems with nonclassical information structures. SIAM J Control Optim 48(3):1377–1404. https://doi.org/10.1137/060678130
Mailath GJ, Samuelson L (2006) Repeated games and reputations: long-run relationships. Oxford University Press, Oxford
Maskin E, Tirole J (1988) A theory of dynamic oligopoly. I: Overview and quantity competition with large fixed costs. Econom J Econom Soc 549–569. https://doi.org/10.2307/1911700
Maskin E, Tirole J (1988) A theory of dynamic oligopoly. II: Price competition, kinked demand curves, and edgeworth cycles. Econom J Econom Soc. https://doi.org/10.2307/1911701
Maskin E, Tirole J (2001) Markov perfect equilibrium: I. observable actions. J Econ Theory 100(2):191–219. https://doi.org/10.1006/jeth.2000.2785
Maskin E, Tirole J (2013) Markov equilibrium. In: J. F. Mertens memorial conference. https://youtu.be/UNtLnKJzrhs
Myerson RB (2013) Game theory. Harvard University Press, Harvard
Nayyar A, Başar T (2012) Dynamic stochastic games with asymmetric information. In: 2012 IEEE 51st IEEE conference on decision and control (CDC). IEEE, pp 7145–7150. https://doi.org/10.1109/cdc.2012.6426857
Nayyar A, Teneketzis D (2011) On the structure of real-time encoding and decoding functions in a multiterminal communication system. IEEE Trans Inf Theory 57(9):6196–6214. https://doi.org/10.1109/tit.2011.2161915
Nayyar A, Teneketzis D (2011) Sequential problems in decentralized detection with communication. IEEE Trans Inf Theory 57(8):5410–5435. https://doi.org/10.1109/tit.2011.2158478
Nayyar A, Teneketzis D (2019) Common knowledge and sequential team problems. IEEE Trans Autom Control 64(12):5108–5115. https://doi.org/10.1109/tac.2019.2912536
Nayyar A, Mahajan A, Teneketzis D (2011) Optimal control strategies in delayed sharing information structures. IEEE Trans Autom Control 56(7):1606–1620. https://doi.org/10.1109/tac.2010.2089381
Nayyar A, Gupta A, Langbort C, Başar T (2013) Common information based Markov perfect equilibria for stochastic games with asymmetric information: finite games. IEEE Trans Autom Control 59(3):555–570. https://doi.org/10.1109/tac.2013.2283743
Nayyar A, Mahajan A, Teneketzis D (2013) Decentralized stochastic control with partial history sharing: a common information approach. IEEE Trans Autom Control 58(7):1644–1658. https://doi.org/10.1109/tac.2013.2239000
Ouyang Y, Tavafoghi H, Teneketzis D (2015) Dynamic oligopoly games with private Markovian dynamics. In: 2015 54th IEEE conference on decision and control (CDC). IEEE, pp 5851–5858. https://doi.org/10.1109/cdc.2015.7403139
Ouyang Y, Tavafoghi H, Teneketzis D (2016) Dynamic games with asymmetric information: common information based perfect Bayesian equilibria and sequential decomposition. IEEE Trans Autom Control 62(1):222–237. https://doi.org/10.1109/tac.2016.2544936
Renault J (2006) The value of Markov chain games with lack of information on one side. Math Oper Res 31(3):490–512. https://doi.org/10.1287/moor.1060.0199
Renault J (2012) The value of repeated games with an informed controller. Math Oper Res 37(1):154–179. https://doi.org/10.1287/moor.1110.0518
Shelar D, Amin S (2017) Security assessment of electricity distribution networks under DER node compromises. IEEE Trans Control Netw Syst 4(1):23–36. https://doi.org/10.1109/tcns.2016.2598427
Summers T, Li C, Kamgarpour M (2017) Information structure design in team decision problems. IFAC-PapersOnLine 50(1):2530–2535. https://doi.org/10.1016/j.ifacol.2017.08.067
Tavafoghi H (2017) On design and analysis of cyber-physical systems with strategic agents. PhD thesis, University of Michigan, Ann Arbor
Tavafoghi H, Ouyang Y, Teneketzis D (2016) On stochastic dynamic games with delayed sharing information structure. In: 2016 IEEE 55th conference on decision and control (CDC). IEEE, pp 7002–7009. https://doi.org/10.1109/cdc.2016.7799348
Tavafoghi H, Ouyang Y, Teneketzis D, Wellman M (2019) Game theoretic approaches to cyber security: challenges, results, and open problems. In: Jajodia S, Cybenko G, Liu P, Wang C, Wellman M (eds) Adversarial and uncertain reasoning for adaptive cyber defense: control-and game-theoretic approaches to cyber security, vol 11830. Springer, New York, pp 29–53. https://doi.org/10.1007/978-3-030-30719-6_3
Tavafoghi H, Ouyang Y, Teneketzis D (2022) A unified approach to dynamic decision problems with asymmetric information: non-strategic agents. IEEE Trans Autom Control. https://doi.org/10.1109/tac.2021.3060835 (to appear)
Teneketzis D (2006) On the structure of optimal real-time encoders and decoders in noisy communication. IEEE Trans Inf Theory 52(9):4017–4035. https://doi.org/10.1109/tit.2006.880067
Teneketzis D, Ho YC (1987) The decentralized Wald problem. Inf Comput 73(1):23–44. https://doi.org/10.1016/0890-5401(87)90038-1
Teneketzis D, Varaiya P (1984) The decentralized quickest detection problem. IEEE Trans Autom Control 29(7):641–644. https://doi.org/10.1109/tac.1984.1103601
Tenney RR, Sandell NR (1981) Detection with distributed sensors. IEEE Trans Aerosp Electron Syst AES 17(4):501–510. https://doi.org/10.1109/taes.1981.309178
Tsitsiklis JN (1993) Decentralized detection. Adv Stat Signal Process 297–344
Varaiya P, Walrand J (1983) Causal coding and control for Markov chains. Syst Control Lett 3(4):189–192. https://doi.org/10.1016/0167-6911(83)90012-9
Vasal D, Sinha A, Anastasopoulos A (2019) A systematic process for evaluating structured perfect Bayesian equilibria in dynamic games with asymmetric information. IEEE Trans Autom Control 64(1):81–96. https://doi.org/10.1109/tac.2018.2809863
Veeravalli VV (2001) Decentralized quickest change detection. IEEE Trans Inf Theory 47(4):1657–1665. https://doi.org/10.1109/18.923755
Veeravalli VV, Başar T, Poor HV (1993) Decentralized sequential detection with a fusion center performing the sequential test. IEEE Trans Inf Theory 39(2):433–442. https://doi.org/10.1109/18.212274
Veeravalli VV, Başar T, Poor HV (1994) Decentralized sequential detection with sensors performing sequential tests. Math Control Signals Syst 7(4):292–305. https://doi.org/10.1007/bf01211521
Walrand J, Varaiya P (1983) Optimal causal coding-decoding problems. IEEE Trans Inf Theory 29(6):814–820. https://doi.org/10.1109/tit.1983.1056760
Witsenhausen HS (1973) A standard form for sequential stochastic control. Math Syst Theory 7(1):5–11. https://doi.org/10.1007/bf01824800
Witsenhausen HS (1979) On the structure of real-time source coders. Bell Syst Tech J 58(6):1437–1451. https://doi.org/10.1002/j.1538-7305.1979.tb02263.x
Yoshikawa T (1978) Decomposition of dynamic team decision problems. IEEE Trans Autom Control 23(4):627–632. https://doi.org/10.1109/tac.1978.1101791
Zhang Y, An B (2020) Computing team-maxmin equilibria in zero-sum multiplayer extensive-form games. In: Proceedings of the AAAI conference on artificial intelligence, vol 34, no. 02, pp 2318–2325. https://doi.org/10.1609/aaai.v34i02.5610
Zheng J, Castañón DA (2013) Decomposition techniques for Markov zero-sum games with nested information. In: 2013 52nd IEEE conference on decision and control. IEEE, pp 574–581. https://doi.org/10.1109/cdc.2013.6759943
Zhu Q, Başar T (2015) Game-theoretic methods for robustness, security, and resilience of cyberphysical control systems: games-in-games principle for optimal cross-layer resilient control systems. IEEE Control Syst Mag 35(1):46–65. https://doi.org/10.1109/mcs.2014.2364710
Funding
This work is supported by National Science Foundation (NSF) Grants No. ECCS 1750041, ECCS 2038416, ECCS 1608361, and CCF 2008130, by Army Research Office (ARO) Award No. W911NF-17-1-0232, and by Michigan Institute for Data Science (MIDAS) Sponsorship Funds by General Dynamics.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Code Availability
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Multi-agent Dynamic Decision Making and Learning” edited by Konstantin Avrachenkov, Vivek S. Borkar and U. Jayakrishnan Nair.
Appendices
Two Examples
1.1 A Motivating Example for Sect. 2
The following example illustrates the importance of considering jointly randomized mixed strategies when we study games among teams. Analogous to the role mixed strategies play in games among individual players, jointly randomized mixed strategies provide the minimal richness needed to guarantee that an equilibrium exists in games among teams. In particular, if we restrict the teams to use independently randomized strategies, i.e., type 1 and type 2 strategies described in Sect. 2.2, then an equilibrium may not exist. This example is similar in spirit to the examples in Farina et al. [13], Zhang and An [68], and Anantharam and Borkar [3], although in our example the players in the same team have asymmetric information.
Example 2
(Guessing game) Consider a two-stage game (i.e., \({\mathcal {T}}=\{1, 2\}\)) of two teams \({\mathcal {I}} = \{A, B\}\), each consisting of two players. The set of all agents is given by \({\mathcal {N}} = \{(A, 1), (A, 2), (B,1), (B, 2) \}\). Let \({\mathbf {X}}_t^A=(X_t^{A,1}, X_t^{A,2})\in \{-1, 1\}^2\) denote Team A’s state; Team B does not have a state, i.e., \({\mathbf {X}}_t^B = \varnothing \). Assume \({\mathcal {U}}_t^{i, j} = \{-1, 1\}\) for \(t=1, i=A\) or \(t=2, i=B\), and \({\mathcal {U}}_t^{i, j}=\varnothing \) otherwise, i.e., Team A moves at time 1, and Team B moves at time 2. At time 1, \(X_1^{A, 1}\) and \(X_1^{A, 2}\) are independently uniformly distributed on \(\{-1, 1\}\). Team A’s system is assumed to be static, i.e., \({\mathbf {X}}_{2}^{A} = {\mathbf {X}}_{1}^{A}\).
The rewards of Team A are given by
and the rewards of Team B are given by
Assume that there are no additional common observations other than past actions, i.e., \({\mathbf {Y}}_t = \varnothing \). We set the delay \(d=2\), i.e., agent (A, 1) does not know \(X_t^{A,2}\) throughout the game and a similar property is true for agent (A, 2). In this game, the task of Team A is to choose actions according to their states at \(t=1\) in order to earn a positive reward, while not revealing too much information through their actions to Team B. The task of Team B is to guess Team A’s state.
It can be verified (see Appendix A.2.1 for a detailed derivation) that if we restrict both teams to use independently randomized strategies (including deterministic strategies), then no equilibrium exists. However, there does exist an equilibrium in which Team A randomizes in a correlated manner; specifically, consider the following strategy profile \(\sigma ^*\): At \(t=1\), Team A plays \(\gamma ^A=(\gamma ^{A, 1}, \gamma ^{A, 2})\) with probability 1/2, and \({\tilde{\gamma }}^A=({\tilde{\gamma }}^{A, 1}, {\tilde{\gamma }}^{A, 2})\) with probability 1/2, where
and at \(t=2\), the two members of Team B choose independent and uniformly distributed actions on \(\{-1, 1\}\), independent of their action and observation history. In \(\sigma ^*\), each agent (A, j) chooses a uniformly random action irrespective of its state. It is important that (A, 1) and (A, 2) choose these actions in a correlated way to ensure that they obtain the full instantaneous reward while not revealing any information.
1.2 An Illustrative Example for Sect. 3
The following example illustrates how to visualize games among teams from the coordinators’ viewpoint.
Example 3
Consider a variant of the Guessing Game in Example 2 with the same system model and information structure but different action sets and reward functions. In the new game, Team A moves at both \(t=1\) and \(t=2\), with \({\mathcal {U}}_t^{A, j} = \{-1, 1\}\) for \(t=1,2\) and \(j=1,2\). Team B moves only at time \(t=2\) as in the original game. The new reward functions are given by
In this example, Team A’s task is to guess its own state after a round of publicly observable communication while not leaking information to Team B.
A Team Nash equilibrium \((\sigma ^{*A}, \sigma ^{*B})\) of this game is as follows: Team A chooses one of the four pure strategy profiles listed below with equal probability:
while Team B chooses \({\mathbf {U}}_2^{B}\) uniformly at random, independently of \({\mathbf {U}}_1\). In words, from Team B’s point of view, Team A chooses \({\mathbf {U}}_1^{A}\) to be a uniform random vector independent of \({\mathbf {X}}_1^A\). However, the randomization is done in a coordinated manner: Before the game starts, each member of Team A independently draws one of two cards, where one card says “lie” and the other says “tell the truth.” Both players then tell each other what card they have drawn before the game starts. At time \(t=1\), both players in Team A play the strategy indicated by their cards. At time \(t=2\), Team A can then perfectly recover \({\mathbf {X}}_1^A\) from \({\mathbf {U}}_1^A\) and the knowledge of the strategy used at \(t=1\).
Now, we describe Team A’s equilibrium strategy via coordinator A’s equivalent behavioral strategy. Use \(\mathbf {ng}\) to denote the prescription that maps \(-1\) to 1 and 1 to \(-1\). Use \(\mathbf {id}\) to denote the identity map prescription, i.e., the prescription that maps \(-1\) to \(-1\) and 1 to 1. Use \(\mathbf {cp}_{b}\) to denote the constant prescription that always instructs individuals to play \(b\in \{-1, 1\}\). The mixed strategy profile \(\sigma ^{*A}\) is equivalent to the following behavioral coordination strategy: At time \(t=1\), \(g_1^A(\varnothing )\in \varDelta ({\mathcal {A}}_1^{A, 1} \times {\mathcal {A}}_1^{A, 2} )\) satisfies
At time \(t=2\), \(g_2^A: {\mathcal {U}}_1^{A, 1}\times {\mathcal {U}}_1^{A, 2}\times {\mathcal {A}}_1^{A, 1}\times {\mathcal {A}}_1^{A, 2} \mapsto \varDelta ({\mathcal {A}}_2^{A, 1} \times {\mathcal {A}}_2^{A, 2})\) is a deterministic strategy that satisfies
where \(\textsc {dm}: {\mathcal {A}}_2^{A, 1}\times {\mathcal {A}}_2^{A, 2} \mapsto \varDelta ({\mathcal {A}}_2^{A, 1}\times {\mathcal {A}}_2^{A, 2})\) represents the delta measure. In words, the coordinator of Team A chooses one of the four possible prescription profiles uniformly at random at time \(t=1\). At time \(t=2\), based on the observed actions and the prescriptions chosen before, the coordinator of Team A directly assigns actions to the agents, instructing them to recover the state from the actions at \(t=1\). Note that the behavioral coordination strategy at \(t=2\) depends explicitly on the past prescription \(\varvec{\varGamma }_1^A\) in addition to the realization of past actions. This is because the coordinator needs to remember not only the agents’ actions, but also the rationale behind those actions in order to interpret the signals sent through the actions.
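A minimal simulation sketch of this coordination is given below. Following the description above, it assumes that the four equally likely prescription profiles at \(t=1\) are the four combinations of \(\mathbf {id}\) and \(\mathbf {ng}\) for the two members of Team A; the sample size and random seed are illustrative choices, not part of the example.

```python
import itertools
import random

random.seed(0)

ID = {-1: -1, 1: 1}     # "tell the truth": report the state
NG = {-1: 1, 1: -1}     # "lie": report the negated state
PROFILES = list(itertools.product([ID, NG], repeat=2))   # assumed four prescription profiles

N = 200_000
counts = {}             # empirical joint counts of (X_1^A, U_1^A)
recovered = 0
for _ in range(N):
    x = (random.choice([-1, 1]), random.choice([-1, 1]))    # X_1^{A,1}, X_1^{A,2}
    gamma = random.choice(PROFILES)                         # coordinator's correlated draw
    u = (gamma[0][x[0]], gamma[1][x[1]])                    # U_1^{A,j} = gamma^{A,j}(X_1^{A,j})
    counts[(x, u)] = counts.get((x, u), 0) + 1
    # At t = 2 the coordinator remembers gamma, so the prescriptions can be inverted.
    x_hat = tuple(-u[j] if gamma[j] is NG else u[j] for j in range(2))
    recovered += (x_hat == x)

# All 16 (state, action) pairs occur with frequency close to 1/16, i.e., U_1^A is
# (empirically) uniform on {-1,1}^2 and independent of X_1^A: Team B learns nothing.
print(max(abs(c / N - 1 / 16) for c in counts.values()))
# Team A always recovers its state at t = 2 from U_1^A and the prescriptions used at t = 1.
print(recovered == N)
```

The two printed checks correspond to the two properties stated in words above: the actions at \(t=1\) carry no information about \({\mathbf {X}}_1^A\), yet Team A can reconstruct \({\mathbf {X}}_1^A\) at \(t=2\).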
1.2.1 Proof of Claim in Example 2
Define two pure strategies \(\mu ^A\) and \({\tilde{\mu }}^A\) of Team A as follows:
Now, assume that Team A and Team B are restricted to use independently randomized strategies (type 2 strategies defined in Sect. 2.2). We will show in two steps that no equilibrium exists within this class of strategies.
Step 1: If Team A and Team B’s type 2 strategies form an equilibrium, then Team A is playing either \(\mu ^A\) or \({\tilde{\mu }}^A\).
Let \(p_j(x)\) denote the probability that player (A, j) plays \(U_1^{A, j} = -x\) given \(X_1^{A, j} = x\). Define
i.e., the ex-ante probability that player (A, j) “lies.”
Then, we have
In an equilibrium, Team B will optimally respond to Team A’s strategy, described through \((p_1, p_2)\). We can obtain a lower bound on Team B’s reward by fixing one particular strategy: Consider the “random guess” strategy of Team B, where each agent (B, j) (for \(j=1,2\)) chooses \(U_2^{B, j}\) uniformly at random, irrespective of \({\mathbf {U}}_1^A\) and independently of the other team member. Team B can thus guarantee an expected reward of \(\frac{1}{2} + \frac{1}{2} = 1\) given any strategy of Team A. Since \(r_2^A({\mathbf {X}}_2, {\mathbf {U}}_2) = -r_2^B({\mathbf {X}}_2, {\mathbf {U}}_2)\), we conclude that Team A’s total reward in an equilibrium is upper bounded by
Let \(\sigma ^B\) denote the strategy of Team B. Let \(\pi _j(u^1, u^2)\) denote the probability that player (B, j) plays \(U_2^{B, j} = -u^j\) given \(U_1^{A, 1} = u^{1}, U_1^{A, 2} = u^{2}\) (i.e., the probability that player (B, j) believes that (A, j) was “lying” hence guesses the opposite of what was signaled). If Team A plays \(\mu ^A\), then the total reward of Team A is
If Team A plays \({\tilde{\mu }}^A\), then the total reward of Team A is
Observe that \(J^A(\mu ^A, \sigma ^B) + J^A({\tilde{\mu }}^A, \sigma ^B) = 0\). Hence, for any \(\sigma ^B\), either \(J^A(\mu ^A, \sigma ^B)\ge 0\) or \(J^A({\tilde{\mu }}^A, \sigma ^B)\ge 0\). In particular, we can conclude that Team A’s total reward is at least 0 in any equilibrium.
We have established both an upper bound and lower bound for Team A’s total reward in an equilibrium. Hence, we must have
which implies \(q_1=0, q_2=1\) or \(q_1=1, q_2=0\). The former case corresponds to Team A playing the pure strategy \(\mu ^A\), and the latter to playing \({\tilde{\mu }}^A\).
Step 2: There is no equilibrium in which Team A plays \(\mu ^A\) or \({\tilde{\mu }}^A\).
Suppose that Team A plays \(\mu ^A\). Then, the only best response of Team B is to play \(U_2^{B, 1} = U_1^{A, 1}, U_2^{B, 2} = -U_1^{A, 2}\), and Team A’s total reward is \(J^A(\mu ^A, \sigma ^B) = 1 - 1 - 1 = -1\). If Team A deviates to \({\tilde{\mu }}^A\), then Team A can obtain a total reward of \(+1\) (recall that \(J^A(\mu ^A, \sigma ^B) + J^A({\tilde{\mu }}^A, \sigma ^B) = 0\) for any \(\sigma ^B\)). Hence, Team A does not play \(\mu ^A\) at equilibrium.
Similar arguments apply to \({\tilde{\mu }}^A\), which completes the proof.
Proof of Lemma 1
Given a pure strategy profile \(\mu ^i\) of team i, define a pure coordination strategy profile \(\nu ^i\) by
We first prove that for every pure strategy profile \(\mu ^i\), there exists a payoff-equivalent coordination strategy profile \(\nu ^i\), by coupling two systems. In one of the systems, we assume that team i uses the pure strategy \(\mu ^i\). In the other system, we assume that team/coordinator i uses the corresponding pure coordination strategy \(\nu ^i\). We assume that all teams other than i use the same pure strategy profile \(\mu ^{-i} = (\mu ^k)_{k\in {\mathcal {I}}\backslash \{i\} }\) in both systems. The realizations of the primitive random variables (i.e., \((X_1^i)_{i\in {\mathcal {I}}}, (W_t^{i, X}, W_t^{i, Y})_{i\in {\mathcal {I}}, t\in {\mathcal {T}}}\)) are assumed to be the same in both systems. We proceed to show that the realizations of all system variables (i.e., \(({\mathbf {X}}_t, {\mathbf {Y}}_t, {\mathbf {U}}_t)_{t\in {\mathcal {T}}}\)) will then be the same in both systems. As a result, the expected payoffs are the same for both systems.
We prove that the realizations of \(({\mathbf {X}}_t, {\mathbf {Y}}_t, {\mathbf {U}}_t)_{t\in {\mathcal {T}}}\) are the same by induction on time t.
Induction Base: At \(t=1\), the realizations of \({\mathbf {X}}_1\) are the same for two systems by assumption. For the first system, we have
and for the second system we have
which means that \(U_1^{i, j} = \mu _1^i(X_1^{i, j})\) also holds in the second system for all \((i, j)\in {\mathcal {N}}_i\).
It is clear that \({\mathbf {U}}_1^{-i}\) are the same for both systems since in both systems,
We conclude that \({\mathbf {U}}_1\) is the same for both systems. Since \((W_1^{k, Y})_{k\in {\mathcal {I}}}\) are the same for both systems, \(Y_1^k=\ell _1^k(X_1^k, {\mathbf {U}}_1, W_1^{k, Y}), k\in {\mathcal {I}},\) are the same for both systems.
Induction Step: Suppose that \({\mathbf {X}}_s, {\mathbf {Y}}_s, {\mathbf {U}}_s\) are the same for both systems for all \(s<t\). Now, we prove it for t.
First, since the realizations of \({\mathbf {X}}_{t-1}^i, {\mathbf {U}}_{t-1}, W_{t-1}^{i, X}\) are the same for both systems and
\({\mathbf {X}}_t\) are the same for both systems.
Consider the actions taken by the members of team i at time t. For the first system,
In the second system,
which means that
The actions taken by the members of other teams at time t are
for both systems.
We conclude that \({\mathbf {U}}_t\) has the same realization for both systems since \((H_t^k, X_{t-d+1:t}^{k, j})_{k\in {\mathcal {I}}}\) have the same realization by the induction hypothesis and the argument above. Since \((W_t^{i, Y})_{i\in {\mathcal {I}}}\) are the same for both systems, \(Y_t^k=\ell _t^k(X_t^k, {\mathbf {U}}_t, W_t^{k, Y}), k\in {\mathcal {I}},\) are the same for both systems.
This establishes the induction step, proving that \(\mu ^i\) and \(\nu ^i\) generate the same realization of \(({\mathbf {X}}_t, {\mathbf {Y}}_t, {\mathbf {U}}_t)_{t\in {\mathcal {T}}}\) under the same realization of the primitive random variables. Therefore, \(\nu ^i\) is a pure coordination strategy profile that is payoff-equivalent to \(\mu ^i\).
To complete the other half of the proof, for each given coordination strategy \(\nu ^i\) of team/coordinator i we define a pure team strategy \(\mu ^i=(\mu _t^{i, j})_{(i, j)\in {\mathcal {N}}_i, t\in {\mathcal {T}}}\) through
where \(\gamma _{t}^i=(\gamma _{t}^{i, j})_{(i, j) \in {\mathcal {N}}_i}\) is recursively defined by \(\nu _{1:t}^i\) and \(h_t^i\) through
Then, using an argument similar to the one in the first half of the proof, we can show that \(\mu ^i\) is payoff-equivalent to \(\nu ^i\).
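The coupling argument can be illustrated with a small simulation: run the same toy team system twice, once under a pure team strategy and once under the coordination strategy derived from it, feeding both runs the same realizations of the primitive random variables, and check that the trajectories coincide. The two-member team, binary dynamics, horizon, and the particular pure strategy below are illustrative assumptions, not the general model of the paper.

```python
import random

random.seed(1)
T = 6                                     # horizon (toy assumption)
# Primitive random variables, drawn once and fed to both runs (this is the coupling).
X1 = (random.randint(0, 1), random.randint(0, 1))                      # initial member states
WX = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(T)]  # dynamics noise

def mu(j, x_j, h):
    """A toy pure team strategy for member j: the action depends on the member's
    current private state x_j and on the common history h of past joint actions."""
    return (x_j + sum(sum(a) for a in h) + j) % 2

def step(x, u, w):
    """Toy coupled dynamics: each member's next state is state + action + noise (mod 2)."""
    return tuple((x[j] + u[j] + w[j]) % 2 for j in range(2))

def run(use_coordinator):
    x, h, traj = X1, (), []
    for t in range(T):
        if use_coordinator:
            # The coordinator sees only h; it issues the prescriptions
            # nu_t(h) = (mu(j, . , h))_j, which each member evaluates on its private state.
            gammas = [lambda x_j, j=j, h=h: mu(j, x_j, h) for j in range(2)]
            u = tuple(gammas[j](x[j]) for j in range(2))
        else:
            # Each member applies the pure team strategy directly to (x_j, h).
            u = tuple(mu(j, x[j], h) for j in range(2))
        traj.append((x, u))
        h = h + (u,)
        x = step(x, u, WX[t])
    return traj

# Identical primitive randomness => identical trajectories => identical payoffs.
print(run(False) == run(True))   # True
```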
Proof of Lemma 3
Induction on time t.
Induction Base: At \(t=1\), the \({\mathbf {X}}_{1}^k\) are independent across k because of the assumption on the primitive random variables. Furthermore, since \(H_1^k\) is a deterministic random vector (see Remark 2) and the randomizations of different coordinators are independent, we conclude that \(({\mathbf {X}}_{1}^k, \varvec{\varGamma }_{1}^k)\) are mutually independent for different k. The distribution of \(({\mathbf {X}}_{1}^k, \varvec{\varGamma }_{1}^k)\) depends on g only through \(g^k\).
Induction Step: Suppose that \(({\mathbf {X}}_{1:t}^k, \varvec{\varGamma }_{1:t}^k)\) are conditionally independent given \(H_t^0\) and \({\mathbb {P}}^g({\mathbf {X}}_{1:t}^k, \varvec{\varGamma }_{1:t}^k|H_t^0)\) depends on g only through \(g^k\). Now, we have
We then claim that
where for each \(k\in {\mathcal {I}}\), \(F_t^k\) is a function that depends only on \(g^k\).
To establish the claim, we note that
where in the third step we have used the induction hypothesis.
Given the claim, we have
and then
where \(G_t^k\) is given by
One can check that \(G_t^k\) depends on g only through \(g^k\) and
therefore
Hence, we establish the induction step.
Proof of Lemma 4
Assume that \({\overline{h}}_t^i\in \overline{{\mathcal {H}}}_t^i\) is admissible under g. From Lemma 3, we know that \({\mathbb {P}}^{g} (x_{1:t}^i, \gamma _{1:t}^i|h_t^0)\) does not depend on \(g^{-i}\). As a conditional distribution obtained from \({\mathbb {P}}^{g} (x_{1:t}^i, \gamma _{1:t}^i|h_t^0)\), \({\mathbb {P}}^{g}(x_{t-d+1:t}^i|{\overline{h}}_t^i)\) does not depend on \(g^{-i}\) either.
Therefore, we can compute the belief of coordinator i by replacing \(g^{-i}\) with \({\hat{g}}^{-i}\), which is an open-loop strategy profile that always generates the actions \(u_{1:t-1}^{-i}\).
Note that we always have \({\mathbb {P}}^{g^i, {\hat{g}}^{-i}}({\overline{h}}_t^i) > 0\) for all \({\overline{h}}_t^i\) admissible under g.
Furthermore, we can also introduce into the conditioning additional random variables that are conditionally independent according to Lemma 3, i.e.,
where \(x_{t-d:t}^{-i}\in {\mathcal {X}}_{t-d:t}^{-i}\) is such that \({\mathbb {P}}^{g^i, {\hat{g}}^{-i}}(x_{t-d:t}^{-i}|{\overline{h}}_t^i) > 0\).
Let \(\tau = t-d+1\). By Bayes’ rule
where
We have
The first three terms in the above product are
respectively.
The last term satisfies
Substituting (11) - (12) into (10), we obtain
where
Therefore, we have proved that
where \(P_t^i\) is independent of g.
Proof of Lemma 5
For notational convenience, define
Claim 1
\({\mathbb {P}}^g(x_{t-d+1:t}|\gamma _t, s_t^i, {\overline{h}}_{t}^{-i})\) does not depend on g (see footnote 9).
Claim 2
\({\mathbb {P}}^g(\gamma _t, s_t^i, {\overline{h}}_{t}^{-i}) = {\mathbb {P}}^{\rho ^i, g^{-i}}(\gamma _t, s_t^i, {\overline{h}}_{t}^{-i})\) for all \(\gamma _t\in {\mathcal {A}}_t, s_t^i\in {\mathcal {S}}_t^i, {\overline{h}}_{t}^{-i}\in {\mathcal {H}}_t^{-i}\).
Given Claims 1 and 2, we conclude that
for all \(x_{t-d+1:t}\in {\mathcal {X}}_{t-d+1:t}, \gamma _t\in {\mathcal {A}}_t, s_t^i\in {\mathcal {S}}_t^i, {\overline{h}}_{t}^{-i}\in {\mathcal {H}}_t^{-i}\).
Marginalizing (13), we obtain
Since \(U_t^{k, j} = \varGamma _t^{k, j}(X_{t-d+1:t}^{k, j})\) for all \((k, j)\in {\mathcal {N}}\), we can write \(r_t^k({\mathbf {X}}_t, {\mathbf {U}}_t) = {\tilde{r}}_t^k({\mathbf {X}}_{t-d+1:t}, \varvec{\varGamma }_t)\) for some function \({\tilde{r}}_t^k\) that does not depend on the strategy profile. Then, using linearity of expectation and (14) we obtain
for all behavioral coordination strategy profiles \(g^{-i}\). Hence, \(g^i\) and \(\rho ^i\) are payoff-equivalent.
Proof of Claim 1
For notational convenience, define
Consider \({\mathbb {P}}^g(x_{t-d+1:t}|\gamma _t, {\overline{h}}_t)\) first. Since \(\varvec{\varGamma }_t\) is a randomized prescription generated based on \({\overline{H}}_t\) which enters the system after \({\mathbf {X}}_{t-d+1:t}\) are realized, we have
Due to Lemma 3, we have
By Lemma 4,
where \(P_t^k\) is a function that does not depend on g.
Combining (15), (16), (17), we have
Since \((S_t^i, {\overline{H}}_t^{-i})\) is a function of \({\overline{H}}_t\), by the smoothing property of conditional probability we conclude that
where the right-hand side does not depend on g. \(\square \)
Proof of Claim 2
Proof by induction on t.
Induction Base: The claim is true at \(t=1\) since \(\rho _1^i\) and \(g_1^i\) are the same strategies.
Induction Step: Suppose that the claim is true at time \(t-1\). We now prove the result for t.
First,
Using Lemma 3, we have
Therefore,
for all \((h_{t}^0, s_{t}^i)\) admissible under g. Notice that \({\mathbb {P}}^g(h_{t}^0, s_{t}^i) = 0\) implies that \({\mathbb {P}}^g(s_{t}^i, {\overline{h}}_{t}^{-i}) = 0\), hence we conclude that
for all \(\gamma _{t}\in {\mathcal {A}}_t, s_{t}^i\in {\mathcal {S}}_t^i, {\overline{h}}_t^{-i}\in {\mathcal {H}}_t^{-i}\).
Similarly,
for all \(\gamma _{t}\in {\mathcal {A}}_t, s_{t}^i\in {\mathcal {S}}_t^i, {\overline{h}}_t^{-i}\in {\mathcal {H}}_t^{-i}\). Hence, it suffices to show that \({\mathbb {P}}^g(s_{t}^i, {\overline{h}}_{t}^{-i}) = {\mathbb {P}}^{\rho ^i, g^{-i}}(s_{t}^i, {\overline{h}}_{t}^{-i})\).
Given the induction hypothesis, it suffices to show that
for all \((\gamma _{t-1}, s_{t-1}^i, {\overline{h}}_{t-1}^{-i})\) admissible under g (or admissible under \((\rho ^i, g^{-i})\), which is an equivalent condition because of the induction hypothesis).
Given that
it follows that \((S_t^i, {\overline{H}}_t^{-i})\) is a strategy-independent function of \((\varvec{\varGamma }_{t-1}, S_{t-1}^i, {\overline{H}}_{t-1}^{-i}, {\mathbf {X}}_{t-d:t-1},{\mathbf {W}}_{t-1}^Y)\). Since \({\mathbf {W}}_{t-1}^Y=(W_{t-1}^{k, Y})_{k\in {\mathcal {I}}}\) is a primitive random vector independent of \((\varvec{\varGamma }_{t-1}, S_{t-1}^i, {\overline{H}}_{t-1}^{-i})\), it suffices to show that
We know that (18) is true due to Claim 1. Hence, we have established the induction step. \(\square \)
Remark 13
In general, a behavioral coordination strategy profile and its associated SPIB strategy profile yield different distributions on the trajectory of the system. It is the equivalence of marginal distributions that allows us to establish the equivalence of payoffs using linearity of expectation. This (payoff) equivalence between behavioral coordination strategies and their associated SPIB strategy profiles is different from the equivalence of behavioral coordination strategies with mixed team strategies, where not only are the payoffs equivalent, but the distributions on the trajectory of the system are also the same.
Proof of Lemma 6
We will prove a stronger result.
Lemma 9
Let \((\lambda ^{*k}, \psi ^*)\) be a CIB strategy such that \(\psi ^{*, k}\) is consistent with \(\lambda ^{*k}\). Let \(g^{*k}\) be the behavioral strategy profile generated from \((\lambda ^{*k}, \psi ^*)\). Let \(\pi _t^k\) represent the belief on \(S_t^k\) generated by \(\psi ^*\) at time t based on \(h_t^0\). Let \(t< \tau \). Consider a fixed \(h_{\tau }^0\in {\mathcal {H}}_{\tau }^0\) and some \({\tilde{g}}_{1:t-1}^k\) (not necessarily equal to \(g_{1:t-1}^{*k}\)). Assume that \(h_{\tau }^0\) is admissible under \(({\tilde{g}}_{1:t-1}^{k}, g_{t:\tau -1}^{*k})\). Suppose that
Then,
The assertion of Lemma 6 follows from Lemma 9 and the fact that (19) is true for \(t=1\).
Proof of Lemma 9
We only need to prove the result for \(\tau = t + 1\).
Since \(h_{t+1}^0\) is admissible under \(({\tilde{g}}_{1:t-1}^k, g_{t}^{*k})\), we have
where \({\hat{g}}_{1:t}^{-k}\) is the open-loop strategy where all coordinators except k choose prescriptions that generate the actions \(u_{1:t}^{-k}\).
From Lemma 3, we know that \({\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}, g^{-k}}(s_{t+1}^k|h_{t+1}^0)\) is independent of \(g^{-k}\). Therefore,
and the denominator of (21) is nonzero due to (20).
We have
where \(b_t=(\varvec{\pi }_t, y_{t-d+1:t-1}, u_{t-d:t-1})\) and \(\varvec{\pi }_t = (\pi _t^l)_{l\in {\mathcal {I}}}\) is generated from \(\psi ^*\).
Recall that we assume
Using (21), (22), and (23), we obtain
where
Therefore by the definition of consistency of \(\psi ^{*, k}\) with respect to \(\lambda ^{*k}\), we conclude that
Now, consider \({\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}}({\tilde{x}}_{t-d+2:t+1}^k, s_{t+1}^k | h_{t+1}^0)\).
-
If \({\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}}(s_{t+1}^k | h_{t+1}^0) = 0\), then we have \(\pi _{t+1}^k(s_{t+1}^k) = 0\) and
$$\begin{aligned} {\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}}({\tilde{x}}_{t-d+2:t+1}^k, s_{t+1}^k | h_{t+1}^0) = 0. \end{aligned}$$
-
If \({\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}}(s_{t+1}^k | h_{t+1}^0) > 0\), then
$$\begin{aligned}&{\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}}({\tilde{x}}_{t-d+2:t+1}^k, s_{t+1}^k | h_{t+1}^0) \\ =&\,{\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}}({\tilde{x}}_{t-d+2:t+1}^k | h_{t+1}^0, s_{t+1}^k) \pi _{t+1}^k(s_{t+1}^k). \end{aligned}$$
We have shown in Lemma 4 that
$$\begin{aligned} {\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}}({\tilde{x}}_{t-d+2:t+1}^k | {\overline{h}}_{t+1}^k) =P_{t+1}^k({\tilde{x}}_{t-d+2:t+1}^k| y_{t-d+2:t}^k, u_{t-d+1:t}, s_{t+1}^k) \end{aligned}$$
and \((h_{t+1}^0, s_{t+1}^k)\) is a function of \({\overline{h}}_{t+1}^k\). By the law of iterated expectation, we have
$$\begin{aligned} {\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}, {\hat{g}}_{1:t}^{-k}}({\tilde{x}}_{t-d+2:t+1}^k | h_{t+1}^0, s_{t+1}^k) =P_{t+1}^k({\tilde{x}}_{t-d+2:t+1}^k| y_{t-d+2:t}^k, u_{t-d+1:t}, s_{t+1}^k). \end{aligned}$$
We conclude that
for all \(s_{t+1}^k\in {\mathcal {S}}_{t+1}^k\) and all \(x_{t-d+2:t+1}^k\in {\mathcal {X}}_{t-d+2:t+1}^k\). \(\square \)
Proof of Lemma 7
Let \(g^{-i}\) denote the behavioral strategy profile of all coordinators other than i generated from the CIB strategy profile \((\lambda ^k, \psi ^k)_{k\in {\mathcal {I}}\backslash \{ i\} }\). Let \(({\overline{h}}_t^i, \gamma _{t}^i)\) be admissible under \(g^{-i}\).
Let \({\tilde{g}}^i\) denote coordinator i’s behavioral coordination strategy. Because of Lemma 3, we have
We know that \(\varvec{\varGamma }_{t}^i\) and \({\mathbf {X}}_{t-d+1:t}^i\) are conditionally independent given \({\overline{H}}_t^i\) since \(\varvec{\varGamma }_{t}^i\) is chosen as a randomized function of \({\overline{H}}_t^i\) at a time when \({\mathbf {X}}_{t-d+1:t}^i\) are already realized. Therefore,
where \(s_t^i=(x_{t-d}^i, \phi _t^i)\) and \(P_t^i\) is the belief function defined in Eq. (1).
We conclude that
Since all coordinators other than coordinator i are using the same belief generation systems, we have \(B_t^j=B_t^k\) for \(j, k\ne i\). Denote \(B_t=B_t^k\) for all \(k\in {\mathcal {I}}\backslash \{i \}\). Let \(b_t=\left( \left( \pi _t^{*, l}\right) _{l\in {\mathcal {I}}}, y_{t-d+1:t-1}, u_{t-d:t-1}\right) \) be a realization of \(B_t\). Also define \(\psi ^*=\psi ^k\) for all \(k\ne i\).
Consider \(k\ne i\). Coordinator k’s strategy \(g^{k}\) is a self-consistent CIB strategy. We also have \(h_t^0\) admissible under \(g^{k}\) since \(({\overline{h}}_t^i, \gamma _{t}^i)\) is admissible under \(g^{-i}\). Hence, applying Lemma 6 we have
Hence, the second term of the right hand side of (24) satisfies
where \(P_t^k\) is the belief function defined in Eq. (1).
Recall that \(b_t=\left( \left( \pi _t^{*, l}\right) _{l\in {\mathcal {I}}}, y_{t-d+1:t-1}, u_{t-d:t-1}\right) \). From (24) and (25), we conclude that
for some function \(F_t^i\) for all \(({\overline{h}}_t^i, \gamma _t^i)\) admissible under \(g^{-i}\).
Consider the total reward of coordinator i. By the law of iterated expectation, we can write
For \(({\overline{h}}_t^i, \gamma _{t}^i)\) admissible under \(g^{-i}\),
for some function \({\overline{r}}_t^{i}\) that depends on \(g^{-i}\) (specifically, on \(\lambda _t^{-i}\)) but not on \({\tilde{g}}^i\).
We claim that \((B_t, S_{t}^i)\) is a controlled Markov process controlled by coordinator i’s prescriptions, given that other coordinators are using the strategy profile \(g^{-i}\). Let \({\tilde{g}}^i\) denote an arbitrary strategy for coordinator i (not necessarily a CIB strategy). We need to prove that
for some function \(\varXi _t^i\) independent of \({\tilde{g}}^i\).
We know that
Hence, \((B_{t+1}, S_{t+1}^i)\) is a fixed function of \((B_t, S_t^i, {\mathbf {X}}_{t-d+1:t}, \varvec{\varGamma }_t, {\mathbf {W}}_t^Y)\), where \({\mathbf {W}}_t^Y\) is a primitive random vector independent of \((B_{1:t}, S_{1:t}^i, \varvec{\varGamma }_{1:t}^i, {\mathbf {X}}_{t-d+1:t})\). Therefore, it suffices to prove that
for some function \(\varXi _t^i\) independent of \({\tilde{g}}^i\).
\((B_{1:t}, S_{1:t}^i, \varvec{\varGamma }_{1:t}^i)\) is a function of \(({\overline{H}}_t^i, \varvec{\varGamma }_{t}^i)\). Therefore, by applying smoothing property of conditional expectations to both sides of (26) we obtain
where we know that \(F_t^i\), as defined in (26), is independent of \({\tilde{g}}^i\).
We conclude that coordinator i faces a Markov Decision Problem where the state process is \((B_t, S_{t}^i)\), the control action is \(\varvec{\varGamma }_t^i\), and the total reward is
By standard MDP theory, coordinator i can form a best response by choosing \(\varvec{\varGamma }_t^i\) as a function of \((B_t, S_{t}^i)\).
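The last step invokes standard finite-horizon MDP theory. The minimal backward-induction sketch below spells out the dynamic program it refers to: the abstract states stand for realizations of \((B_t, S_{t}^i)\), the actions for coordinator i's prescriptions \(\varvec{\varGamma }_t^i\), and the transition kernel and rewards are random placeholders rather than the kernel \(\varXi _t^i\) and reward \({\overline{r}}_t^{i}\) derived above.

```python
import random

random.seed(0)
T = 3
STATES = ["s0", "s1"]          # stands in for realizations of (B_t, S_t^i)
ACTIONS = ["g0", "g1"]         # stands in for coordinator i's prescriptions

def random_kernel():
    """Placeholder transition probabilities over next states (Xi_t^i in the proof)."""
    p = random.random()
    return {"s0": p, "s1": 1.0 - p}

# Placeholder time-dependent model: P[t][s][a] is a distribution over next states,
# r[t][s][a] is the expected instantaneous reward (r-bar_t^i in the proof).
P = [{s: {a: random_kernel() for a in ACTIONS} for s in STATES} for _ in range(T)]
r = [{s: {a: random.random() for a in ACTIONS} for s in STATES} for _ in range(T)]

# Backward induction: V_{T+1} = 0 and
#   V_t(s) = max_a [ r_t(s, a) + sum_{s'} P_t(s' | s, a) V_{t+1}(s') ].
V = {s: 0.0 for s in STATES}
policy = [None] * T
for t in reversed(range(T)):
    Q = {s: {a: r[t][s][a] + sum(P[t][s][a][s2] * V[s2] for s2 in STATES)
             for a in ACTIONS} for s in STATES}
    policy[t] = {s: max(ACTIONS, key=lambda a, s=s: Q[s][a]) for s in STATES}
    V = {s: Q[s][policy[t][s]] for s in STATES}

print(policy)  # a best response that depends only on the current (B_t, S_t^i)
print(V)       # the corresponding optimal values at the initial time
```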
Proof of Theorem 2
Let \((\lambda ^*, \psi ^*)\) be a pair that solves the dynamic program defined in the statement of the theorem. Let \(g^{*k}\) denote the behavioral coordination strategy corresponding to \((\lambda ^{*k}, \psi ^*)\) for \(k\in {\mathcal {I}}\). We only need to show the following: if the coordinators other than coordinator i play \(g^{*-i}\), then \(g^{*i}\) is a best response to \(g^{*-i}\).
Let \(h_t^0\in {\mathcal {H}}_t^0\) be admissible under \(g^{*-i}\). Then,
for all \(k\ne i\) by Lemma 6, where \(\pi _t^{k}\) is the belief generated by \(\psi ^*\) when \(h_t^0\) occurs.
By Lemma 4, we also have
Combining (27) and (28), the belief for coordinator i defined in the stage game according to Definition 15 satisfies
for all \((h_t^0, s_{t}^i)\) admissible under \(g^{*-i}\), i.e., the belief represents a true conditional distribution. Since \(\beta _t^i(\cdot |s_{t}^i)\) is a fixed function of \((b_t, s_{t}^i)\), by applying the smoothing property to both sides of the above equation we obtain
for all \((b_t, s_{t}^i)\) admissible under \(g^{*-i}\) (see footnote 10).
Then, the interim expected utility considered in the definition of IBNE correspondences (Definition 16) can be written as
for all \((b_t, s_{t}^i)\) admissible under \(g^{*-i}\).
The condition of Theorem 2 then implies
for all \((b_t, s_{t}^i)\) admissible under \(g^{*-i}\).
Recall that in the proof of Lemma 7, we have already proved that fixing \((\lambda ^{*-i}, \psi ^*)\), \((B_t, S_{t}^i)\) is a controlled Markov process controlled by \(\varvec{\varGamma }_{t}^i\). Hence, (29) and (30) show that \(\lambda _t^{*i}\) is a dynamic programming solution of the MDP with instantaneous reward
Therefore, \(\lambda ^{*i}\) maximizes
over all \(\lambda ^i=(\lambda _t^i)_{t\in {\mathcal {T}}}, \lambda _t^i: {\mathcal {B}}_t\times {\mathcal {S}}_t^i \mapsto \varDelta ({\mathcal {A}}_t^i)\).
Notice that for any \(\lambda ^i\), if \(g^i\) is the behavioral coordination strategy corresponding to the CIB strategy \((\lambda ^i, \psi _t^*)\), then by the law of iterated expectation
Hence, we know that \(g^{*i}\) maximizes
over all \(g^i\) generated from a CIB strategy with the belief generation system \(\psi ^*\).
By the closedness property of CIB strategies (Lemma 7), we conclude that \(g^{*i}\) is a best response to \(g^{*-i}\) over all behavioral coordination strategies of coordinator i, proving the result.
Proof of Proposition 1
We will characterize all the Bayes–Nash equilibria of Example 1 in terms of individual players’ behavioral strategies. Then, we will show that none of the BNE correspond to a CIB-CNE.
Let \(p=(p_1, p_2)\in [0, 1]^2\) describe Alice’s behavioral strategy: \(p_1\) is the probability that Alice plays \(U_1^A=-1\) given \(X_1^A=-1\); \(p_2\) is the probability that Alice plays \(U_1^A=+1\) given \(X_1^A=+1\). Let \(q=(q_1, q_2)\in [0, 1]^2\) denote Bob’s behavioral strategy: \(q_1\) is the probability that Bob plays \(U_3^B=\mathrm {L}\) when observing \(U_1^A=-1\), \(q_2\) is the probability that Bob plays \(U_3^B=\mathrm {L}\) when observing \(U_1^A=+1\).
Claim
is the unique BNE of Example 1.
Given the claim, one can conclude that a CIB-CNE does not exist in this game: Suppose that \((\lambda ^*, \psi ^*)\) forms a CIB-CNE. Then, by the definition of CIB strategies, at \(t=1\) the team of Alice chooses a prescription (which maps \({\mathcal {X}}_1^A\) to \({\mathcal {U}}_1^A\)) based on no information. At \(t=3\), the team of Bob chooses a prescription (which is equivalent to an action since Bob has no state) based solely on \(B_3\). Define the induced behavioral strategies of Alice and Bob through
where \(b_3[u]\) is the CCI under belief generation system \(\psi ^*\) when \(U_1^A=u\). \(\mathbf {id}\) is the prescription that chooses \(U_1^A=X_1^A\); \(\mathbf {cp}_{u}\) is the prescription that chooses \(U_1^A=u\) irrespective of \(X_1^A\); \({\mathbf {L}}\) is Bob’s prescription that chooses \(U_3^B=\mathrm {L}\).
The consistency of \(\psi _1^*\) with respect to \(\lambda _1^*\) implies that
The consistency of \(\psi _2^*\) with respect to \(\lambda _2^*\) implies that
If a CIB-CNE induces behavioral strategy \(p^*=\left( \frac{1}{3}, \frac{1}{3}\right) \), then the CIB belief \(\varPi _3\in \varDelta ({\mathcal {X}}_2)\) will be the same for both \(U_1=+1\) and \(U_1=-1\) under any consistent belief generation system \(\psi ^*\). Then, \(B_3=(\varPi _3, {\mathbf {U}}_2)\) will be the same for both \(U_1=+1\) and \(U_1=-1\) since \({\mathbf {U}}_2\) only takes one value. Hence, Bob’s induced stage behavioral strategy q should satisfy \(q_1=q_2\). However, \(q^*=\left( \frac{1}{3}+\varepsilon , \frac{1}{3}-\varepsilon \right) \) is such that \(q_1^*\ne q_2^*\); hence, \((p^*, q^*)\) cannot be induced from any CIB-CNE.
Since the induced behavioral strategy of any CIB-CNE should form a BNE in the game among individuals, we conclude that a CIB-CNE does not exist in Example 1.
Proof of Claim
Denote Alice’s total expected payoff to be J(p, q). Then,
Since this is a zero-sum game, Alice’s expected payoff at equilibrium can be characterized as
Alice plays p at some equilibrium if and only if \(\min _q J(p, q) = J^*\). Define \(J^*(p) = \min _q J(p, q)\). We compute
The set of equilibrium strategies for Alice is the set of maximizers of \(J^*(p)\). Since \(J^*(p)\) is a continuous piecewise linear function, the set of maximizers can be found by comparing the values at the extreme points of the pieces.
We have
Since \(\varepsilon < \frac{1}{3}\), \((\frac{1}{3}, \frac{1}{3})\) is the unique maximizer among the extreme points. Hence, \(\arg \max _p J^*(p) = \{(\frac{1}{3}, \frac{1}{3}) \}\), i.e., Alice always plays \(p^*=(\frac{1}{3}, \frac{1}{3})\) in any BNE of the game.
Now, consider Bob’s equilibrium strategy. \(q^*\) is an equilibrium strategy of Bob only if \(p^* \in \arg \max _{p} J(p, q^*)\).
For each q, J(p, q) is a linear function of p and
We need \(\nabla _p J(p, q^*)\Big |_{p=p^*} = (0, 0)\). Hence,
which implies that \(q^*=(\frac{1}{3}+\varepsilon , \frac{1}{3}-\varepsilon )\), proving the claim. \(\square \)
Proof of Theorem 3
We use Theorem 2 to establish the existence of a CIB-CNE: we show that for each t there always exists a pair \((\lambda _t^*, \psi _t^*)\) such that \(\lambda _t^*\) forms an equilibrium at t given \(\psi _t^*\), and \(\psi _t^*\) is consistent with \(\lambda _t^*\). We provide a constructive proof of the existence of a CIB-CNE by proceeding backwards in time.
Since \(d=1\), we have \(S_t^i = {\mathbf {X}}_{t-1}^i\). The CCI consists of the beliefs along with \({\mathbf {U}}_{t-1}\).
Consider the condensation of the information graph into a directed acyclic graph (DAG) whose nodes are strongly connected components. Each node may contain multiple teams. Consider one topological ordering of this DAG. Denote the nodes by \([1], [2], \cdots \) (\([j]\) is reachable from \([k]\) only if \(k < j\)). We use the notation \(X_t^{[k]}, \varPi _t^{[k]}\) to denote the vector of the system variables of the teams in a node. In particular, following Definition 15, we define \({\mathbf {Z}}_t^{[k]} = ({\mathbf {X}}_{t-1:t}^{[k]}, {\mathbf {W}}_t^{[k], Y})\). We also use [1 : k] as a shorthand for the set \([1]\cup [2]\cup \cdots \cup [k]\). Define \(B_t^{[1:k]} = (\varPi _t^{[1:k]}, {\mathbf {U}}_{t-1}^{[1:k]})\). (Note that the usage of superscripts here is different from the CCI \(B_t^i\) defined in Definition 12.)
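The condensation into strongly connected components and a topological ordering of the resulting DAG can be computed with standard graph algorithms. Below is a minimal Python sketch using Kosaraju's algorithm; the team labels and edges of the toy information graph in the last lines are illustrative assumptions.

```python
from collections import defaultdict

def condensation_topological_order(teams, edges):
    """Condense a directed information graph into its DAG of strongly connected
    components (SCCs) and return the SCCs in a topological order, i.e., node [j]
    is reachable from node [k] only if [k] appears before [j] in the output.
    Kosaraju's algorithm: (1) DFS recording finish order, (2) DFS on the reversed
    graph in decreasing finish order to peel off one SCC at a time."""
    succ, pred = defaultdict(list), defaultdict(list)
    for a, b in edges:                       # directed edge a -> b of the information graph
        succ[a].append(b)
        pred[b].append(a)

    order, seen = [], set()
    def dfs_finish(v):                       # iterative DFS recording finish order
        stack = [(v, iter(succ[v]))]
        seen.add(v)
        while stack:
            u, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(succ[w])))
                    break
            else:
                order.append(u)
                stack.pop()

    for v in teams:
        if v not in seen:
            dfs_finish(v)

    sccs, assigned = [], set()
    for v in reversed(order):                # decreasing finish time, reversed graph
        if v in assigned:
            continue
        comp, stack = [], [v]
        assigned.add(v)
        while stack:
            u = stack.pop()
            comp.append(u)
            for w in pred[u]:
                if w not in assigned:
                    assigned.add(w)
                    stack.append(w)
        sccs.append(sorted(comp))
    return sccs                              # SCCs listed in a topological order

# Toy example (assumed): A influences B and C; B and C influence each other.
print(condensation_topological_order(["A", "B", "C"],
                                      [("A", "B"), ("A", "C"), ("B", "C"), ("C", "B")]))
# -> [['A'], ['B', 'C']]
```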
We construct the solution backwards in time and, within each stage, in the topological order of the nodes. For this purpose, we maintain the following induction invariant on the value functions \(V_t^i\) (as defined in Theorem 2) for the solution we are going to construct.
Induction Invariant: For each time t and each node index k,
-
\(V_t^{i}(b_t, x_{t-1}^{i})\) depends on \(b_t\) only through \((b_t^{[1:k-1]}, u_{t-1}^{i})\) for all teams \(i\in [k]\), if [k] consists of only one team. (With some abuse of notation, we write \(V_t^{i}(b_t, x_{t-1}^{i}) = V_t^{i}(b_t^{[1:k-1]}, u_{t-1}^{i}, x_{t-1}^{i})\) in this case.)
-
\(V_t^{i}(b_t, x_{t-1}^{i})\) depends on \(b_t\) only through \(b_t^{[1:k]}\) for all teams \(i\in [k]\), if [k] consists of multiple public teams. (We write \(V_t^{i}(b_t, x_{t-1}^{i}) = V_t^{i}(b_t^{[1:k]}, x_{t-1}^{i})\) in this case.)
Induction Base: For \(t=T+1\), we have \(V_{T+1}^i(\cdot )\equiv 0\) for all coordinators \(i\in {\mathcal {I}}\); hence, the induction invariant is true.
Induction Step: Suppose that the induction invariant is true at time \(t+1\) for all nodes. We construct the solution so that it is also true at time t.
To complete this step, we provide a procedure to solve the stage game. We argue that one can solve a series of optimization problems or finite games following the topological order of the nodes through an inner induction step. A structural sketch of this backward, node-by-node procedure is given at the end of this proof.
Inner Induction Step: Suppose that the first \(k-1\) nodes have been solved, and the equilibrium strategy \(\lambda _{t}^{*[1:k-1]}\) uses only \(b_t^{[1:k-1]}\) along with private information. Suppose that the update rules \(\psi _t^{*,[1:k-1]}\) have also been determined, and they use only \((b_t^{[1:k-1]}, y_t^{[1:k-1]}, u_t^{[1:k-1]})\). We now establish the same property for \((\lambda _{t}^{[k]}, \psi _t^{[k]})\).
-
If the k-th node contains a single coordinator i, the value to go is \(V_{t+1}^i(B_{t+1}^{[1:k-1]}, {\mathbf {U}}_t^i, {\mathbf {X}}_t^i)\) by the induction hypothesis. The instantaneous reward for coordinator i in the k-th node can be expressed as \(r_t^i({\mathbf {X}}_t^{[1:k]}, {\mathbf {U}}_{t}^{[1:k]})\) by the definition of the information dependency graph. In the stage game, coordinator i chooses a prescription to maximize the expected value of
$$\begin{aligned} Q_t^i(b_{t}^{[1:k-1]}, {\mathbf {Z}}_t^{[1:k]}, \varvec{\varGamma }_t^{[1:k]})&:=r_t^i({\mathbf {X}}_t^{[1:k]}, {\mathbf {U}}_{t}^{[1:k]}) + V_{t+1}^i(B_{t+1}^{[1:k-1]}, {\mathbf {U}}_{t}^i, {\mathbf {X}}_t^i), \end{aligned}$$where
$$\begin{aligned} B_{t+1}^{[1:k-1]}&= (\varPi _{t+1}^{[1:k-1]}, {\mathbf {U}}_t^{[1:k-1]} ),\\ \varPi _{t+1}^{j}&=\psi _t^{*,j}(b_t^{[1:k-1]}, {\mathbf {Y}}_t^{j}, {\mathbf {U}}_t^{[1:k-1]})\quad \forall j\in [1:k-1], \\ {\mathbf {Y}}_{t}^{j}&= \ell _t^j({\mathbf {X}}_{t}^j, {\mathbf {U}}_t^{[1:k-1]}, {\mathbf {W}}_t^{j, Y})\quad \forall j\in [1:k-1],\\ {\mathbf {U}}_{t}^j&= \varvec{\varGamma }_{t}^j({\mathbf {X}}_t^j)\quad \forall j\in [1:k]. \end{aligned}$$The expectation is computed using the belief \(\beta _t^i\) (defined through Eq. (4) in Definition 15) along with \(\lambda _{t}^{*[1:k-1]}\) that has already been determined. It can be written as
$$\begin{aligned}&\sum _{{\tilde{s}}_t, {\tilde{\gamma }}_t^{[1:k-1]}} \beta _t^i({\tilde{s}}_t|x_{t-1}^i) Q_t^i(b_t^{[1:k-1]}, {\tilde{s}}_t^{[1:k]}, ({\tilde{\gamma }}_t^{[1:k-1]}, \gamma _{t}^i) )\\&\qquad \times \prod _{j\in [1:k-1]} \lambda _{t}^j({\tilde{\gamma }}_t^j|b_t^{[1:k-1]}, {\tilde{x}}_{t-1}^j) \\&\qquad =\sum _{{\tilde{s}}_{t}^{[1:k]}, {\tilde{\gamma }}_t^{[1:k-1]}} \varvec{1}_{ \{{\tilde{x}}_{t-1}^i=x_{t-1}^i \} } {\mathbb {P}}({\tilde{w}}_t^{[1:k], Y}) \left( \prod _{j\in [1:k-1]} \pi _t^{j}({\tilde{x}}_{t-1}^j){\mathbb {P}}({\tilde{x}}_t^j|{\tilde{x}}_{t-1}^j, u_{t-1}^{[1:k-1]}) \right) \\&\qquad \times \left( \prod _{j\in [1:k-1]} \lambda _{t}^{*j}({\tilde{\gamma }}_t^j|b_t^{[1:k-1]}, x_{t-1}^j) \right) \\&\qquad \times {\mathbb {P}}({\tilde{x}}_t^i|x_{t-1}^i, u_{t-1}^{[1:k]}) Q_t^i(b_t^{[1:k-1]}, {\tilde{s}}_t^{[1:k]}, ({\tilde{\gamma }}_t^{[1:k]}, \gamma _{t}^i)). \end{aligned}$$Therefore, the expected reward of coordinator i depends on \(b_t\) through \((b_t^{[1:k-1]}, u_{t-1}^i)\). Coordinator i can choose the optimal prescription based on \((b_t^{[1:k-1]}, u_{t-1}^i, x_{t-1}^i)\), i.e., \(\lambda _t^{*i}(b_t, x_{t-1}^i)=\lambda _t^{*i}(b_t^{[1:k-1]}, u_{t-1}^i, x_{t-1}^i)\). We then have \(V_t^i(b_t, x_{t-1}^i) = V_t^i(b_t^{[1:k-1]}, u_{t-1}^i, x_{t-1}^i)\). The update rule \(\psi _t^{*, [k]} =\psi _t^{*, i}\) is then determined to be an arbitrary update rule consistent with \(\lambda _{t}^{*, i}\), which can be chosen as a function from \({\mathcal {B}}_t^{[1:k]}\times {\mathcal {Y}}_t^{[k]}\times {\mathcal {U}}_t^{[1:k]}\) (instead of \({\mathcal {B}}_t\times {\mathcal {Y}}_t^{[k]}\times {\mathcal {U}}_t\)) to \(\varPi _{t+1}^{[k]}\).
- If the k-th node contains a group of public teams, then the update rules \({\hat{\psi }}_t^{*, [k]}\) are fixed irrespective of the stage game strategies, i.e., for each public team i there exists a unique update rule \({\hat{\psi }}_t^{*, i}\) that is compatible with any \(\lambda _{t}^{*, i}\). This update rule is a map from \({\mathcal {Y}}_t^{[k]}\times {\mathcal {U}}_t^{[1:k]}\) to vectors of delta measures in \(\prod _{i\in [k]} \varDelta ({\mathcal {X}}_{t-1}^i)\), i.e., the map that recovers \({\mathbf {X}}_{t-1}^{[k]}\) from the observations (see Definition 18). The function takes \({\mathbf {U}}_{t}^{[1:k]}\) as an argument because the observations of the k-th node depend on \({\mathbf {U}}_t\) only through \({\mathbf {U}}_t^{[1:k]}\).
The value-to-go for each coordinator i can be expressed as \(V_{t+1}^i(B_{t+1}^{[1:k]}, {\mathbf {X}}_{t}^i)\) by the induction hypothesis. The instantaneous reward can be written as \(r_t^i({\mathbf {X}}_t^{[1:k]}, {\mathbf {U}}_{t}^{[1:k]})\) by the definition of the information dependency graph.
In the stage game, coordinator i in the k-th node chooses a distribution \(\eta _{t}^i\) on prescriptions to maximize the expected value of
$$\begin{aligned} Q_t^i(b_{t}^{[1:k]}, {\mathbf {Z}}_t^{[1:k]}, \varvec{\varGamma }_{t}^{[1:k]}):=r_t^i({\mathbf {X}}_t^{[1:k]}, {\mathbf {U}}_{t}^{[1:k]}) + V_{t+1}^i(B_{t+1}^{[1:k]}, {\mathbf {X}}_t^i), \end{aligned}$$where
$$\begin{aligned} B_{t+1}^{[1:k]}&= (\varPi _{t+1}^{[1:k]}, {\mathbf {U}}_t^{[1:k]} ),\\ \varPi _{t+1}^{j}&=\psi _t^{*, j}(b_t^{[1:k-1]}, {\mathbf {Y}}_t^{j}, {\mathbf {U}}_t^{[1:k-1]})\quad \forall j\in [1:k-1], \\ \varPi _{t+1}^{[k]}&={\hat{\psi }}_t^{*, [k]}(b_{t}^{[1:k]}, {\mathbf {Y}}_{t}^{[1:k]}, {\mathbf {U}}_{t}^{[1:k]} ),\\ {\mathbf {Y}}_{t}^{j}&= \ell _t^j({\mathbf {X}}_{t}^j, {\mathbf {U}}_t^{[1:k]}, {\mathbf {W}}_t^{j, Y})\quad \forall j\in [1:k],\\ {\mathbf {U}}_{t}^j&= \varvec{\varGamma }_{t}^j({\mathbf {X}}_t^j)\quad \forall j\in [1:k]. \end{aligned}$$The expectation is taken with respect to the belief \(\beta _t^i\) (defined through Eq. (4) in Definition 15) and the strategy prediction \(\lambda _t^{[1:k]}\). This expectation can be written as
$$\begin{aligned}&\sum _{{\tilde{s}}_t, {\tilde{\gamma }}_t^{[1:k]}} \beta _t^i({\tilde{s}}_t|x_{t-1}^i) Q_t^i(b_t^{[1:k]}, {\tilde{s}}_t^{[1:k]}, {\tilde{\gamma }}_t^{[1:k]} ) \eta _{t}^{i}({\tilde{\gamma }}_t^i) \times \prod _{\begin{array}{c} j\in [1:k]\\ j\ne i \end{array}} \lambda _{t}^j({\tilde{\gamma }}_t^j|b_t^{[1:k-1]}, {\tilde{x}}_{t-1}^j) \\&\quad =\sum _{{\tilde{s}}_{t}^{[1:k]}, {\tilde{\gamma }}_t^{[1:k]}} \varvec{1}_{ \{{\tilde{x}}_{t-1}^i = x_{t-1}^i \} } {\mathbb {P}}({\tilde{w}}_t^{[1:k], Y}) \\&\qquad \times \left( \prod _{\begin{array}{c} j\in [1:k]\\ j\ne i \end{array}} \pi _{t}^j({\tilde{x}}_{t-1}^j){\mathbb {P}}({\tilde{x}}_t^j|{\tilde{x}}_{t-1}^j, u_{t-1}^{[1:k]})\lambda _{t}^{*j}({\tilde{\gamma }}_t^j|b_t^{[1:k]}, {\tilde{x}}_{t-1}^j) \right) \\&\qquad \times {\mathbb {P}}({\tilde{x}}_t^i|x_{t-1}^i, u_{t-1}^{[1:k]}) \eta _{t}^{i}({\tilde{\gamma }}_t^i) Q_t^i(b_t^{[1:k]}, {\tilde{s}}_t^{[1:k]}, {\tilde{\gamma }}_t^{[1:k]}), \end{aligned}$$which depends on \(b_t\) only through \(b_t^{[1:k]}\). Therefore, the stage game defined in Definition 15 induces a finite game among the coordinators in the k-th node (instead of all coordinators) with parameter \((b_t^{[1:k]}, (\psi _t^{*, [1:k-1]}, {\hat{\psi }}_t^{*,[k]}))\) (instead of \((b_t, \psi _t)\)), where \(\lambda _t^{*[1:k-1]}\) has already been fixed. The teams in the k-th node effectively play a stage game in which the first \(k-1\) nodes act as nature, while the coordinators in nodes after the k-th have no effect on the payoffs of the coordinators in the k-th node. Hence, a coordinator i in the k-th node can base its decision on \((b_t^{[1:k]}, x_{t-1}^i)\), i.e., \(\lambda _t^{*i}(b_t, x_{t-1}^i)=\lambda _t^{*i}(b_t^{[1:k]}, x_{t-1}^i)\). We also have \(V_t^i(b_t, x_{t-1}^i) = V_t^i(b_t^{[1:k]}, x_{t-1}^i)\). The update rule is determined by \(\psi _t^{*,[k]} = {\hat{\psi }}_t^{*,[k]}\), which is guaranteed to be consistent with \(\lambda _t^{*[k]}\).
In summary, we determine \((\lambda _{t}^*, \psi _t^*)\) using a node-by-node approach. If the k-th node consists of a single team, we first determine \(\lambda _{t}^{*[k]}\) from an optimization problem that depends on \((\lambda _{t}^{*[1:k-1]}, \psi _t^{*,[1:k-1]})\) and then determine \(\psi _t^{*,[k]}\). If the k-th node consists of multiple public teams, we first determine \(\psi _t^{*,[k]}\) and then obtain \(\lambda _{t}^{*[k]}\) by solving a finite game that depends on \((\lambda _{t}^{*[1:k-1]}, \psi _t^{*, [1:k]})\). Hence, we have constructed the solution and established both the inner and outer induction steps, proving the theorem.
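To make the node-by-node procedure concrete, the following Python sketch outlines the control flow of the inner induction step. It is a structural sketch only: solve_single_coordinator_node, fixed_public_update_rule, and solve_public_team_node are hypothetical placeholders standing in for the optimization problem, the fixed (signaling-free) update rule, and the finite game described above; they are not part of the paper's formal construction.

```python
# Structural sketch of the inner induction step; all helpers are placeholders.
# Nodes are processed in the topological order of the information dependency graph,
# and each node only uses the strategies/update rules fixed at earlier nodes.

def solve_single_coordinator_node(i, b_t, V_next, lambda_fixed, psi_fixed):
    """Placeholder: coordinator i's one-shot optimization over prescriptions."""
    raise NotImplementedError

def fixed_public_update_rule(team_ids):
    """Placeholder: the unique update rule of a public-team node."""
    raise NotImplementedError

def solve_public_team_node(team_ids, b_t, V_next, lambda_fixed, psi_k):
    """Placeholder: the finite game among the node's coordinators."""
    raise NotImplementedError

def solve_stage_game(nodes, b_t, V_next):
    """nodes: list in topological order, entries ('single', i) or ('public', team_ids)."""
    lambda_star, psi_star = {}, {}
    for k, (kind, members) in enumerate(nodes):
        if kind == 'single':
            # First the optimization problem, then any compatible update rule.
            lam_k, psi_k = solve_single_coordinator_node(
                members, b_t, V_next, lambda_star, psi_star)
        else:
            # First the fixed update rule, then the finite game.
            psi_k = fixed_public_update_rule(members)
            lam_k = solve_public_team_node(members, b_t, V_next, lambda_star, psi_k)
        lambda_star[k], psi_star[k] = lam_k, psi_k
    return lambda_star, psi_star
```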
Proof of Theorem 4
We prove the Theorem for \(d=1\). The proof idea for \(d>1\) is similar.
We will prove a stronger result. For each \(\varPi _t^i\in \varDelta ({\mathcal {X}}_{t-1}^i)\), define the corresponding \({\hat{\varPi }}_t^i\in \varDelta ({\mathcal {X}}_t)\) by
Define \({\hat{\psi }}_t^i\) to be the signaling-free update function, i.e., the belief update function such that
Define open-loop prescriptions as prescriptions that simply instruct the members of a team to take a certain action irrespective of their private information. We will show that there exists an equilibrium in which each team plays a common information-based signaling-free (CIBSF) strategy, i.e., the common belief generation system of every coordinator is given by the signaling-free update functions \({\hat{\psi }}\), and coordinator i chooses randomized open-loop prescriptions based on \({\overline{\varvec{\varPi }}}_t=({\overline{\varPi }}_t^i)_{i\in {\mathcal {I}}}\) instead of \((B_t, {\mathbf {X}}_{t-1}^i)\).
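As a small illustration of the difference between a general prescription and an open-loop prescription, consider the following Python snippet; the state and action labels are hypothetical.

```python
# A prescription maps a member's private state to an action;
# an open-loop prescription ignores the private state entirely.
states = ['x0', 'x1', 'x2']                                # hypothetical private states
general_prescription = {'x0': 'a', 'x1': 'b', 'x2': 'a'}   # action depends on the state
open_loop_prescription = {x: 'b' for x in states}          # same action for every state
assert len(set(open_loop_prescription.values())) == 1
```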
Induction Invariant: \(V_t^i(B_t, {\mathbf {X}}_{t-1}^i) = V_t^i({\overline{\varvec{\varPi }}}_t, {\mathbf {X}}_{t-1}^i)\).
Induction Base: The induction invariant holds for \(t=T+1\) since \(V_{T+1}^i(\cdot ) \equiv 0\) for all \(i\in {\mathcal {I}}\).
Induction Step: Suppose that the induction invariant holds at time \(t+1\); we prove it for time t.
Let \({\hat{\psi }}_t\) be the signaling-free update rule. We solve the stage game \(G_t(V_{t+1}, {\hat{\psi }}_t, b_t)\). In the stage game, coordinator i chooses a prescription to maximize the expectation of
where
Since \(V_{t+1}^i({\overline{\varvec{\varPi }}}_{t+1}, {\mathbf {X}}_t^i)\) does not depend on coordinator i’s prescriptions, coordinator i only needs to maximize the expectation of \(r_t^i({\mathbf {X}}_t^{-i}, {\mathbf {U}}_{t})\), which is
Claim
In the stage game, if all coordinators \(-i\) use CIBSF strategies, then coordinator i can respond with a CIBSF strategy.
Proof of Claim
Let \(\eta _{t}^k: {\overline{\varPi }}_t \mapsto \varDelta ({\mathcal {U}}_t^k)\) be the CIBSF strategy of coordinator \(k\ne i\). Then, coordinator i’s expected payoff given \(\gamma _{t}^i\) can be written as
Hence, coordinator i can respond with a prescription \(\gamma _{t}^i\) such that \(\gamma _{t}^i(x_t^i) = u_t^i\) for all \(x_t^i\), where
can be chosen based on \(({\overline{\pi }}_t, \eta _{t}^{-i})\), proving the claim. \(\square \)
Given the claim, we conclude that there exists a stage game equilibrium in which all coordinators play CIBSF strategies: define a new stage game in which each coordinator is restricted to CIBSF strategies. By the claim, a best response in the restricted stage game is also a best response in the original stage game. The restricted game is a finite game (it is a game of symmetric information with parameter \({\overline{\pi }}_t\), where coordinator i’s action is \(u_t^i\) and its payoff is a function of \({\overline{\pi }}_t\) and \(u_t\)) and therefore always has an equilibrium. The equilibrium strategy is consistent with \({\hat{\psi }}_t\) due to Lemma 10.
Lemma 10
The signaling-free update rule \({\hat{\psi }}_t^i\) is consistent with any \(\lambda _t^i: {\mathcal {B}}_t\times {\mathcal {X}}_{t-1}^i \mapsto \varDelta ({\mathcal {A}}_{t}^i)\) that corresponds to a CIBSF strategy at time t.
Proof
It follows from standard arguments related to strategy independence of belief (See Chapter 6 of Kumar and Varaiya [25]). \(\square \)
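To illustrate the restricted stage game described above, the following Python sketch enumerates the pure-strategy equilibria of a toy two-coordinator game in which each coordinator picks an action \(u_t^i\) and the payoffs (already averaged with respect to a fixed \({\overline{\pi }}_t\)) are given by hypothetical tables. This is only a toy check: a finite game always admits a mixed equilibrium, but mixed equilibria are not computed here.

```python
from itertools import product

# Hypothetical restricted stage game: two coordinators, payoffs already averaged
# over the common belief pi_bar_t. All numbers below are made up.
actions = {'1': ['u_a', 'u_b'], '2': ['u_a', 'u_b']}
payoff = {  # (u1, u2) -> (payoff of coordinator 1, payoff of coordinator 2)
    ('u_a', 'u_a'): (2.0, 1.0), ('u_a', 'u_b'): (0.0, 0.0),
    ('u_b', 'u_a'): (0.0, 0.0), ('u_b', 'u_b'): (1.0, 2.0),
}

def is_pure_equilibrium(profile):
    """No coordinator can improve by a unilateral deviation."""
    for idx, player in enumerate(actions):
        for dev in actions[player]:
            deviated = list(profile)
            deviated[idx] = dev
            if payoff[tuple(deviated)][idx] > payoff[profile][idx]:
                return False
    return True

pure_equilibria = [p for p in product(*actions.values()) if is_pure_equilibrium(p)]
print(pure_equilibria)   # [('u_a', 'u_a'), ('u_b', 'u_b')] for these numbers
```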
Let \(\eta _{t}^{*}=(\eta _{t}^{*j})_{j\in {\mathcal {I}}}, \eta _{t}^{*j}: {\overline{\varPi }}_t \mapsto \varDelta ({\mathcal {U}}_t^j)\) be a CIBSF strategy profile that is a stage game equilibrium. Then, the value function
depends on \((b_t, x_{t-1}^i)\) only through \(({\overline{\pi }}_t, x_{t-1}^i)\), establishing the induction step.
Proof of Lemma 8
In this appendix, when we specify a team’s strategy through a profile of individual strategies, for example \(\varphi ^i=(\varphi ^{i, l})_{(i, l)\in {\mathcal {N}}_i}\), we assume that the members of team i apply these strategies independently of their teammates.
We first establish three auxiliary results, Lemmas 11–13, which form the basis of our proof of Lemma 8.
Lemma 11
(Conditional independence among teammates) Suppose that members of team i use behavioral strategies \(\varphi ^i=(\varphi ^{i, l})_{(i, l)\in {\mathcal {N}}_i}\) where \(\varphi ^{i, l}=(\varphi _t^{i, l})_{t\in {\mathcal {T}}}, \varphi _t^{i, l}: {\mathcal {H}}_t^{i, l}\mapsto \varDelta ({\mathcal {U}}_t^{i, l})\). Suppose that all teams other than i use a behavioral coordination strategy profile \(g^{-i}\). Then, \(({\mathbf {X}}_{t-d+1:t}^{i, l})_{(i, l)\in {\mathcal {N}}_i}\) are conditionally independent given the common information \(H_t^i\). Furthermore, the conditional distribution of \({\mathbf {X}}_{t-d+1:t}^{i, j}\) given \(H_t^i\) depends on \((\varphi ^i, g^{-i})\) only through \(\varphi ^{i, j}\).
Lemma 12
Let \(\mu ^{i, j}\) be a pure strategy of agent (i, j). Let \(\varphi _t^{i, -j}=(\varphi _t^{i, l})_{(i, l)\in {\mathcal {N}}_i\backslash \{(i, j) \}, t\in {\mathcal {T}} }, \varphi _t^{i, l}: {\mathcal {H}}_t^{i, l} \mapsto \varDelta ({\mathcal {U}}_t^{i, l})\) be behavioral strategies of all members of team i except (i, j). Then, there exists a behavioral strategy \({\bar{\varphi }}^{i, j} = ({\bar{\varphi }}_t^{i, j})_{t\in {\mathcal {T}}}, {\bar{\varphi }}_t^{i, j}: {\mathcal {H}}_t^{i}\times {\mathcal {X}}_t^{i, j} \mapsto \varDelta ({\mathcal {U}}_t^{i, j})\), such that \((\mu ^{i, j}, \varphi ^{i, -j})\) is payoff-equivalent to \(({\bar{\varphi }}^{i, j}, \varphi ^{i, -j})\).
Lemma 13
Let \(\mu ^i\) be a pure strategy of team i. There exists a payoff-equivalent behavioral strategy profile \({\bar{g}}^i\) that only assigns simple prescriptions.
Based on Lemmas 11–13, we proceed to complete the proof of Lemma 8 via the following steps.
1. Let \(\sigma ^i\) be a mixed team strategy payoff-equivalent to \(g^i\) (see Sect. 3).
2. For each \(\mu ^i\in \mathrm {supp}(\sigma ^i)\), let \({\bar{g}}^{i}[\mu ^i]\) be a payoff-equivalent behavioral strategy profile that only assigns simple prescriptions (Lemma 13).
3. Let \({\bar{\varsigma }}^i[\mu ^i]\) be a mixed coordination strategy payoff-equivalent to \({\bar{g}}^{i}[\mu ^i]\), constructed from Kuhn’s Theorem [24].
4. Define a new mixed coordination strategy \({\bar{\varsigma }}^i\) (illustrated numerically in the sketch following this list) by
$$\begin{aligned} {\bar{\varsigma }}^i = \sum _{\mu ^i\in \mathrm {supp}(\sigma ^i)} \sigma ^i(\mu ^i) \cdot {\bar{\varsigma }}^i[\mu ^i]. \end{aligned}$$
5. Let \({\bar{g}}^i\) be a behavioral coordination strategy profile payoff-equivalent to \({\bar{\varsigma }}^i\), constructed from Kuhn’s Theorem [24].
It is clear that \({\bar{g}}^i\) will be payoff-equivalent to \(\sigma ^i\). Furthermore, \({\bar{g}}^i\) always assigns simple prescriptions since the construction in Kuhn’s Theorem does not change the set of possible prescriptions.
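Step 4 above is a convex combination of the per-pure-strategy mixed coordination strategies, weighted by \(\sigma ^i\). The following Python snippet gives a minimal numerical illustration of this mixing; the supports and weights are hypothetical.

```python
from collections import defaultdict

# Each mixed coordination strategy is represented as a probability distribution
# over (pure) coordination strategies, encoded as dicts; names/numbers are hypothetical.
sigma_i = {'mu_1': 0.25, 'mu_2': 0.75}          # mixed team strategy sigma^i
varsigma_of = {                                  # bar-varsigma^i[mu] for each mu
    'mu_1': {'g_a': 1.0},
    'mu_2': {'g_a': 0.5, 'g_b': 0.5},
}

# bar-varsigma^i = sum over mu of sigma^i(mu) * bar-varsigma^i[mu]
varsigma_bar = defaultdict(float)
for mu, weight in sigma_i.items():
    for g, prob in varsigma_of[mu].items():
        varsigma_bar[g] += weight * prob

print(dict(varsigma_bar))   # {'g_a': 0.625, 'g_b': 0.375}
```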
Proof of Lemma 11
Assume that \(h_t^i\) is admissible under \(\varphi ^i\). Let \(g^i\) be a behavioral coordination strategy defined by
i.e., at time t, the coordinator generates independent prescriptions for the members of the team. If we view the prescription \(\varGamma _t^{i, j}\) as a table of actions, then it is determined as follows: each entry of the table is determined independently, with the entry corresponding to \(x_{t-d+1:t}^{i, j}\) drawn at random from the distribution \(\varphi _t^{i, j}(h_t^i, x_{t-d+1:t}^{i, j})\).
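The table construction above can be illustrated in Python as follows; the private-information values, actions, and behavioral strategy in the snippet are hypothetical.

```python
import random

# Behavioral strategy phi_t^{i,j}(h_t^i, x): a distribution over actions for each
# private-information value x (the common history h_t^i is held fixed here).
# All values below are hypothetical.
phi = {
    'x_a': {'u_0': 0.7, 'u_1': 0.3},
    'x_b': {'u_0': 0.2, 'u_1': 0.8},
}

def sample_prescription(phi, rng=random):
    """Fill each entry of the prescription table independently:
    the entry for x is drawn from phi(. | h_t^i, x)."""
    table = {}
    for x, dist in phi.items():
        actions, probs = zip(*dist.items())
        table[x] = rng.choices(actions, weights=probs, k=1)[0]
    return table

gamma = sample_prescription(phi)
print(gamma)   # e.g. {'x_a': 'u_0', 'x_b': 'u_1'}
```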
Using arguments similar to those in the proof of Lemma 1, one can show that \((g^i, g^{-i})\) and \((\varphi ^i, g^{-i})\) generate the same distribution of \(({\mathbf {Y}}_{1:t}, {\mathbf {U}}_{1:t}, {\mathbf {X}}_{1:t})\), hence
By Lemma 3, we know that \({\mathbb {P}}^{g}(x_{1:t}^i, \gamma _{1:t}^i|h_t^0)\) does not depend on \(g^{-i}\). As a conditional distribution obtained from \({\mathbb {P}}^{g}(x_{1:t}^i, \gamma _{1:t}^i|h_t^0)\), \({\mathbb {P}}^{g}(x_{t-d+1:t}^i|h_t^i)\) does not depend on \(g^{-i}\) either. Therefore, we have
where \({\hat{g}}^{-i}\) is an open-loop strategy profile that always generates the actions \(u_{1:t-1}^{-i}\).
Again, \((g^i, {\hat{g}}^{-i})\) and \((\varphi ^i, {\hat{g}}^{-i})\) generate the same distribution of \(({\mathbf {Y}}_{1:t}, {\mathbf {U}}_{1:t}, {\mathbf {X}}_{1:t})\), hence
We now have
Due to Lemma 3, we also have
where \(x_{t-d:t}^{-i}\in {\mathcal {X}}_{t-d:t}^{-i}\) is such that \({\mathbb {P}}^{\varphi ^i, {\hat{g}}^{-i}}(x_{t-d:t}^{-i}|h_t^i)>0\).
Let \(\tau =t-d+1\). By Bayes’ rule,
We have
Substituting (32) into (31), we obtain
where
is a function that depends on \(\varphi ^{i, j}\) but not \(\varphi ^{i, -j}\).
Therefore, we have proved that
Marginalizing (33), we have
which depends on \((\varphi ^i, g^{-i})\) only through \(\varphi ^{i, j}\).
Hence, we conclude that
and \({\mathbb {P}}^{\varphi ^i, g^{-i}}(x_{t-d+1:t}^{i, j}|h_t^i)\) depends on \((\varphi ^i, g^{-i})\) only through \(\varphi ^{i, j}\).
Remark 14
In general, the conditional independence among teammates is not true when team members jointly randomize.
\(\square \)
Proof of Lemma 12
For notational convenience, define
Due to Lemma 11, \({\mathbb {P}}^{\varphi ^i, g^{-i}}({\tilde{x}}_{t-d+1:t-1}^{i, j}|h_t^i, x_t^{i, j})\) depends on the strategy profile only through \(\varphi ^{i, j}\).
Set
for all \((h_t^i, x_t^{i, j})\) admissible under \(\mu ^{i, j}\). Otherwise, \({\overline{\varphi }}_t^{i, j}(h_t^i, x_t^{i, j})\) is set arbitrarily.
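The display defining \({\overline{\varphi }}_t^{i, j}\) is omitted above. On one plausible reading, it averages the pure strategy \(\mu _t^{i, j}\) over the conditional distribution \({\mathbb {P}}^{\varphi ^{i, j}}({\tilde{x}}_{t-d+1:t-1}^{i, j}|h_t^i, x_t^{i, j})\) of the unobserved part of the private history. The Python sketch below implements that reading; it is an assumption about the omitted formula, not a transcription of it, and all names and numbers in it are hypothetical.

```python
# A sketch of one plausible reading of the omitted construction (see the lead-in):
# average the pure strategy over the conditional law of the hidden private history.
# All names and values are hypothetical.

def mu_t(h, x_hidden, x_t):
    """Hypothetical pure strategy of agent (i, j): returns a single action."""
    return 'u_0' if (x_hidden, x_t) == ('x_a', 'x_1') else 'u_1'

def bar_phi_t(h, x_t, hidden_posterior):
    """Behavioral strategy: a distribution over actions obtained by averaging mu_t
    over hidden_posterior(. | h, x_t), the conditional law of the hidden part."""
    dist = {}
    for x_hidden, prob in hidden_posterior.items():
        u = mu_t(h, x_hidden, x_t)
        dist[u] = dist.get(u, 0.0) + prob
    return dist

# Example: the hidden part is 'x_a' with prob. 0.6 and 'x_b' with prob. 0.4.
print(bar_phi_t('h', 'x_1', {'x_a': 0.6, 'x_b': 0.4}))   # {'u_0': 0.6, 'u_1': 0.4}
```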
Let \(\mu ^{-i}\) be a pure team strategy profile of teams other than i. Let the superscript \(-(i, j)\) denote all agents (of all teams) other than (i, j). We will prove by induction that
Given (34), the claim can be established using the linearity of expectation, similarly to the proof of Lemma 5.
Induction Base: (34) is true for \(t=1\) since \({\overline{\varphi }}_1^{i, j}\) is the same strategy as \(\mu _1^{i, j}\).
Induction Step: Suppose that (34) is true at time \(t-1\); we prove it for time t.
First,
where
From Lemmas 3 and 11, we know that
for all \((x_t^{i, j}, h_t^i)\) admissible under \(\mu ^{i, j}\). Note that \({\mathbb {P}}^{\mu ^{i, j}, \varphi _t^{i, -j}, \mu ^{-i}}(x_t, x_{t-d+1:t-1}^{-(i, j)}, h_t) = 0\) for \((x_t^{i, j}, h_t^i)\) not admissible under \(\mu ^{i, j}\).
Hence, we conclude that
and
Similarly, we have
Hence, it suffices to prove that
Given the induction hypothesis, it suffices to show that
for all \((x_{t-1}, x_{t-d:t-2}^{-(i, j)}, h_{t-1})\) admissible under \((\mu ^{i, j}, \varphi _t^{i, -j}, \mu ^{-i})\) (or admissible under \(({\bar{\varphi }}^{i, j}, \varphi _t^{i, -j}, \mu ^{-i})\), which is the same condition due to the induction hypothesis).
Since
we have that \(({\mathbf {X}}_{t}, {\mathbf {X}}_{t-d+1:t-1}^{-(i, j)}, H_t)\) is a strategy-independent function of the random vector \(({\mathbf {U}}_{t-1}, {\mathbf {X}}_{t-1}, {\mathbf {X}}_{t-d:t-2}^{-(i, j)}, H_{t-1}, {\mathbf {W}}_{t-1}^{X}, {\mathbf {W}}_{t-1}^{Y})\), where \(({\mathbf {W}}_{t-1}^{X}, {\mathbf {W}}_{t-1}^{Y})\) is a primitive random vector independent of \(({\mathbf {U}}_{t-1}, {\mathbf {X}}_{t-1}, {\mathbf {X}}_{t-d:t-2}^{-(i, j)}, H_{t-1})\). Therefore, (35) is true, and we have established the induction step. \(\square \)
Proof of Lemma 13
Through iterative application of Lemma 12, we conclude that for every pure strategy \(\mu ^i\), there exists a payoff-equivalent behavioral strategy profile \({\bar{\varphi }}^{i}=({\bar{\varphi }}_t^{i, j})_{(i, j)\in {\mathcal {N}}_i, t\in {\mathcal {T}}}\), where \({\bar{\varphi }}_t^{i, j}: {\mathcal {H}}_t^{i}\times {\mathcal {X}}_t^{i, j} \mapsto \varDelta ({\mathcal {U}}_t^{i, j})\). Define \({\bar{g}}^i\) by
where \(\bar{{\mathcal {A}}}_t^i\subset {\mathcal {A}}_t^i\) is the set of simple prescriptions. Then, using arguments similar to those in the proof of Lemma 1, one can show that \({\bar{g}}^i\) is payoff-equivalent to \({\bar{\varphi }}^i\), and hence payoff-equivalent to \(\mu ^i\). \(\square \)
Cite this article
Tang, D., Tavafoghi, H., Subramanian, V. et al. Dynamic Games Among Teams with Delayed Intra-Team Information Sharing. Dyn Games Appl 13, 353–411 (2023). https://doi.org/10.1007/s13235-022-00424-4