Dynamic Games Among Teams with Delayed Intra-Team Information Sharing

Abstract

We analyze a class of stochastic dynamic games among teams with asymmetric information, where members of a team share their observations internally with a delay of d time steps. Each team is associated with a controlled Markov chain whose dynamics are coupled through the players’ actions. These games exhibit challenges in both theory and practice due to the presence of signaling and the increasing domain of information over time. We develop a general approach to characterize a subset of Nash equilibria in which the agents can use a compressed version of their information, instead of the full information, to choose their actions. We identify two subclasses of strategies: sufficient private information-based (SPIB) strategies, which compress private information only, and compressed information-based (CIB) strategies, which compress both common and private information. We show that SPIB-strategy-based equilibria exist and that the set of payoff profiles of such equilibria is the same as that of all Nash equilibria. On the other hand, we show that CIB-strategy-based equilibria may not exist. We develop a backward inductive sequential procedure whose solution (if it exists) provides a CIB-strategy-based equilibrium. We identify some instances where we can guarantee the existence of a solution to the above procedure. Our results highlight the tension among the compression of information, the ability of compression-based strategies to sustain all or some of the equilibrium payoff profiles, and the backward inductive sequential computation of equilibria in stochastic dynamic games with asymmetric information.

Availability of Data and Material

Not applicable.

Notes

  1. A team strategy is person-by-person optimal (PBPO) when each team member’s strategy is an optimal response to the strategies of the other team members.

  2. In contrast to signaling in teams, signaling in games is complicated by the fact that agents have diverging incentives.

  3. Examples of such strategy dependencies appear in Ho [20] and in Nayyar and Teneketzis [40] for team problems with non-classical information structure. Since these strategy dependencies are solely due to the problem’s information structure, they also appear in dynamic games with non-classical information structure (see [21]).

  4. We do not restrict the strategy types of \(g^{i}\) and \({\tilde{g}}^i\) in Definition 4. In particular, each of \(g^{i}\) and \({\tilde{g}}^i\) could be a coordination strategy or a team strategy.

  5. The \((d-1)\)-step PRPs are the same as the partial functions defined in the second structural result in Nayyar et al. [41].

  6. The compression of private information of coordinators in our model is closely related to Tavafoghi et al.’s [53] sufficient information approach. One can show that our sufficient private information \(S_{t}^i=({\mathbf {X}}_{t-d}^i, \varvec{\varPhi }_t^i)\) satisfies the definition of sufficient private information (Definition 4) in Tavafoghi et al. [53] (hence, we choose to use the same terminology).

  7. Since \({\mathcal {X}}_t,{\mathcal {U}}_t,{\mathcal {Y}}_t\) are finite sets, one can assume that \({\mathbf {W}}_t^Y\) also takes finite values without loss of generality.

  8. The compression of hidden information to sufficient hidden information is similar to the shedding of irrelevant information in Mahajan [29].

  9. We claim that the value of this conditional probability is the same for g and \({\tilde{g}}\) whenever the conditional probability is well defined under both g and \({\tilde{g}}\). However, whether or not the conditional probability is well defined does depend on g. In the lemma, we always apply Claim 1 by multiplying \({\mathbb {P}}^g(x_{t-d+1:t}|\gamma _t, s_t^i, {\overline{h}}_{t}^{-i})\) by some other terms. Those terms will be 0 whenever \({\mathbb {P}}^g(x_{t-d+1:t}|\gamma _t, s_t^i, {\overline{h}}_{t}^{-i})\) is not well defined.

  10. Note that \({\mathbb {P}}^{ g^{-i}}({\tilde{z}}_t|b_t, s_{t}^i)\) is different from \(\beta _t^i({\tilde{z}}_t|s_{t}^i)\). Since \(B_t\) is just a compression of the common information based on a predetermined update rule \(\psi \), which may or may not be consistent with the strategy actually played, \(B_t\) may not represent the true belief. \({\mathbb {P}}^{ g^{-i}}({\tilde{z}}_t|b_t, s_{t}^i)\) is the belief an agent infers from the event \(B_t=b_t, S_t^i=s_t^i\); the agent knows that \(b_t\) might not contain the true belief, but it is useful nonetheless for inferring the true state. \(\beta _t^i({\tilde{z}}_t|s_{t}^i)\) is a conditional distribution computed with \(b_t\), pretending that \(b_t\) contains the true belief.

References

  1. Amin S, Litrico X, Sastry S, Bayen AM (2013) Cyber security of water SCADA systems—Part I: analysis and experimentation of stealthy deception attacks. IEEE Trans Control Syst Technol 21(5):1963–1970. https://doi.org/10.1109/tcst.2012.2211873

  2. Amin S, Schwartz GA, Cárdenas AA, Sastry SS (2015) Game-theoretic models of electricity theft detection in smart utility networks: providing new capabilities with advanced metering infrastructure. IEEE Control Syst Mag 35(1):66–81. https://doi.org/10.1109/mcs.2014.2364711

  3. Anantharam V, Borkar V (2007) Common randomness and distributed control: a counterexample. Syst Control Lett 56(7–8):568–572. https://doi.org/10.1016/j.sysconle.2007.03.010

  4. Başar T, Olsder GJ (1999) Dynamic noncooperative game theory, vol 23. SIAM, Philadelphia

  5. Bergemann D, Välimäki J (2006) Dynamic price competition. J Econ Theory 127(1):232–263. https://doi.org/10.1016/j.jet.2005.01.002

  6. Bhattacharya S, Başar T (2012) Multi-layer hierarchical approach to double sided jamming games among teams of mobile agents. In: 2012 IEEE 51st IEEE conference on decision and control (CDC), IEEE, pp 5774–5779. https://doi.org/10.1109/cdc.2012.6426411

  7. Cabral L (2011) Dynamic price competition with network effects. Rev Econ Stud 78(1):83–111. https://doi.org/10.1093/restud/rdq007

  8. Cardaliaguet P, Rainer C, Rosenberg D, Vieille N (2016) Markov games with frequent actions and incomplete information-the limit case. Math Oper Res 41(1):49–71. https://doi.org/10.1287/moor.2015.0715

  9. Colombino M, Smith RS, Summers TH (2017) Mutually quadratically invariant information structures in two-team stochastic dynamic games. IEEE Trans Autom Control 63(7):2256–2263. https://doi.org/10.1109/tac.2017.2772020

  10. Cooper DJ, Kagel JH (2005) Are two heads better than one? Team versus individual play in signaling games. Am Econ Rev 95(3):477–509. https://doi.org/10.1257/0002828054201431

  11. Cox CA, Stoddard B (2018) Strategic thinking in public goods games with teams. J Public Econ 161:31–43. https://doi.org/10.1016/j.jpubeco.2018.03.007

  12. Doganoglu T (2003) Dynamic price competition with consumption externalities. NETNOMICS Econ Res Electron Netw 5(1):43–69. https://doi.org/10.1023/A:1024994117734

  13. Farina G, Celli A, Gatti N, Sandholm T (2018) Ex-ante coordination and collusion in zero-sum multi-player extensive-form games. In: Conference on neural information processing systems (NIPS)

  14. Filar J, Vrieze K (2012) Competitive Markov decision processes. Springer, New York

  15. Gensbittel F, Renault J (2015) The value of Markov chain games with incomplete information on both sides. Math Oper Res 40(4):820–841. https://doi.org/10.1287/moor.2014.0697

  16. Gupta A, Nayyar A, Langbort C, Başar T (2014) Common information based Markov perfect equilibria for linear-Gaussian games with asymmetric information. SIAM J Control Optim 52(5):3228–3260. https://doi.org/10.1137/140953514

  17. Gupta A, Langbort C, Başar T (2016) Dynamic games with asymmetric information and resource constrained players with applications to security of cyberphysical systems. IEEE Trans Control Netw Syst 4(1):71–81. https://doi.org/10.1109/tcns.2016.2584183

  18. Hancock PA, Nourbakhsh I, Stewart J (2019) On the future of transportation in an era of automated and autonomous vehicles. Proc Natl Acad Sci 116(16):7684–7691. https://doi.org/10.1073/pnas.1805770115

  19. Harbert T (2014) Radio wrestlers fight it out at the DARPA Spectrum Challenge. https://spectrum.ieee.org/telecom/wireless/radio-wrestlers-fight-it-out-at-the-darpa-spectrum-challenge

  20. Ho YC (1980) Team decision theory and information structures. Proc IEEE 68(6):644–654. https://doi.org/10.1109/proc.1980.11718

  21. Kartik D, Nayyar A (2020) Upper and lower values in zero-sum stochastic games with asymmetric information. Dyn Games Appl 1–26. https://doi.org/10.1007/s13235-020-00364-x

  22. Kartik D, Nayyar A, Mitra U (2021) Common information belief based dynamic programs for stochastic zero-sum games with competing teams. arXiv preprint arXiv:2102.05838

  23. Kaspi Y, Merhav N (2010) Structure theorem for real-time variable-rate lossy source encoders and memory-limited decoders with side information. In: 2010 IEEE international symposium on information theory (ISIT), pp 86–90. https://doi.org/10.1109/isit.2010.5513283

  24. Kuhn H (1953) Extensive games and the problem of information. In: Contributions to the theory of games (AM-28), volume II. Princeton University Press, pp 193–216. https://doi.org/10.1515/9781400881970-012

  25. Kumar PR, Varaiya P (1986) Stochastic systems: estimation, identification and adaptive control. Prentice-Hall, Inc, Englewood Cliffs

  26. Li L, Shamma J (2014) LP formulation of asymmetric zero-sum stochastic games. In: 2014 53rd IEEE conference on decision and control. IEEE, pp 1930–1935. https://doi.org/10.1109/cdc.2014.7039680

  27. Li L, Langbort C, Shamma J (2019) An LP approach for solving two-player zero-sum repeated Bayesian games. IEEE Trans Autom Control 64(9):3716–3731. https://doi.org/10.1109/tac.2018.2885644

  28. Mahajan A (2008) Sequential decomposition of sequential dynamic teams: Applications to real-time communication and networked control systems. PhD thesis, University of Michigan, Ann Arbor

  29. Mahajan A (2013) Optimal decentralized control of coupled subsystems with control sharing. IEEE Trans Autom Control 58(9):2377–2382. https://doi.org/10.1109/cdc.2011.6160970

  30. Mahajan A, Teneketzis D (2009) Optimal performance of networked control systems with nonclassical information structures. SIAM J Control Optim 48(3):1377–1404. https://doi.org/10.1137/060678130

  31. Mailath GJ, Samuelson L (2006) Repeated games and reputations: long-run relationships. Oxford University Press, Oxford

  32. Maskin E, Tirole J (1988) A theory of dynamic oligopoly. I: Overview and quantity competition with large fixed costs. Econometrica 56(3):549–569. https://doi.org/10.2307/1911700

  33. Maskin E, Tirole J (1988) A theory of dynamic oligopoly. II: Price competition, kinked demand curves, and Edgeworth cycles. Econometrica 56(3):571–599. https://doi.org/10.2307/1911701

  34. Maskin E, Tirole J (2001) Markov perfect equilibrium: I. Observable actions. J Econ Theory 100(2):191–219. https://doi.org/10.1006/jeth.2000.2785

  35. Maskin E, Tirole J (2013) Markov equilibrium. In: J. F. Mertens memorial conference. https://youtu.be/UNtLnKJzrhs

  36. Myerson RB (2013) Game theory. Harvard University Press, Cambridge

  37. Nayyar A, Başar T (2012) Dynamic stochastic games with asymmetric information. In: 2012 IEEE 51st IEEE conference on decision and control (CDC). IEEE, pp 7145–7150. https://doi.org/10.1109/cdc.2012.6426857

  38. Nayyar A, Teneketzis D (2011) On the structure of real-time encoding and decoding functions in a multiterminal communication system. IEEE Trans Inf Theory 57(9):6196–6214. https://doi.org/10.1109/tit.2011.2161915

  39. Nayyar A, Teneketzis D (2011) Sequential problems in decentralized detection with communication. IEEE Trans Inf Theory 57(8):5410–5435. https://doi.org/10.1109/tit.2011.2158478

  40. Nayyar A, Teneketzis D (2019) Common knowledge and sequential team problems. IEEE Trans Autom Control 64(12):5108–5115. https://doi.org/10.1109/tac.2019.2912536

  41. Nayyar A, Mahajan A, Teneketzis D (2011) Optimal control strategies in delayed sharing information structures. IEEE Trans Autom Control 56(7):1606–1620. https://doi.org/10.1109/tac.2010.2089381

  42. Nayyar A, Gupta A, Langbort C, Başar T (2013) Common information based Markov perfect equilibria for stochastic games with asymmetric information: finite games. IEEE Trans Autom Control 59(3):555–570. https://doi.org/10.1109/tac.2013.2283743

  43. Nayyar A, Mahajan A, Teneketzis D (2013) Decentralized stochastic control with partial history sharing: a common information approach. IEEE Trans Autom Control 58(7):1644–1658. https://doi.org/10.1109/tac.2013.2239000

  44. Ouyang Y, Tavafoghi H, Teneketzis D (2015) Dynamic oligopoly games with private Markovian dynamics. In: 2015 54th IEEE conference on decision and control (CDC). IEEE, pp 5851–5858. https://doi.org/10.1109/cdc.2015.7403139

  45. Ouyang Y, Tavafoghi H, Teneketzis D (2016) Dynamic games with asymmetric information: common information based perfect Bayesian equilibria and sequential decomposition. IEEE Trans Autom Control 62(1):222–237. https://doi.org/10.1109/tac.2016.2544936

  46. Renault J (2006) The value of Markov chain games with lack of information on one side. Math Oper Res 31(3):490–512. https://doi.org/10.1287/moor.1060.0199

  47. Renault J (2012) The value of repeated games with an informed controller. Math Oper Res 37(1):154–179. https://doi.org/10.1287/moor.1110.0518

  48. Shelar D, Amin S (2017) Security assessment of electricity distribution networks under DER node compromises. IEEE Trans Control Netw Syst 4(1):23–36. https://doi.org/10.1109/tcns.2016.2598427

  49. Summers T, Li C, Kamgarpour M (2017) Information structure design in team decision problems. IFAC-PapersOnLine 50(1):2530–2535. https://doi.org/10.1016/j.ifacol.2017.08.067

  50. Tavafoghi H (2017) On design and analysis of cyber-physical systems with strategic agents. PhD thesis, University of Michigan, Ann Arbor

  51. Tavafoghi H, Ouyang Y, Teneketzis D (2016) On stochastic dynamic games with delayed sharing information structure. In: 2016 IEEE 55th conference on decision and control (CDC). IEEE, pp 7002–7009. https://doi.org/10.1109/cdc.2016.7799348

  52. Tavafoghi H, Ouyang Y, Teneketzis D, Wellman M (2019) Game theoretic approaches to cyber security: challenges, results, and open problems. In: Jajodia S, Cybenko G, Liu P, Wang C, Wellman M (eds) Adversarial and uncertain reasoning for adaptive cyber defense: control-and game-theoretic approaches to cyber security, vol 11830. Springer, New York, pp 29–53. https://doi.org/10.1007/978-3-030-30719-6_3

  53. Tavafoghi H, Ouyang Y, Teneketzis D (2022) A unified approach to dynamic decision problems with asymmetric information: non-strategic agents. IEEE Trans Autom Control. https://doi.org/10.1109/tac.2021.3060835 (to appear)

  54. Teneketzis D (2006) On the structure of optimal real-time encoders and decoders in noisy communication. IEEE Trans Inf Theory 52(9):4017–4035. https://doi.org/10.1109/tit.2006.880067

  55. Teneketzis D, Ho YC (1987) The decentralized Wald problem. Inf Comput 73(1):23–44. https://doi.org/10.1016/0890-5401(87)90038-1

  56. Teneketzis D, Varaiya P (1984) The decentralized quickest detection problem. IEEE Trans Autom Control 29(7):641–644. https://doi.org/10.1109/tac.1984.1103601

  57. Tenney RR, Sandell NR (1981) Detection with distributed sensors. IEEE Trans Aerosp Electron Syst AES 17(4):501–510. https://doi.org/10.1109/taes.1981.309178

  58. Tsitsiklis JN (1993) Decentralized detection. Adv Stat Signal Process 297–344

  59. Varaiya P, Walrand J (1983) Causal coding and control for Markov chains. Syst Control Lett 3(4):189–192. https://doi.org/10.1016/0167-6911(83)90012-9

  60. Vasal D, Sinha A, Anastasopoulos A (2019) A systematic process for evaluating structured perfect Bayesian equilibria in dynamic games with asymmetric information. IEEE Trans Autom Control 64(1):81–96. https://doi.org/10.1109/tac.2018.2809863

  61. Veeravalli VV (2001) Decentralized quickest change detection. IEEE Trans Inf Theory 47(4):1657–1665. https://doi.org/10.1109/18.923755

  62. Veeravalli VV, Başar T, Poor HV (1993) Decentralized sequential detection with a fusion center performing the sequential test. IEEE Trans Inf Theory 39(2):433–442. https://doi.org/10.1109/18.212274

  63. Veeravalli VV, Başar T, Poor HV (1994) Decentralized sequential detection with sensors performing sequential tests. Math Control Signals Syst 7(4):292–305. https://doi.org/10.1007/bf01211521

  64. Walrand J, Varaiya P (1983) Optimal causal coding-decoding problems. IEEE Trans Inf Theory 29(6):814–820. https://doi.org/10.1109/tit.1983.1056760

  65. Witsenhausen HS (1973) A standard form for sequential stochastic control. Math Syst Theory 7(1):5–11. https://doi.org/10.1007/bf01824800

  66. Witsenhausen HS (1979) On the structure of real-time source coders. Bell Syst Tech J 58(6):1437–1451. https://doi.org/10.1002/j.1538-7305.1979.tb02263.x

  67. Yoshikawa T (1978) Decomposition of dynamic team decision problems. IEEE Trans Autom Control 23(4):627–632. https://doi.org/10.1109/tac.1978.1101791

  68. Zhang Y, An B (2020) Computing team-maxmin equilibria in zero-sum multiplayer extensive-form games. In: Proceedings of the AAAI conference on artificial intelligence, vol 34, no. 02, pp 2318–2325. https://doi.org/10.1609/aaai.v34i02.5610

  69. Zheng J, Castañón DA (2013) Decomposition techniques for Markov zero-sum games with nested information. In: 2013 52nd IEEE conference on decision and control. IEEE, pp 574–581. https://doi.org/10.1109/cdc.2013.6759943

  70. Zhu Q, Başar T (2015) Game-theoretic methods for robustness, security, and resilience of cyberphysical control systems: games-in-games principle for optimal cross-layer resilient control systems. IEEE Control Syst Mag 35(1):46–65. https://doi.org/10.1109/mcs.2014.2364710

Funding

This work is supported by National Science Foundation (NSF) Grants No. ECCS 1750041, ECCS 2038416, ECCS 1608361, and CCF 2008130, Army Research Office (ARO) Award No. W911NF-17-1-0232, and Michigan Institute for Data Science (MIDAS) Sponsorship Funds by General Dynamics.

Author information

Corresponding author

Correspondence to Dengwang Tang.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Code Availability

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Multi-agent Dynamic Decision Making and Learning” edited by Konstantin Avrachenkov, Vivek S. Borkar and U. Jayakrishnan Nair.

Appendices

Two Examples

1.1 A Motivating Example for Sect. 2

The following example illustrates the importance of considering jointly randomized mixed strategies when we study games among teams. Similar to the role mixed strategies play in games among individual players, the space of jointly randomized mixed strategies provides the minimum richness needed to ensure that an equilibrium exists in games among teams. In particular, if we restrict the teams to use independently randomized strategies, i.e., type 1 and type 2 strategies described in Sect. 2.2, then an equilibrium may not exist. This example is similar in spirit to the examples in Farina et al. [13], Zhang and An [68], and Anantharam and Borkar [3], despite the fact that in our example the players on the same team have asymmetric information.

Example 2

(Guessing game) Consider a two-stage game (i.e., \({\mathcal {T}}=\{1, 2\}\)) between two teams \({\mathcal {I}} = \{A, B\}\), each consisting of two players. The set of all agents is \({\mathcal {N}} = \{(A, 1), (A, 2), (B,1), (B, 2) \}\). Let \({\mathbf {X}}_t^A=(X_t^{A,1}, X_t^{A,2})\in \{-1, 1\}^2\); Team B has no state, i.e., \({\mathbf {X}}_t^B = \varnothing \). Assume \({\mathcal {U}}_t^{i, j} = \{-1, 1\}\) for \(t=1, i=A\) or \(t=2, i=B\), and \({\mathcal {U}}_t^{i, j}=\varnothing \) otherwise, i.e., Team A moves at time 1 and Team B moves at time 2. At time 1, \(X_1^{A, 1}\) and \(X_1^{A, 2}\) are independent and uniformly distributed on \(\{-1, 1\}\). Team A’s system is static, i.e., \({\mathbf {X}}_{2}^{A} = {\mathbf {X}}_{1}^{A}\).

The rewards of Team A are given by

$$\begin{aligned} r_1^A({\mathbf {X}}_1, {\mathbf {U}}_1)&= \varvec{1}_{\{ X_{1}^{A, 1} U_1^{A, 1} X_{1}^{A, 2} U_1^{A, 2}=-1\} },\\ r_2^A({\mathbf {X}}_2, {\mathbf {U}}_2)&= - \varvec{1}_{\{ X_2^{A, 1} = U_2^{B, 1}\}} - \varvec{1}_{\{ X_2^{A, 2} = U_2^{B, 2}\}}, \end{aligned}$$

and the rewards of Team B are given by

$$\begin{aligned} r_1^B({\mathbf {X}}_1, {\mathbf {U}}_1)&= 0,\\ r_2^B({\mathbf {X}}_2, {\mathbf {U}}_2)&= \varvec{1}_{\{ X_2^{A, 1} = U_2^{B, 1}\}} + \varvec{1}_{\{ X_2^{A, 2} = U_2^{B, 2}\}}. \end{aligned}$$

Assume that there are no additional common observations other than past actions, i.e., \({\mathbf {Y}}_t = \varnothing \). We set the delay \(d=2\), i.e., agent (A, 1) never observes \(X_t^{A,2}\) during the game, and similarly for agent (A, 2). In this game, the task of Team A is to choose actions according to its states at \(t=1\) in order to earn a positive reward, while not revealing too much information through those actions to Team B. The task of Team B is to guess Team A’s state.

It can be verified (see Appendix A.2.1 for a detailed derivation) that if we restrict both teams to use independently randomized strategies (including deterministic strategies), then no equilibrium exists. However, there does exist an equilibrium where Team A randomizes in a correlated manner, namely the following strategy profile \(\sigma ^*\): At \(t=1\), Team A plays \(\gamma ^A=(\gamma ^{A, 1}, \gamma ^{A, 2})\) with probability 1/2, and \({\tilde{\gamma }}^A=({\tilde{\gamma }}^{A, 1}, {\tilde{\gamma }}^{A, 2})\) with probability 1/2, where

$$\begin{aligned} \gamma ^{A, 1}(x_1^{A, 1})&= x_1^{A, 1}, \quad \gamma ^{A, 2}(x_1^{A, 2}) = - x_1^{A, 2},\\ {\tilde{\gamma }}^{A, 1}(x_1^{A, 1})&= - x_1^{A, 1}, \quad {\tilde{\gamma }}^{A, 2}(x_1^{A, 2}) = x_1^{A, 2} \end{aligned}$$

and at \(t=2\), the two members of Team B choose actions uniformly at random on \(\{-1, 1\}\), independently of each other and of their action and observation histories. In \(\sigma ^*\), each agent (A, j) chooses a uniformly random action irrespective of its state. It is important that (A, 1) and (A, 2) choose these actions in a correlated way: this ensures that Team A obtains the full instantaneous reward while revealing no information.
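
As a sanity check, here is a minimal Monte Carlo sketch of \(\sigma ^*\) (the code and all names in it are ours, not part of the model). Under \(\sigma ^*\), Team A earns the stage-1 reward with probability one while \({\mathbf {U}}_1^A\) remains uniform and independent of \({\mathbf {X}}_1^A\), and Team B's random guessing costs Team A one unit in expectation at stage 2, so Team A's total payoff is approximately 0.

```python
import random

def simulate_sigma_star(n=200_000, seed=0):
    """Estimate Team A's total payoff under the correlated profile sigma*."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x1, x2 = rng.choice([-1, 1]), rng.choice([-1, 1])
        # Team A flips one shared fair coin: play gamma^A or tilde-gamma^A.
        if rng.random() < 0.5:
            u1, u2 = x1, -x2        # gamma^{A,1}, gamma^{A,2}
        else:
            u1, u2 = -x1, x2        # tilde-gamma^{A,1}, tilde-gamma^{A,2}
        r1 = 1.0 if x1 * u1 * x2 * u2 == -1 else 0.0   # always 1 under sigma*
        # Team B guesses uniformly at random at t = 2.
        b1, b2 = rng.choice([-1, 1]), rng.choice([-1, 1])
        r2 = -float(b1 == x1) - float(b2 == x2)
        total += r1 + r2
    return total / n

print(simulate_sigma_star())   # approximately 0
```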

1.2 An Illustrative Example for Sect. 3

The following example illustrates how to visualize games among teams from the coordinators’ viewpoint.

Example 3

Consider a variant of the Guessing Game in Example 2 with the same system model and information structure but different action sets and reward functions. In the new game, Team A moves at both \(t=1\) and \(t=2\), with \({\mathcal {U}}_t^{A, j} = \{-1, 1\}\) for \(t=1,2\) and \(j=1,2\). Team B moves only at time \(t=2\) as in the original game. The new reward functions are given by

$$\begin{aligned} r_1^A({\mathbf {X}}_1, {\mathbf {U}}_1)&= 0,\\ r_2^A({\mathbf {X}}_2, {\mathbf {U}}_2)&= \varvec{1}_{\{X_2^{A, 2} = U_2^{A, 1}, X_2^{A, 1} = U_2^{A, 2}\}} + \varvec{1}_{\{ {\mathbf {X}}_2^{A} \ne {\mathbf {U}}_2^{B}\}},\\ r_1^B({\mathbf {X}}_1, {\mathbf {U}}_1)&= 0,\\ r_2^B({\mathbf {X}}_2, {\mathbf {U}}_2)&= \varvec{1}_{\{ {\mathbf {X}}_2^{A} = {\mathbf {U}}_2^{B}\}}. \end{aligned}$$

In this example, Team A’s task is to guess its own state after a round of publicly observable communication while not leaking information to Team B.

A Team Nash equilibrium \((\sigma ^{*A}, \sigma ^{*B})\) of this game is as follows: Team A chooses one of the four pure strategy profiles listed below with equal probability:

$$\begin{aligned} \bullet ~&\mu _1^{A, 1}(x_1^{A, 1}) = - x_1^{A, 1}, \mu _1^{A, 2}(x_1^{A, 2}) = x_1^{A, 2},\\&\mu _2^{A, 1}({\mathbf {u}}_1, x_{1:2}^{A, 1}) = u_1^{A, 2}, \mu _2^{A, 2}({\mathbf {u}}_1, x_{1:2}^{A, 2}) = -u_1^{A, 1};\\ \bullet ~&\mu _1^{A, 1}(x_1^{A, 1}) = - x_1^{A, 1}, \mu _1^{A, 2}(x_1^{A, 2}) = - x_1^{A, 2},\\&\mu _2^{A, 1}({\mathbf {u}}_1, x_{1:2}^{A, 1}) = -u_1^{A, 2}, \mu _2^{A, 2}({\mathbf {u}}_1, x_{1:2}^{A, 2}) = -u_1^{A, 1};\\ \bullet ~&\mu _1^{A, 1}(x_1^{A, 1}) = x_1^{A, 1}, \mu _1^{A, 2}(x_1^{A, 2}) = x_1^{A, 2},\\&\mu _2^{A, 1}({\mathbf {u}}_1, x_{1:2}^{A, 1}) = u_1^{A, 2} , \mu _2^{A, 2}({\mathbf {u}}_1, x_{1:2}^{A, 2}) = u_1^{A, 1};\\ \bullet ~&\mu _1^{A, 1}(x_1^{A, 1}) = x_1^{A, 1}, \mu _1^{A, 2}(x_1^{A, 2}) = -x_1^{A, 2},\\&\mu _2^{A, 1}({\mathbf {u}}_1, x_{1:2}^{A, 1}) = -u_1^{A, 2}, \mu _2^{A, 2}({\mathbf {u}}_1, x_{1:2}^{A, 2}) = u_1^{A, 1}; \end{aligned}$$

while Team B chooses \({\mathbf {U}}_2^{B}\) uniformly at random, independent of \({\mathbf {U}}_1\). In words, from Team B’s point of view, Team A chooses \({\mathbf {U}}_1^{A}\) to be a uniform random vector independent of \({\mathbf {X}}_1^A\). However, the randomization is coordinated: before the game starts, each member of Team A independently draws one of two cards, where one card says “lie” and the other says “tell the truth,” and the two members tell each other which card they have drawn. At time \(t=1\), both players in Team A play the strategy indicated by their cards. At time \(t=2\), Team A can then perfectly recover \({\mathbf {X}}_1^A\) from \({\mathbf {U}}_1^A\) and its knowledge of the strategy used at \(t=1\).

Now, we describe Team A’s equilibrium strategy through the equivalent behavioral strategy of coordinator A. Use \(\mathbf {ng}\) to denote the prescription that maps \(-1\) to 1 and 1 to \(-1\). Use \(\mathbf {id}\) to denote the identity prescription, i.e., the prescription that maps \(-1\) to \(-1\) and 1 to 1. Use \(\mathbf {cp}_{b}\) to denote the constant prescription that always instructs the agent to play \(b\in \{-1, 1\}\). The mixed strategy profile \(\sigma ^{*A}\) is equivalent to the following behavioral coordination strategy: At time \(t=1\), \(g_1^A(\varnothing )\in \varDelta ({\mathcal {A}}_1^{A, 1} \times {\mathcal {A}}_1^{A, 2} )\) satisfies

$$\begin{aligned}&g_1^A(\varnothing )(\gamma _1^{A, 1}, \gamma _1^{A, 2}) = \frac{1}{4}\qquad \forall \gamma _1^{A, 1}, \gamma _1^{A, 2} \in \{\mathbf {ng}, \mathbf {id} \}. \end{aligned}$$

At time \(t=2\), \(g_2^A: {\mathcal {U}}_1^{A, 1}\times {\mathcal {U}}_1^{A, 2}\times {\mathcal {A}}_1^{A, 1}\times {\mathcal {A}}_1^{A, 2} \mapsto \varDelta ({\mathcal {A}}_2^{A, 1} \times {\mathcal {A}}_2^{A, 2})\) is a deterministic strategy that satisfies

$$\begin{aligned}&g_2^A(u^1, u^2, \mathbf {ng}, \mathbf {id} ) = \textsc {dm}(\mathbf {cp}_{u^2}, \mathbf {cp}_{-u^1}),\\&g_2^A(u^1, u^2, \mathbf {ng}, \mathbf {ng} ) = \textsc {dm}(\mathbf {cp}_{-u^2}, \mathbf {cp}_{-u^1}),\\&g_2^A(u^1, u^2, \mathbf {id}, \mathbf {id} ) = \textsc {dm}(\mathbf {cp}_{u^2}, \mathbf {cp}_{u^1}),\\&g_2^A(u^1, u^2, \mathbf {id}, \mathbf {ng} ) = \textsc {dm}(\mathbf {cp}_{-u^2}, \mathbf {cp}_{u^1}), \end{aligned}$$

where \(\textsc {dm}: {\mathcal {A}}_2^{A, 1}\times {\mathcal {A}}_2^{A, 2} \mapsto \varDelta ({\mathcal {A}}_2^{A, 1}\times {\mathcal {A}}_2^{A, 2})\) represents the delta measure. In words, the coordinator of Team A chooses one of the four possible prescription profiles uniformly at random at time \(t=1\). At time \(t=2\), based on the observed actions and the prescriptions chosen before, the coordinator of Team A directly assigns actions that instruct the agents to recover the state from the actions at \(t=1\). Note that the behavioral coordination strategy at \(t=2\) depends explicitly on the past prescription \(\varvec{\varGamma }_1^A\) in addition to the realized past actions. This is because the coordinator needs to remember not only the agents’ actions, but also the rationale behind those actions, in order to interpret the signals sent through them.
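
The recovery logic at \(t=2\) can be checked mechanically. The following toy sketch (our own encoding of the prescriptions \(\mathbf {ng}\) and \(\mathbf {id}\) as Python functions; all names are hypothetical) draws one of the four prescription profiles uniformly at \(t=1\) and verifies that inverting the observed signals always recovers the swapped state, so Team A collects the first term of \(r_2^A\) with probability one while \({\mathbf {U}}_1^A\) stays uniform from Team B's viewpoint.

```python
import itertools
import random

neg = lambda x: -x      # the prescription ng
ident = lambda x: x     # the prescription id

def coordinator_round(x, rng):
    # t = 1: draw one of the four prescription profiles uniformly at random.
    g1, g2 = rng.choice([neg, ident]), rng.choice([neg, ident])
    u = (g1(x[0]), g2(x[1]))                 # publicly observed actions
    # t = 2: knowing (u, g1, g2), the coordinator inverts the signals;
    # member 1 should output X^{A,2} and member 2 should output X^{A,1}.
    a1 = u[1] if g2 is ident else -u[1]
    a2 = u[0] if g1 is ident else -u[0]
    return u, (a1, a2)

rng = random.Random(1)
for x in itertools.product([-1, 1], repeat=2):
    for _ in range(16):
        u, a = coordinator_round(x, rng)
        assert a == (x[1], x[0])             # state always recovered
```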

1.2.1 Proof of Claim in Example 2

Define two pure strategies \(\mu ^A\) and \({\tilde{\mu }}^A\) of Team A as follows:

$$\begin{aligned} \mu ^{A, 1}(x_1^{A, 1})&= x_1^{A, 1}, \quad \mu ^{A, 2}(x_1^{A, 2}) = - x_1^{A, 2},\\ {\tilde{\mu }}^{A, 1}(x_1^{A, 1})&= - x_1^{A, 1}, \quad {\tilde{\mu }}^{A, 2}(x_1^{A, 2}) = x_1^{A, 2}. \end{aligned}$$

Now, assume that Team A and Team B are restricted to use independently randomized strategies (type 2 strategies defined in Sect. 2.2). We will show in two steps that no equilibrium exists within this class of strategies.

Step 1: If Team A and Team B’s type 2 strategies form an equilibrium, then Team A is playing either \(\mu ^A\) or \({\tilde{\mu }}^A\).

Let \(p_j(x)\) denote the probability that player (A, j) plays \(U_1^{A, j} = -x\) given \(X_1^{A, j} = x\). Define

$$\begin{aligned} q_j = \frac{1}{2}p_j(-1) + \frac{1}{2}p_j(+1), \end{aligned}$$

i.e., the ex-ante probability that player (A, j) “lies.”

Then, we have

$$\begin{aligned} {\mathbb {E}}[r_1^A({\mathbf {X}}_1, {\mathbf {U}}_1)]=q_1(1-q_2) + q_2(1-q_1). \end{aligned}$$

In an equilibrium, Team B optimally responds to Team A’s strategy, described through \((p_1, p_2)\). We can lower-bound Team B’s reward by fixing a strategy: consider the “random guess” strategy of Team B, where each player (B, j) (for \(j=1,2\)) chooses \(U_2^{B, j}\) uniformly at random, irrespective of \({\mathbf {U}}_1^A\) and independently of the other team member. Team B can thus guarantee an expected reward of \(\frac{1}{2} + \frac{1}{2} = 1\) against any strategy of Team A. Since \(r_2^A({\mathbf {X}}_2, {\mathbf {U}}_2) = -r_2^B({\mathbf {X}}_2, {\mathbf {U}}_2)\), we conclude that Team A’s total reward in an equilibrium is upper bounded by

$$\begin{aligned} q_1(1-q_2) + q_2(1-q_1) - 1 = -q_1q_2-(1-q_1)(1-q_2)\le 0. \end{aligned}$$

Let \(\sigma ^B\) denote the strategy of Team B. Let \(\pi _j(u^1, u^2)\) denote the probability that player (B, j) plays \(U_2^{B, j} = -u^j\) given \(U_1^{A, 1} = u^{1}, U_1^{A, 2} = u^{2}\) (i.e., the probability that player (B, j) believes that (A, j) was “lying” and hence guesses the opposite of what was signaled). If Team A plays \(\mu ^A\), then the total reward of Team A is

$$\begin{aligned} J^A(\mu ^A, \sigma ^B)&= 1 - {\mathbb {E}}[1 -\pi _1(X_1^{A, 1}, -X_1^{A, 2}) + \pi _2(X_1^{A, 1}, -X_1^{A, 2})]\\&=\dfrac{1}{4}\sum _{{\mathbf {x}}\in \{-1, 1\}^2 } (\pi _1({\mathbf {x}}) - \pi _2({\mathbf {x}})). \end{aligned}$$

If Team A plays \({\tilde{\mu }}^A\), then the total reward of Team A is

$$\begin{aligned} J^A({\tilde{\mu }}^A, \sigma ^B)&= 1 - {\mathbb {E}}[\pi _1(-X_1^{A, 1}, X_1^{A, 2}) + 1 - \pi _2(-X_1^{A, 1}, X_1^{A, 2})]\\&=\dfrac{1}{4}\sum _{{\mathbf {x}}\in \{-1, 1\}^2 } (\pi _2({\mathbf {x}}) - \pi _1({\mathbf {x}})). \end{aligned}$$

Observe that \(J^A(\mu ^A, \sigma ^B) + J^A({\tilde{\mu }}^A, \sigma ^B) = 0\). Hence, for any \(\sigma ^B\), either \(J^A(\mu ^A, \sigma ^B)\ge 0\) or \(J^A({\tilde{\mu }}^A, \sigma ^B)\ge 0\). In particular, we can conclude that Team A’s total reward is at least 0 in any equilibrium.

We have established both an upper bound and a lower bound on Team A’s total reward in an equilibrium. Hence, we must have

$$\begin{aligned} -q_1q_2-(1-q_1)(1-q_2) = 0, \end{aligned}$$

which implies \(q_1=0, q_2=1\) or \(q_1=1, q_2=0\). The former case corresponds to Team A playing the pure strategy \(\mu ^A\), and the latter to playing \({\tilde{\mu }}^A\).

Step 2: No equilibrium exists in which Team A plays \(\mu ^A\) or \({\tilde{\mu }}^A\).

Suppose that Team A plays \(\mu ^A\). Then, the only best response of Team B is to play \(U_2^{B, 1} = U_1^{A, 1}, U_2^{B, 2} = -U_1^{A, 2}\), under which Team A’s total reward is \(J^A(\mu ^A, \sigma ^B) = 1 - 1 - 1 = -1\). If Team A deviates to \({\tilde{\mu }}^A\), then it obtains a total reward of \(+1\) (recall that \(J^A(\mu ^A, \sigma ^B) + J^A({\tilde{\mu }}^A, \sigma ^B) = 0\) for any \(\sigma ^B\)). Hence, Team A does not play \(\mu ^A\) in equilibrium.

Similar arguments apply to \({\tilde{\mu }}^A\), which completes the proof.
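
The nonexistence claim can also be verified by brute force on the deterministic subclass: the sketch below (our own enumeration code, not part of the paper's argument) enumerates all 16 pure Team-A profiles and 256 pure Team-B profiles and confirms that none survives a joint pure team deviation, consistent with Steps 1 and 2.

```python
import itertools

X = (-1, 1)
# A pure strategy of (A, j) maps its own state to an action: 4 maps each.
maps_A = list(itertools.product(X, repeat=2))            # value on (-1, +1)
act_A = lambda m, x: m[(x + 1) // 2]
A_strats = list(itertools.product(maps_A, repeat=2))     # 16 Team-A profiles
# A pure strategy of (B, j) maps (U1^{A,1}, U1^{A,2}) to an action: 16 each.
maps_B = list(itertools.product(X, repeat=4))
act_B = lambda m, u: m[((u[0] + 1) // 2) * 2 + (u[1] + 1) // 2]
B_strats = list(itertools.product(maps_B, repeat=2))     # 256 Team-B profiles

def payoffs(a, b):
    ja = jb = 0.0
    for x1, x2 in itertools.product(X, repeat=2):        # uniform states
        u = (act_A(a[0], x1), act_A(a[1], x2))
        r1 = 1.0 if x1 * u[0] * x2 * u[1] == -1 else 0.0
        hits = float(act_B(b[0], u) == x1) + float(act_B(b[1], u) == x2)
        ja += (r1 - hits) / 4
        jb += hits / 4
    return ja, jb

P = {(a, b): payoffs(a, b) for a in A_strats for b in B_strats}
bestA = {b: max(P[a, b][0] for a in A_strats) for b in B_strats}
bestB = {a: max(P[a, b][1] for b in B_strats) for a in A_strats}
eqs = [k for k, (ja, jb) in P.items() if ja >= bestA[k[1]] and jb >= bestB[k[0]]]
print(len(eqs))   # 0: no pure-strategy team equilibrium
```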

Proof of Lemma 1

Given a pure strategy profile \(\mu ^i\) of team i, define a pure coordination strategy profile \(\nu ^i\) by

$$\begin{aligned} \nu _t^{i}(h_t^i, \gamma _{1:t-1}^i)&= (\mu _t^{i, j}(h_t^i, \cdot ))_{(i, j)\in {\mathcal {N}}_i}\quad \forall h_t^i\in {\mathcal {H}}_t^i, \gamma _{1:t-1}^i\in {\mathcal {A}}_{1:t-1}^i. \end{aligned}$$

We first prove that for every pure strategy profile \(\mu ^i\), there exists a payoff-equivalent pure coordination strategy profile \(\nu ^i\) by coupling two systems. In the first system, team i uses the pure strategy \(\mu ^i\); in the second, team/coordinator i uses the corresponding pure coordination strategy \(\nu ^i\). We assume that all teams other than i use the same pure strategy profile \(\mu ^{-i} = (\mu ^k)_{k\in {\mathcal {I}}\backslash \{i\} }\) in both systems. The realizations of the primitive random variables (i.e., \((X_1^i)_{i\in {\mathcal {I}}}, (W_t^{i, X}, W_t^{i, Y})_{i\in {\mathcal {I}}, t\in {\mathcal {T}}}\)) are assumed to be the same in the two systems. We proceed to show that the realizations of all system variables (i.e., \(({\mathbf {X}}_t, {\mathbf {Y}}_t, {\mathbf {U}}_t)_{t\in {\mathcal {T}}}\)) are then the same in both systems; as a result, the expected payoffs are the same.
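
In code, the construction of \(\nu ^i\) from \(\mu ^i\) amounts to partial application: the coordinator substitutes the common information into each member's strategy and issues the resulting map over private information as the prescription. A minimal toy sketch (all names hypothetical):

```python
from functools import partial

# A toy pure member strategy: action = f(common history, private state).
def mu_member(h_common, x_private):
    return (h_common[-1] if h_common else 1) * x_private

# The induced coordination strategy: prescribe the partially applied map.
def nu(h_common):
    return partial(mu_member, h_common)

gamma = nu((1, -1))                        # prescription issued at some time t
assert gamma(1) == mu_member((1, -1), 1)   # same action either way
assert gamma(-1) == mu_member((1, -1), -1)
```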

We prove that the realizations of \(({\mathbf {X}}_t, {\mathbf {Y}}_t, {\mathbf {U}}_t)_{t\in {\mathcal {T}}}\) are the same by induction on time t.

Induction Base: At \(t=1\), the realizations of \({\mathbf {X}}_1\) are the same for the two systems by assumption. For the first system, we have

$$\begin{aligned} U_1^{i, j} = \mu _1^{i, j}(X_1^{i, j})\quad \forall (i, j)\in {\mathcal {N}}_i, \end{aligned}$$

and for the second system we have

$$\begin{aligned} \varvec{\varGamma }_1^{i}&= \nu _1^i(H_1^{i}) = (\mu _1^{i, j}(\cdot ))_{(i, j)\in {\mathcal {N}}_i}, \\ U_1^{i, j}&= \varGamma _1^{i, j}(X_1^{i, j})\quad \forall (i, j)\in {\mathcal {N}}_i, \end{aligned}$$

which means that \(U_1^{i, j} = \mu _1^{i, j}(X_1^{i, j})\) also holds in the second system for all \((i, j)\in {\mathcal {N}}_i\).

It is clear that \({\mathbf {U}}_1^{-i}\) are the same for both systems since in both systems,

$$\begin{aligned} U_1^{k, j}&= \mu _1^{k, j}(X_1^{k, j})\quad \forall (k, j)\in {\mathcal {N}}\backslash {\mathcal {N}}_i. \end{aligned}$$

We conclude that \({\mathbf {U}}_1\) are the same for both systems. Since \((W_1^{k, Y})_{k\in {\mathcal {I}}}\) are the same for both systems, \(Y_1^k=\ell _1^k(X_1^k, {\mathbf {U}}_1, W_1^{k, Y}), k\in {\mathcal {I}},\) are the same for both systems.

Induction Step: Suppose that \({\mathbf {X}}_s, {\mathbf {Y}}_s, {\mathbf {U}}_s\) are the same for both systems for all \(s<t\). Now, we prove it for t.

First, since the realizations of \({\mathbf {X}}_{t-1}, {\mathbf {U}}_{t-1}, (W_{t-1}^{k, X})_{k\in {\mathcal {I}}}\) are the same for both systems and

$$\begin{aligned} {\mathbf {X}}_t^k = f_t^k({\mathbf {X}}_{t-1}^k, {\mathbf {U}}_{t-1}, W_{t-1}^{k, X})\quad \forall k\in {\mathcal {I}}, \end{aligned}$$

\({\mathbf {X}}_t\) are the same for both systems.

Consider the actions taken by the members of team i at time t. For the first system,

$$\begin{aligned} U_t^{i, j} = \mu _t^{i, j}(H_t^{i, j}) = \mu _t^{i, j}(H_t^{i}, X_{t-d+1:t}^{i, j})\quad \forall (i, j)\in {\mathcal {N}}_i. \end{aligned}$$

In the second system,

$$\begin{aligned} \varvec{\varGamma }_t^{i}&= \nu _t^i(H_t^{i}) = (\mu _t^{i, j}(H_t^i, \cdot ))_{(i, j)\in {\mathcal {N}}_i} \\ U_t^{i, j}&= \varGamma _t^{i, j}(X_{t-d+1:t}^{i, j})\quad \forall (i, j)\in {\mathcal {N}}_i, \end{aligned}$$

which means that

$$\begin{aligned} U_t^{i, j} = \mu _t^{i, j}(H_t^i, X_{t-d+1:t}^{i, j})\quad \forall (i, j)\in {\mathcal {N}}_i. \end{aligned}$$

The actions taken by the members of other teams at time t are

$$\begin{aligned} U_t^{k, j} = \mu _t^{k, j}(H_t^k, X_{t-d+1:t}^{k, j})\quad \forall (k, j)\in {\mathcal {N}}\backslash {\mathcal {N}}_i. \end{aligned}$$

for both systems.

We conclude that \({\mathbf {U}}_t\) has the same realization in the two systems, since \((H_t^k, X_{t-d+1:t}^{k, j})_{(k, j)\in {\mathcal {N}}}\) have the same realization by the induction hypothesis and the argument above. Since \((W_t^{k, Y})_{k\in {\mathcal {I}}}\) are the same for both systems, \(Y_t^k=\ell _t^k(X_t^k, {\mathbf {U}}_t, W_t^{k, Y}), k\in {\mathcal {I}},\) are the same for both systems.

We have thus established the induction step, proving that \(\mu ^i\) and \(\nu ^i\) generate the same realization of \(({\mathbf {X}}_t, {\mathbf {Y}}_t, {\mathbf {U}}_t)_{t\in {\mathcal {T}}}\) under the same realization of the primitive random variables. Therefore, \(\nu ^i\) is a pure coordination strategy profile payoff-equivalent to \(\mu ^i\).

To complete the other half of the proof, for each given coordination strategy \(\nu ^i\) of team/coordinator i we define a pure team strategy \(\mu ^i=(\mu _t^{i, j})_{(i, j)\in {\mathcal {N}}_i, t\in {\mathcal {T}}}\) through

$$\begin{aligned} \mu _t^{i, j}(h_t^{i, j}) = \gamma _{t}^{i, j}(x_{t-d+1:t}^{i, j})\qquad \forall h_t^{i, j}\in {\mathcal {H}}_t^{i, j}\qquad \forall (i, j)\in {\mathcal {N}}_i, \end{aligned}$$

where \(\gamma _{t}^i=(\gamma _{t}^{i, j})_{(i, j) \in {\mathcal {N}}_i}\) is recursively defined by \(\nu _{1:t}^i\) and \(h_t^i\) through

$$\begin{aligned} \gamma _{t}^i = \nu _t^i(h_t^i, \gamma _{1:t-1}^i)\quad \forall t\in {\mathcal {T}}. \end{aligned}$$

Then, using an argument similar to the one used in the first half of the proof, we can show that \(\mu ^i\) is payoff-equivalent to \(\nu ^i\).

Proof of Lemma 3

We proceed by induction on time t.

Induction Base: At \(t=1\), the states \({\mathbf {X}}_{1}^k\) are independent across k because of the assumption on the primitive random variables. Furthermore, since \(H_1^k\) is a deterministic random vector (see Remark 2) and the randomizations of different coordinators are independent, we conclude that \(({\mathbf {X}}_{1}^k, \varvec{\varGamma }_{1}^k)\) are mutually independent across k. The distribution of \(({\mathbf {X}}_{1}^k, \varvec{\varGamma }_{1}^k)\) depends on g only through \(g^k\).

Induction Step: Suppose that \(({\mathbf {X}}_{1:t}^k, \varvec{\varGamma }_{1:t}^k)\) are conditionally independent given \(H_t^0\) and \({\mathbb {P}}^g({\mathbf {X}}_{1:t}^k, \varvec{\varGamma }_{1:t}^k|H_t^0)\) depends on g only through \(g^k\). Now, we have

$$\begin{aligned}&{\mathbb {P}}^g(x_{1:t+1}, \gamma _{1:t+1}|h_{t+1}^0)\\ =&\,{\mathbb {P}}^g(x_{t+1}|h_{t+1}^0, x_{1:t}, \gamma _{1:t+1}){\mathbb {P}}^g(\gamma _{t+1}|h_{t+1}^0, x_{1:t}, \gamma _{1:t}) {\mathbb {P}}^g(x_{1:t}, \gamma _{1:t}|h_{t+1}^0)\\ =&\left( \prod _{k\in {\mathcal {I}}} {\mathbb {P}}(x_{t+1}^k|x_t^k, u_t)g_{t+1}^k(\gamma _{t+1}^k|h_{t+1}^0, x_{1:t-d+1}^k, \gamma _{1:t}^k)\right) {\mathbb {P}}^g(x_{1:t}, \gamma _{1:t}|h_{t+1}^0). \end{aligned}$$

We then claim that

$$\begin{aligned} {\mathbb {P}}^g(x_{1:t}, \gamma _{1:t}, y_t, u_t|h_{t}^0) = \prod _{k\in {\mathcal {I}}} F_t^k(x_{1:t}^k, \gamma _{1:t}^k, h_{t+1}^0) \end{aligned}$$

where for each \(k\in {\mathcal {I}}\), \(F_t^k\) is a function that depends only on \(g^k\).

To establish the claim, we note that

$$\begin{aligned}&{\mathbb {P}}^g(x_{1:t}, \gamma _{1:t}, y_t, u_t|h_{t}^0) \\ =&\, {\mathbb {P}}^g(y_t, u_t|h_{t}^0, x_{1:t}, \gamma _{1:t}) {\mathbb {P}}^g(x_{1:t}, \gamma _{1:t}|h_t^0)\\ =&\left( \prod _{k\in {\mathcal {I}}}{\mathbb {P}}(y_t^k|x_t^k, u_t) \varvec{1}_{\{u_{t}^k = \gamma _{t}^k(x_{t-d+1:t}^k)\}}\right) {\mathbb {P}}^g(x_{1:t}, \gamma _{1:t}|h_t^0)\\ =&\left( \prod _{k\in {\mathcal {I}}}{\mathbb {P}}(y_t^k|x_t^k, u_t) \varvec{1}_{\{u_{t}^k = \gamma _{t}^k(x_{t-d+1:t}^k)\}}\right) \left( \prod _{k\in {\mathcal {I}}}{\mathbb {P}}^{g^k}(x_{1:t}^k, \gamma _{1:t}^k|h_t^0) \right) \\ =&\prod _{k\in {\mathcal {I}}} F_t^k(x_{1:t}^k, \gamma _{1:t}^k, h_{t+1}^0), \end{aligned}$$

where in the third step we have used the induction hypothesis.

Given the claim, we have

$$\begin{aligned} {\mathbb {P}}^g(x_{1:t}, \gamma _{1:t}|h_{t+1}^0)&=\dfrac{{\mathbb {P}}^g(x_{1:t}, \gamma _{1:t}, y_t, u_t|h_{t}^0)}{\sum _{{\tilde{x}}_{1:t}, {\tilde{\gamma }}_{1:t}} {\mathbb {P}}^g({\tilde{x}}_{1:t}, {\tilde{\gamma }}_{1:t}, y_t, u_t|h_{t}^0)}\\&=\dfrac{\prod _{k\in {\mathcal {I}}} F_t^k(x_{1:t}^k, \gamma _{1:t}^k, h_{t+1}^0)}{\sum _{{\tilde{x}}_{1:t}, {\tilde{\gamma }}_{1:t}}\prod _{k\in {\mathcal {I}}} F_t^k({\tilde{x}}_{1:t}^k, {\tilde{\gamma }}_{1:t}^k, h_{t+1}^0)}\\&=\dfrac{\prod _{k\in {\mathcal {I}}} F_t^k(x_{1:t}^k, \gamma _{1:t}^k, h_{t+1}^0)}{\prod _{k\in {\mathcal {I}}} \left( \sum _{{\tilde{x}}_{1:t}^k, {\tilde{\gamma }}_{1:t}^k}F_t^k({\tilde{x}}_{1:t}^k, {\tilde{\gamma }}_{1:t}^k, h_{t+1}^0)\right) }\\&=\prod _{k\in {\mathcal {I}}} \left( \dfrac{ F_t^k(x_{1:t}^k, \gamma _{1:t}^k, h_{t+1}^0)}{\sum _{{\tilde{x}}_{1:t}^k, {\tilde{\gamma }}_{1:t}^k}F_t^k({\tilde{x}}_{1:t}^k, {\tilde{\gamma }}_{1:t}^k, h_{t+1}^0)}\right) \end{aligned}$$

and then

$$\begin{aligned} {\mathbb {P}}^g(x_{1:t+1}, \gamma _{1:t+1}|h_{t+1}^0) = \prod _{k\in {\mathcal {I}}} G_t^k(x_{1:t+1}^k, \gamma _{1:t+1}^k, h_{t+1}^0), \end{aligned}$$

where \(G_t^k\) is given by

$$\begin{aligned} G_t^k(x_{1:t+1}^k, \gamma _{1:t+1}^k, h_{t+1}^0) =&\, {\mathbb {P}}(x_{t+1}^k|x_t^k, u_t)\,g_{t+1}^k(\gamma _{t+1}^k|h_{t+1}^0, x_{1:t-d+1}^k, \gamma _{1:t}^k)\\&\times \dfrac{ F_t^k(x_{1:t}^k, \gamma _{1:t}^k, h_{t+1}^0)}{\sum _{{\tilde{x}}_{1:t}^k, {\tilde{\gamma }}_{1:t}^k}F_t^k({\tilde{x}}_{1:t}^k, {\tilde{\gamma }}_{1:t}^k, h_{t+1}^0)}. \end{aligned}$$

One can check that \(G_t^k\) depends on g only through \(g^k\) and

$$\begin{aligned} \sum _{{\tilde{x}}_{1:t+1}^k, {\tilde{\gamma }}_{1:t+1}^k} G_t^k({\tilde{x}}_{1:t+1}^k, {\tilde{\gamma }}_{1:t+1}^k, h_{t+1}^0) = 1, \end{aligned}$$

therefore

$$\begin{aligned} G_t^k(x_{1:t+1}^k, \gamma _{1:t+1}^k, h_{t+1}^0) = {\mathbb {P}}^{g^k}(x_{1:t+1}^k, \gamma _{1:t+1}^k| h_{t+1}^0). \end{aligned}$$

Hence, we establish the induction step.
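
The key algebraic step in the normalization above, namely that normalizing a product of per-team factors equals the product of the per-team normalized factors, can be sanity-checked numerically. A minimal sketch with arbitrary made-up factor values:

```python
import numpy as np

rng = np.random.default_rng(0)
F1 = rng.random(3)            # unnormalized factor for team 1's variables
F2 = rng.random(4)            # unnormalized factor for team 2's variables
joint = np.outer(F1, F2)      # product-form unnormalized joint weight

lhs = joint / joint.sum()                      # normalize the product
rhs = np.outer(F1 / F1.sum(), F2 / F2.sum())   # product of normalized factors
assert np.allclose(lhs, rhs)
```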

Proof of Lemma 4

Assume that \({\overline{h}}_t^i\in \overline{{\mathcal {H}}}_t^i\) is admissible under g. From Lemma 3, we know that \({\mathbb {P}}^{g} (x_{1:t}^i, \gamma _{1:t}^i|h_t^0)\) does not depend on \(g^{-i}\). As a conditional distribution obtained from \({\mathbb {P}}^{g} (x_{1:t}^i, \gamma _{1:t}^i|h_t^0)\), \({\mathbb {P}}^{g}(x_{t-d+1:t}^i|{\overline{h}}_t^i)\) does not depend on \(g^{-i}\) either.

Therefore, we can compute the belief of coordinator i by replacing \(g^{-i}\) with \({\hat{g}}^{-i}\), an open-loop strategy profile that always generates the actions \(u_{1:t-1}^{-i}\):

$$\begin{aligned}&{\mathbb {P}}^{g^i, g^{-i}}(x_{t-d+1:t}^i|{\overline{h}}_t^i)={\mathbb {P}}^{g^i, {\hat{g}}^{-i}}(x_{t-d+1:t}^i|{\overline{h}}_t^i). \end{aligned}$$

Note that we always have \({\mathbb {P}}^{g^i, {\hat{g}}^{-i}}({\overline{h}}_t^i) > 0\) for all \({\overline{h}}_t^i\) admissible under g.

Furthermore, we can also introduce into the conditioning additional random variables that are conditionally independent according to Lemma 3, i.e.,

$$\begin{aligned}&{\mathbb {P}}^{g^i, {\hat{g}}^{-i}}(x_{t-d+1:t}^i|{\overline{h}}_t^i)={\mathbb {P}}^{g^i, {\hat{g}}^{-i}}(x_{t-d+1:t}^i|{\overline{h}}_t^i, x_{t-d:t}^{-i}), \end{aligned}$$

where \(x_{t-d:t}^{-i}\in {\mathcal {X}}_{t-d:t}^{-i}\) is such that \({\mathbb {P}}^{g^i, {\hat{g}}^{-i}}(x_{t-d:t}^{-i}|{\overline{h}}_t^i) > 0\).

Let \(\tau = t-d+1\). By Bayes’ rule

$$\begin{aligned}&\qquad \quad {\mathbb {P}}^{g^i, {\hat{g}}^{-i}}(x_{\tau :t}^i|{\overline{h}}_t^i, x_{\tau -1:t}^{-i})\nonumber \\&\quad =\dfrac{{\mathbb {P}}^{g^i, {\hat{g}}^{-i}}(x_{\tau :t}, y_{\tau :t-1}, u_{\tau :t-1}, \gamma _{\tau :t-1}^i|h_{\tau }^{*i} )}{ \sum _{{\tilde{x}}_{\tau :t}^i} {\mathbb {P}}^{g^i, {\hat{g}}^{-i}}({\tilde{x}}_{\tau :t}^i, x_{\tau :t}^{-i}, y_{\tau :t-1}, u_{\tau :t-1}, \gamma _{\tau :t-1}^i|h_{\tau }^{*i} )}, \end{aligned}$$
(10)

where

$$\begin{aligned} h_{\tau }^{*i} = (y_{1:\tau -1}, u_{1:\tau -1}, x_{1:\tau -1}^i, x_{\tau -1}^{-i}, \gamma _{1:\tau -1}^i). \end{aligned}$$

We have

$$\begin{aligned}&\qquad \quad {\mathbb {P}}^{g^i, {\hat{g}}^{-i}}(x_{\tau :t}, y_{\tau :t-1}, u_{\tau :t-1}, \gamma _{\tau :t-1}^i|h_{\tau }^{*i} )\nonumber \\&\quad =\prod _{l=1}^{d-1} \Big [{\mathbb {P}}^{g^i, {\hat{g}}^{-i}}(x_{t-l+1}, y_{t-l}|h_\tau ^{*i}, x_{\tau :t-l}, y_{\tau :t-l-1}, u_{\tau :t-l}, \gamma _{\tau :t-l}^i)\nonumber \\&\qquad ~\times {\mathbb {P}}^{g^i, {\hat{g}}^{-i}}(u_{t-l}^i|h_\tau ^{*i}, x_{\tau :t-l}, y_{\tau :t-l-1}, u_{\tau :t-l-1}, \gamma _{\tau :t-l}^i)\nonumber \\&\qquad ~\times {\mathbb {P}}^{g^i, {\hat{g}}^{-i}}(\gamma _{t-l}^i|h_\tau ^{*i}, x_{\tau :t-l}, y_{\tau :t-l-1}, u_{\tau :t-l-1}, \gamma _{\tau :t-l-1}^i)\Big ]\nonumber \\&\qquad ~\times {\mathbb {P}}^{g^i, {\hat{g}}^{-i}}(x_{\tau }|h_{\tau }^{*i}). \end{aligned}$$
(11)

The first three terms in the above product are

$$\begin{aligned}&\qquad \quad {\mathbb {P}}^{g^i, {\hat{g}}^{-i}}(x_{t-l+1}, y_{t-l}|h_\tau ^{*i}, x_{\tau :t-l}, y_{\tau :t-l-1}, u_{\tau :t-l}, \gamma _{\tau :t-l}^i)\nonumber \\&\quad =\prod _{k\in {\mathcal {I}}}[{\mathbb {P}}(x_{t-l+1}^k|x_{t-l}^k, u_{t-l}){\mathbb {P}}(y_{t-l}^k|x_{t-l}^k, u_{t-l}) ],\nonumber \\&\qquad \quad {\mathbb {P}}^{g^i, {\hat{g}}^{-i}}(u_{t-l}^i|h_\tau ^{*i}, x_{\tau :t-l}, y_{\tau :t-l-1}, u_{\tau :t-l-1}, \gamma _{\tau :t-l}^i)\nonumber \\&\quad =\prod _{(i, j)\in {\mathcal {N}}_i }\varvec{1}_{\{u_{t-l}^{i, j} = \gamma _{t-l}^{i, j}(x_{t-l-d+1:t-l}^{i, j}) \} } \nonumber \\&\quad = \prod _{(i, j)\in {\mathcal {N}}_i } \varvec{1}_{\{u_{t-l}^{i, j} = \phi _{t-l, l}^{i, j}(x_{\tau :t-l}^{i, j}) \} },\nonumber \\&\qquad \quad {\mathbb {P}}^{g^i, {\hat{g}}^{-i}}(\gamma _{t-l}^i|h_\tau ^{*i}, x_{\tau :t-l}, y_{\tau :t-l-1}, u_{\tau :t-l-1}, \gamma _{\tau :t-l-1}^i)\nonumber \\&\quad =g_{t-l}^i(\gamma _{t-l}^i|y_{1:t-l-1}, u_{1:t-l-1}, x_{1:t-d-l}^i, \gamma _{1:t-l-1}^i),\quad \end{aligned}$$
(12)

respectively.

The last term satisfies

$$\begin{aligned}&{\mathbb {P}}^{g^i, {\hat{g}}^{-i}}(x_{\tau }|h_{\tau }^{*i}) = \prod _{k\in {\mathcal {I}}}{\mathbb {P}}(x_{\tau }^k|x_{\tau -1}^k, u_{\tau -1}). \end{aligned}$$

Substituting (11) and (12) into (10), we obtain

$$\begin{aligned} {\mathbb {P}}^{g^i, {\hat{g}}^{-i}}(x_{\tau :t}^i|{\overline{h}}_t^i, x_{\tau -1:t}^{-i})=\dfrac{F_t^i(x_{\tau :t}^i, y_{\tau :t-1}^i, u_{\tau -1:t-1}, x_{\tau -1}^i, \phi _{t}^i)}{\sum _{{\tilde{x}}_{\tau :t}^i } F_t^i({\tilde{x}}_{\tau :t}^i, y_{\tau :t-1}^i, u_{\tau -1:t-1}, x_{\tau -1}^i, \phi _{t}^i)} \end{aligned}$$

where

$$\begin{aligned}&F_t^i(x_{\tau :t}^i, y_{\tau :t-1}^i, u_{\tau -1:t-1}, x_{\tau -1}^i, \phi _{t}^i):= {\mathbb {P}}(x_{\tau }^i|x_{\tau -1}^i, u_{\tau -1})\\&\quad \quad \quad \quad \quad \times \prod _{l=1}^{d-1}\left( {\mathbb {P}}(x_{t-l+1}^i|x_{t-l}^i, u_{t-l}){\mathbb {P}}(y_{t-l}^i|x_{t-l}^i, u_{t-l}) \prod _{(i, j)\in {\mathcal {N}}_i } \varvec{1}_{\{u_{t-l}^{i, j} = \phi _{t-l, l}^{i, j}(x_{\tau :t-l}^{i, j}) \} } \right) . \end{aligned}$$

Therefore, we have proved that

$$\begin{aligned} {\mathbb {P}}^{g}(x_{t-d+1:t}^i|{\overline{h}}_t^i)&=P_t^i(x_{t-d+1:t}^i|y_{t-d+1:t-1}^i, u_{t-d:t-1}, x_{t-d}^i, \phi _{t}^i)\\&:=\dfrac{F_t^i(x_{t-d+1:t}^i, y_{t-d+1:t-1}^i, u_{t-d:t-1}, x_{t-d}^i, \phi _{t}^i)}{\sum _{{\tilde{x}}_{t-d+1:t}^i } F_t^i({\tilde{x}}_{t-d+1:t}^i, y_{t-d+1:t-1}^i, u_{t-d:t-1}, x_{t-d}^i, \phi _{t}^i)} \end{aligned}$$

where \(P_t^i\) is independent of g.
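
To illustrate the form of \(P_t^i\), here is a toy numeric sketch for \(d=2\), a binary state, no observations y, and a made-up transition kernel: the belief over the hidden block is the normalized product of transition weights and the action-consistency indicator, and no strategy term appears.

```python
import itertools

# Made-up kernel P(x' | x); states are -1 and +1.
T = {(-1, -1): 0.7, (-1, 1): 0.3, (1, -1): 0.4, (1, 1): 0.6}

def belief(x_prev, u_seen, phi):
    """phi: the prescription used at t-1, mapping state -> action."""
    w = {(x1, x2): T[x_prev, x1] * T[x1, x2] * float(phi(x1) == u_seen)
         for x1, x2 in itertools.product((-1, 1), repeat=2)}
    z = sum(w.values())
    return {k: v / z for k, v in w.items()}

# If the prescription was "negate" and the observed action was -1,
# then x_{t-1} = +1, and the belief reduces to the kernel row from +1.
print(belief(x_prev=1, u_seen=-1, phi=lambda x: -x))
```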

Proof of Lemma 5

For notational convenience, define

$$\begin{aligned} {\overline{H}}_t^{-i} = ({\mathbf {Y}}_{1:t-1}, {\mathbf {U}}_{1:t-1}, {\mathbf {X}}_{1:t-d}^{-i}, \varvec{\varGamma }_{1:t-1}^{-i}) \end{aligned}$$

Claim 1

\({\mathbb {P}}^g(x_{t-d+1:t}|\gamma _t, s_t^i, {\overline{h}}_{t}^{-i})\) does not depend on g (see Note 9).

Claim 2

\({\mathbb {P}}^g(\gamma _t, s_t^i, {\overline{h}}_{t}^{-i}) = {\mathbb {P}}^{\rho ^i, g^{-i}}(\gamma _t, s_t^i, {\overline{h}}_{t}^{-i})\) for all \(\gamma _t\in {\mathcal {A}}_t, s_t^i\in {\mathcal {S}}_t^i, {\overline{h}}_{t}^{-i}\in {\mathcal {H}}_t^{-i}\).

Given Claims 1 and 2, we conclude that

$$\begin{aligned} {\mathbb {P}}^g(x_{t-d+1:t}, \gamma _t, s_t^i, {\overline{h}}_{t}^{-i})={\mathbb {P}}^{\rho ^i, g^{-i}}(x_{t-d+1:t}, \gamma _t, s_t^i, {\overline{h}}_{t}^{-i}) \end{aligned}$$
(13)

for all \(x_{t-d+1:t}\in {\mathcal {X}}_{t-d+1:t}, \gamma _t\in {\mathcal {A}}_t, s_t^i\in {\mathcal {S}}_t^i, {\overline{h}}_{t}^{-i}\in {\mathcal {H}}_t^{-i}\).

Marginalizing (13), we obtain

$$\begin{aligned} {\mathbb {P}}^g(x_{t-d+1:t}, \gamma _t) = {\mathbb {P}}^{\rho ^i, g^{-i}}(x_{t-d+1:t}, \gamma _t). \end{aligned}$$
(14)

Since \(U_t^{k, j} = \varGamma _t^{k, j}(X_{t-d+1:t}^{k, j})\) for all \((k, j)\in {\mathcal {N}}\), we can write \(r_t^k({\mathbf {X}}_t, {\mathbf {U}}_t) = {\tilde{r}}_t^k({\mathbf {X}}_{t-d+1:t}, \varvec{\varGamma }_t)\) for some function \({\tilde{r}}_t^k\) that does not depend on the strategy profile. Then, using linearity of expectation and (14) we obtain

$$\begin{aligned} J^k(g)&= {\mathbb {E}}^g\left[ \sum _{t\in {\mathcal {T}}}{\tilde{r}}_t^k({\mathbf {X}}_{t-d+1:t}, \varvec{\varGamma }_t) \right] \\&=\sum _{t\in {\mathcal {T}}} {\mathbb {E}}^g[{\tilde{r}}_t^k({\mathbf {X}}_{t-d+1:t}, \varvec{\varGamma }_t)] = \sum _{t\in {\mathcal {T}}} {\mathbb {E}}^{\rho ^i, g^{-i}}[{\tilde{r}}_t^k({\mathbf {X}}_{t-d+1:t}, \varvec{\varGamma }_t)]\\&={\mathbb {E}}^{\rho ^i, g^{-i}}\left[ \sum _{t\in {\mathcal {T}}}{\tilde{r}}_t^k({\mathbf {X}}_{t-d+1:t}, \varvec{\varGamma }_t) \right] =J^k(\rho ^i, g^{-i}) \end{aligned}$$

for every behavioral coordination strategy profile \(g^{-i}\). Hence, \(g^i\) and \(\rho ^i\) are payoff-equivalent.

Proof of Claim 1

For notational convenience, define

$$\begin{aligned} {\overline{H}}_t = \bigcup _{i\in {\mathcal {I}}} {\overline{H}}_t^i = ({\mathbf {Y}}_{1:t-1}, {\mathbf {U}}_{1:t-1}, {\mathbf {X}}_{1:t-d}, \varvec{\varGamma }_{1:t-1}). \end{aligned}$$

Consider \({\mathbb {P}}^g(x_{t-d+1:t}|\gamma _t, {\overline{h}}_t)\) first. Since \(\varvec{\varGamma }_t\) is a randomized prescription generated from \({\overline{H}}_t\), which enters the system after \({\mathbf {X}}_{t-d+1:t}\) are realized, we have

$$\begin{aligned} {\mathbb {P}}^g(x_{t-d+1:t}|\gamma _t, {\overline{h}}_t) = {\mathbb {P}}^g(x_{t-d+1:t}|{\overline{h}}_t). \end{aligned}$$
(15)

Due to Lemma 3, we have

$$\begin{aligned} {\mathbb {P}}^g(x_{t-d+1:t}|{\overline{h}}_t) = \prod _{k\in {\mathcal {I}}} {\mathbb {P}}^g(x_{t-d+1:t}^k|{\overline{h}}_t^k). \end{aligned}$$
(16)

By Lemma 4,

$$\begin{aligned} {\mathbb {P}}^g(x_{t-d+1:t}^k|{\overline{h}}_t^k) = P_t^k(x_{t-d+1:t}^k|y_{t-d+1:t-1}^k, u_{t-d:t-1}, s_t^k) \end{aligned}$$
(17)

where \(P_t^k\) is a function that does not depend on g.

Combining (15), (16), (17), we have

$$\begin{aligned} {\mathbb {P}}^g(x_{t-d+1:t}|\gamma _t, {\overline{h}}_t) = \prod _{k\in {\mathcal {I}}} P_t^k(x_{t-d+1:t}^k|y_{t-d+1:t-1}^k, u_{t-d:t-1}, s_t^k). \end{aligned}$$

Since \((S_t^i, {\overline{H}}_t^{-i})\) is a function of \({\overline{H}}_t\), by the smoothing property of conditional probability we conclude that

$$\begin{aligned} {\mathbb {P}}^g(x_{t-d+1:t}|\gamma _t, s_t^i, {\overline{h}}_{t}^{-i}) = \prod _{k\in {\mathcal {I}}} P_t^k(x_{t-d+1:t}^k|y_{t-d+1:t-1}^k, u_{t-d:t-1}, s_t^k) \end{aligned}$$

where the right-hand side does not depend on g. \(\square \)

Proof of Claim 2

Proof by induction on t.

Induction Base: The claim is true at \(t=1\) since \(\rho _1^i\) and \(g_1^i\) are the same strategies.

Induction Step: Suppose that the claim is true for time \(t-1\). We now prove it for time t.

First,

$$\begin{aligned}&\qquad \quad {\mathbb {P}}^g(\gamma _{t}, s_{t}^i, {\overline{h}}_{t}^{-i}) \\&\quad = \sum _{({\tilde{x}}_{1:t-d}^i,{\tilde{\gamma }}_{1:t-1}^i): {\tilde{s}}_t^i=s_t^i} {\mathbb {P}}^g(\gamma _{t}|s_{t}^i, {\overline{h}}_{t}^{-i}, {\tilde{x}}_{1:t-d}^i,{\tilde{\gamma }}_{1:t-1}^i) {\mathbb {P}}^g({\tilde{x}}_{1:t-d}^i,{\tilde{\gamma }}_{1:t-1}^i, s_{t}^i, {\overline{h}}_{t}^{-i})\\&\quad = \sum _{({\tilde{x}}_{1:t-d}^i,{\tilde{\gamma }}_{1:t-1}^i): {\tilde{s}}_t^i=s_t^i} g_t^i(\gamma _{t}^i| h_{t}^{0}, {\tilde{x}}_{1:t-d}^i,{\tilde{\gamma }}_{1:t-1}^i)\left( \prod _{k\ne i} g_t^{k}(\gamma _{t}^k|{\overline{h}}_{t}^k)\right) {\mathbb {P}}^g({\tilde{x}}_{1:t-d}^i,{\tilde{\gamma }}_{1:t-1}^i, s_{t}^i, {\overline{h}}_{t}^{-i})\\&\quad = \sum _{({\tilde{x}}_{1:t-d}^i,{\tilde{\gamma }}_{1:t-1}^i): {\tilde{s}}_t^i=s_t^i} g_t^i(\gamma _{t}^i| h_{t}^{0},{\tilde{x}}_{1:t-d}^i,{\tilde{\gamma }}_{1:t-1}^i)\left( \prod _{k\ne i} g_t^{k}(\gamma _{t}^k|{\overline{h}}_{t}^k)\right) {\mathbb {P}}^g({\tilde{x}}_{1:t-d}^i,{\tilde{\gamma }}_{1:t-1}^i| s_{t}^i, {\overline{h}}_{t}^{-i})\\&\qquad \times {\mathbb {P}}^g(s_{t}^i, {\overline{h}}_{t}^{-i})\\&\quad = \left( \sum _{({\tilde{x}}_{1:t-d}^i,{\tilde{\gamma }}_{1:t-1}^i): {\tilde{s}}_t^i=s_t^i} g_t^i(\gamma _{t}^i|{\tilde{x}}_{1:t-d}^i,{\tilde{\gamma }}_{1:t-1}^i, h_{t}^{0}){\mathbb {P}}^g({\tilde{x}}_{1:t-d}^i,{\tilde{\gamma }}_{1:t-1}^i| s_{t}^i, {\overline{h}}_{t}^{-i})\right) \\&\qquad \times \left( \prod _{k\ne i} g_t^{k}(\gamma _{t}^k|{\overline{h}}_{t}^k)\right) {\mathbb {P}}^g(s_{t}^i, {\overline{h}}_{t}^{-i}). \end{aligned}$$

Using Lemma 3, we have

$$\begin{aligned} {\mathbb {P}}^g({\tilde{x}}_{1:t-d}^i,{\tilde{\gamma }}_{1:t-1}^i| s_{t}^i, {\overline{h}}_{t}^{-i}) = {\mathbb {P}}^{g^i}({\tilde{x}}_{1:t-d}^i,{\tilde{\gamma }}_{1:t-1}^i| s_{t}^i, h_{t}^{0}). \end{aligned}$$

Therefore,

$$\begin{aligned} \sum _{({\tilde{x}}_{1:t-d}^i,{\tilde{\gamma }}_{1:t-1}^i): {\tilde{s}}_t^i=s_t^i} g_t^i(\gamma _{t}^i|{\tilde{x}}_{1:t-d}^i,{\tilde{\gamma }}_{1:t-1}^i, h_{t}^{0}){\mathbb {P}}^g({\tilde{x}}_{1:t-d}^i,{\tilde{\gamma }}_{1:t-1}^i| s_{t}^i, {\overline{h}}_{t}^{-i})=\rho _{t}^i(\gamma _{t}^i|h_{t}^0, s_{t}^i) \end{aligned}$$

for all \((h_{t}^0, s_{t}^i)\) admissible under g. Notice that \({\mathbb {P}}^g(h_{t}^0, s_{t}^i) = 0\) implies \({\mathbb {P}}^g(s_{t}^i, {\overline{h}}_{t}^{-i}) = 0\); hence we conclude that

$$\begin{aligned} {\mathbb {P}}^g(\gamma _{t}, s_{t}^i, {\overline{h}}_{t}^{-i}) = \rho _{t}^i(\gamma _{t}^i|h_{t}^0, s_{t}^i)\left( \prod _{k\ne i} g_t^{k}(\gamma _{t}^k|{\overline{h}}_{t}^k)\right) {\mathbb {P}}^g(s_{t}^i, {\overline{h}}_{t}^{-i}) \end{aligned}$$

for all \(\gamma _{t}\in {\mathcal {A}}_t, s_{t}^i\in {\mathcal {S}}_t^i, {\overline{h}}_t^{-i}\in {\mathcal {H}}_t^{-i}\).

Similarly,

$$\begin{aligned} {\mathbb {P}}^{\rho ^i, g^{-i}}(\gamma _{t}, s_{t}^i, {\overline{h}}_{t}^{-i}) = \rho _{t}^i(\gamma _{t}^i|h_{t}^0, s_{t}^i)\left( \prod _{k\ne i} g_t^{k}(\gamma _{t}^k|{\overline{h}}_{t}^k)\right) {\mathbb {P}}^{\rho ^i, g^{-i}}(s_{t}^i, {\overline{h}}_{t}^{-i}) \end{aligned}$$

for all \(\gamma _{t}\in {\mathcal {A}}_t, s_{t}^i\in {\mathcal {S}}_t^i, {\overline{h}}_t^{-i}\in {\mathcal {H}}_t^{-i}\). Hence, it suffices to show that \({\mathbb {P}}^g(s_{t}^i, {\overline{h}}_{t}^{-i}) = {\mathbb {P}}^{\rho ^i, g^{-i}}(s_{t}^i, {\overline{h}}_{t}^{-i})\).

Given the induction hypothesis, it suffices to show that

$$\begin{aligned} {\mathbb {P}}^g(s_{t}^i, {\overline{h}}_{t}^{-i}| \gamma _{t-1}, s_{t-1}^i, {\overline{h}}_{t-1}^{-i}) = {\mathbb {P}}^{\rho ^i, g^{-i}}(s_{t}^i, {\overline{h}}_{t}^{-i}| \gamma _{t-1}, s_{t-1}^i, {\overline{h}}_{t-1}^{-i}) \end{aligned}$$

for all \((\gamma _{t-1}, s_{t-1}^i, {\overline{h}}_{t-1}^{-i})\) admissible under g (or admissible under \((\rho ^i, g^{-i})\), which is an equivalent condition because of the induction hypothesis).

Given that

$$\begin{aligned} S_{t}^i&= \iota _{t-1}^i(S_{t-1}^i, {\mathbf {X}}_{t-d}^i, \varvec{\varGamma }_{t-1}^i),\\ {\overline{H}}_t^{-i}&= ({\overline{H}}_{t-1}^{-i}, {\mathbf {Y}}_{t-1}, {\mathbf {U}}_{t-1}, {\mathbf {X}}_{t-d}^{-i}, \varvec{\varGamma }_{t-1}^{-i}),\\ Y_{t-1}^k&= \ell _{t-1}^k({\mathbf {X}}_{t-1}^k, {\mathbf {U}}_{t-1}^k, W_{t-1}^{k, Y})\quad \forall k\in {\mathcal {I}},\\ U_{t-1}^{k, j}&= \varGamma _{t-1}^{k, j}(X_{t-d:t-1}^{k, j})\quad \forall (k, j)\in {\mathcal {N}}, \end{aligned}$$

it follows that \((S_t^i, {\overline{H}}_t^{-i})\) is a strategy-independent function of \((\varvec{\varGamma }_{t-1}, S_{t-1}^i, {\overline{H}}_{t-1}^{-i}, {\mathbf {X}}_{t-d:t-1},{\mathbf {W}}_{t-1}^Y)\). Since \({\mathbf {W}}_{t-1}^Y=(W_{t-1}^{k, Y})_{k\in {\mathcal {I}}}\) is a primitive random vector independent of \((\varvec{\varGamma }_{t-1}, S_{t-1}^i, {\overline{H}}_{t-1}^{-i})\), it suffices to show that

$$\begin{aligned} {\mathbb {P}}^g(x_{t-d:t-1}| \gamma _{t-1}, s_{t-1}^i, {\overline{h}}_{t-1}^{-i}) = {\mathbb {P}}^{\rho ^i, g^{-i}}(x_{t-d:t-1}| \gamma _{t-1}, s_{t-1}^i, {\overline{h}}_{t-1}^{-i}). \end{aligned}$$
(18)

We know that (18) is true due to Claim 1. Hence, we established the induction step. \(\square \)
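The marginalization defining \(\rho _t^i\) from \(g_t^i\) in the display above can be made concrete with a small numerical sketch. Everything below (the stand-in histories, the map to the sufficient statistic, the strategy table, and the conditional law) is an illustrative assumption, not an object from the model.

```python
# A toy sketch of compressing a full-information behavioral strategy g into an
# SPIB strategy rho by averaging over private histories with the same statistic s.
histories = ["h1", "h2", "h3"]          # stand-ins for (x_{1:t-d}^i, gamma_{1:t-1}^i)
s_of = {"h1": "s", "h2": "s", "h3": "s'"}

def g(gamma, hist):
    """Assumed behavioral strategy g_t^i(gamma | h_t^0, hist), gamma in {0, 1}."""
    table = {"h1": [0.2, 0.8], "h2": [0.6, 0.4], "h3": [1.0, 0.0]}
    return table[hist][gamma]

def p_hist(hist, s):
    """Assumed conditional law P(hist | s_t^i = s), strategy-independent (Lemma 3)."""
    weights = {"s": {"h1": 0.25, "h2": 0.75}, "s'": {"h3": 1.0}}
    return weights[s].get(hist, 0.0)

def rho(gamma, s):
    """rho_t^i(gamma | h_t^0, s): average g over histories consistent with s."""
    return sum(g(gamma, h) * p_hist(h, s) for h in histories if s_of[h] == s)

print(rho(0, "s"), rho(1, "s"))   # 0.5, 0.5
```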

Remark 13

In general, a behavioral coordination strategy profile yields different distributions on the trajectory of the system in comparison with the distributions generated from its associated SPIB strategy profile. It is the equivalence of marginal distributions that allows us to establish the equivalence of payoffs using linearity of expectation. This (payoff) equivalence between behavioral coordination strategies and their associated SPIB strategy profiles is different from the equivalence of behavioral coordination strategies with mixed team strategies where not only are the payoffs equivalent, but distributions on the trajectory of the system are also the same.

Proof of Lemma 6

We will prove a stronger result.

Lemma 9

Let \((\lambda ^{*k}, \psi ^*)\) be a CIB strategy such that \(\psi ^{*, k}\) is consistent with \(\lambda ^{*k}\). Let \(g^{*k}\) be the behavioral strategy profile generated from \((\lambda ^{*k}, \psi ^*)\). Let \(\pi _t^k\) represent the belief on \(S_t^k\) generated by \(\psi ^*\) at time t based on \(h_t^0\). Let \(t< \tau \). Consider a fixed \(h_{\tau }^0\in {\mathcal {H}}_{\tau }^0\) and some \({\tilde{g}}_{1:t-1}^k\) (not necessarily equal to \(g_{1:t-1}^{*k}\)). Assume that \(h_{\tau }^0\) is admissible under \(({\tilde{g}}_{1:t-1}^{k}, g_{t:\tau -1}^{*k})\). Suppose that

$$\begin{aligned} {\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k}(s_t^k, x_{t-d+1:t}^k|h_t^0) =&\pi _t^k(s_t^k)P_t^k(x_{t-d+1:t}^k|y_{t-d+1:t-1}^k, u_{t-d:t-1}, s_t^k) \nonumber \\&\forall s_t^k\in {\mathcal {S}}_t^k ~\forall x_{t-d+1:t}^k\in {\mathcal {X}}_{t-d+1:t}^k. \end{aligned}$$
(19)

Then,

$$\begin{aligned} {\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t:\tau -1}^{*k}}(s_\tau ^k, x_{\tau -d+1:\tau }^k|h_\tau ^0) =&\pi _\tau ^k(s_\tau ^k)P_\tau ^k(x_{\tau -d+1:\tau }^k|y_{\tau -d+1:\tau -1}^k, u_{\tau -d:\tau -1}, s_\tau ^k) \\&\forall s_\tau ^k\in {\mathcal {S}}_\tau ^k~ \forall x_{\tau -d+1:\tau }^k\in {\mathcal {X}}_{\tau -d+1:\tau }^k. \end{aligned}$$

The assertion of Lemma 6 follows from Lemma 9 and the fact that (19) is true for \(t=1\).

Proof of Lemma 9

We only need to prove the result for \(\tau = t + 1\); the general case then follows by applying the one-step result repeatedly.

Since \(h_{t+1}^0\) is admissible under \(({\tilde{g}}_{1:t-1}^k, g_{t}^{*k})\), we have

$$\begin{aligned} {\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}, {\hat{g}}_{1:t}^{-k}}(h_{t+1}^0) > 0 \end{aligned}$$
(20)

where \({\hat{g}}_{1:t}^{-k}\) is the open-loop strategy where all coordinators except k choose prescriptions that generate the actions \(u_{1:t}^{-k}\).

From Lemma 3, we know that \({\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}, g^{-k}}(s_{t+1}^k|h_{t+1}^0)\) is independent of \(g^{-k}\). Therefore,

$$\begin{aligned} {\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}}(s_{t+1}^k|h_{t+1}^0)&=\dfrac{{\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}, {\hat{g}}_{1:t}^{-k}}(s_{t+1}^k, y_t, u_t| h_{t}^0) }{\sum _{{\tilde{s}}_{t+1}^k}{\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}, {\hat{g}}_{1:t}^{-k}}({\tilde{s}}_{t+1}^k, y_t, u_t| h_{t}^0)}, \end{aligned}$$
(21)

and the denominator of (21) is nonzero due to (20).

We have

$$\begin{aligned}&\qquad \quad {\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}, {\hat{g}}_{1:t}^{-k}}(s_{t+1}^k, y_t, u_t|h_{t}^0)\nonumber \\&\quad =\sum _{{\tilde{s}}_t^k}\sum _{{\tilde{x}}_{t-d+1:t}^k }\sum _{{\tilde{x}}_t^{-k}} \sum _{ {\tilde{\gamma }}_t^k: {\tilde{\gamma }}_t^k({\tilde{x}}_{t-d+1:t}^k) = u_t^k } \Big [{\mathbb {P}}(y_t^k|{\tilde{x}}_{t}^k, u_{t}) {\mathbb {P}}(y_t^{-k}|{\tilde{x}}_{t}^{-k}, u_{t}) \nonumber \\&\qquad \times \varvec{1}_{ \{s_{t+1}^k = \iota _t^k({\tilde{s}}_t^k, {\tilde{x}}_{t-d+1}^k, {\tilde{\gamma }}_t^k) \} } \lambda _{t}^{*k}({\tilde{\gamma }}_{t}^k|b_{t}, {\tilde{s}}_{t}^k) {\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}, {\hat{g}}_{1:t}^{-k}}({\tilde{x}}_{t-d+1:t}^k, {\tilde{x}}_t^{-k}, {\tilde{s}}_t^k|h_t^0)\Big ] \nonumber \\&\quad =\sum _{{\tilde{s}}_t^k}\sum _{{\tilde{x}}_{t-d+1:t}^k }\sum _{{\tilde{x}}_t^{-k}} \sum _{ {\tilde{\gamma }}_t^k: {\tilde{\gamma }}_t^k({\tilde{x}}_{t-d+1:t}^k) = u_t^k } \Big [{\mathbb {P}}(y_t^k|{\tilde{x}}_{t}^k, u_{t}) {\mathbb {P}}(y_t^{-k}|{\tilde{x}}_{t}^{-k}, u_{t}) \nonumber \\&\qquad \times \varvec{1}_{ \{s_{t+1}^k = \iota _t^k({\tilde{s}}_t^k, {\tilde{x}}_{t-d+1}^k, {\tilde{\gamma }}_t^k) \} } \lambda _{t}^{*k}({\tilde{\gamma }}_{t}^k|b_{t}, {\tilde{s}}_{t}^k) {\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}, {\hat{g}}_{1:t}^{-k}}({\tilde{x}}_{t-d+1:t}^k, {\tilde{s}}_t^k|h_t^0)\nonumber \\&\qquad \times {\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}, {\hat{g}}_{1:t}^{-k}}({\tilde{x}}_t^{-k}| h_t^0) \Big ] \nonumber \\&\quad = \left( \sum _{{\tilde{x}}_t^{-k}} {\mathbb {P}}(y_t^{-k}|{\tilde{x}}_{t}^{-k}, u_{t}) {\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}, {\hat{g}}_{1:t}^{-k}}({\tilde{x}}_t^{-k}| h_t^0) \right) \nonumber \\&\qquad \times \sum _{{\tilde{s}}_t^k}\sum _{{\tilde{x}}_{t-d+1:t}^k } \sum _{ {\tilde{\gamma }}_t^k: {\tilde{\gamma }}_t^k({\tilde{x}}_{t-d+1:t}^k) = u_t^k } \Big [{\mathbb {P}}(y_t^k|{\tilde{x}}_{t}^k, u_{t}) \varvec{1}_{ \{s_{t+1}^k = \iota _t^k({\tilde{s}}_t^k, {\tilde{x}}_{t-d+1}^k, {\tilde{\gamma }}_t^k) \} } \nonumber \\&\qquad \times \lambda _{t}^{*k}({\tilde{\gamma }}_{t}^k|b_{t}, {\tilde{s}}_{t}^k) {\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}, {\hat{g}}_{1:t}^{-k}}({\tilde{x}}_{t-d+1:t}^k, {\tilde{s}}_t^k|h_t^0) \Big ]. \end{aligned}$$
(22)

where \(b_t=(\varvec{\pi }_t, y_{t-d+1:t-1}, u_{t-d:t-1})\) and \(\varvec{\pi }_t = (\pi _t^l)_{l\in {\mathcal {I}}}\) is generated from \(\psi ^*\).

Recall that we assume

$$\begin{aligned}&\qquad \quad {\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}, {\hat{g}}_{1:t}^{-k}}({\tilde{x}}_{t-d+1:t}^k, {\tilde{s}}_t^k|h_t^0)\nonumber \\&\quad =\pi _t^k({\tilde{s}}_{t}^k) P_t^k({\tilde{x}}_{t-d+1:t}^k| y_{t-d+1:t-1}^k, u_{t-d:t-1}, {\tilde{s}}_{t}^k). \end{aligned}$$
(23)

Using (21), (22), and (23), we obtain

$$\begin{aligned} {\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}}(s_{t+1}^k|h_{t+1}^0)&=\dfrac{\varUpsilon _t^k(b_t, y_t^k, u_t, s_{t+1}^k)}{\sum _{ {\tilde{s}}_{t+1}^{k} }\varUpsilon _t^k(b_t, y_t^k, u_t, {\tilde{s}}_{t+1}^k)} \end{aligned}$$

where

$$\begin{aligned}&\qquad \quad \varUpsilon _t^k(b_t, y_t^k, u_t, s_{t+1}^k)\\&\quad =\sum _{{\tilde{s}}_t^k}\sum _{{\tilde{x}}_{t-d+1:t}^k } \sum _{ {\tilde{\gamma }}_t^k: {\tilde{\gamma }}_t^k({\tilde{x}}_{t-d+1:t}^k) = u_t^k } \Big [{\mathbb {P}}(y_t^k|{\tilde{x}}_{t}^k, u_{t}) \varvec{1}_{ \{s_{t+1}^k = \iota _t^k({\tilde{s}}_t^k, {\tilde{x}}_{t-d+1}^k, {\tilde{\gamma }}_t^k) \} } \\&\qquad \times \lambda _{t}^{*k}({\tilde{\gamma }}_{t}^k|b_{t}, {\tilde{s}}_{t}^k) \pi _t^k({\tilde{s}}_{t}^k) P_t^k({\tilde{x}}_{t-d+1:t}^k| y_{t-d+1:t-1}^k, u_{t-d:t-1}, {\tilde{s}}_{t}^k) \Big ]. \end{aligned}$$

Therefore, by the definition of consistency of \(\psi ^{*, k}\) with respect to \(\lambda ^{*k}\), we conclude that

$$\begin{aligned} {\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}}(s_{t+1}^k|h_{t+1}^0) = \pi _{t+1}^k(s_{t+1}^k). \end{aligned}$$

Now, consider \({\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}}({\tilde{x}}_{t-d+2:t+1}^k, s_{t+1}^k | h_{t+1}^0)\).

  • If \({\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}}(s_{t+1}^k | h_{t+1}^0) = 0\), then we have \(\pi _{t+1}^k(s_{t+1}^k) = 0\) and

    $$\begin{aligned}&\quad ~{\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}}({\tilde{x}}_{t-d+2:t+1}^k, s_{t+1}^k | h_{t+1}^0) = 0. \end{aligned}$$
  • If \({\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}}(s_{t+1}^k | h_{t+1}^0) > 0\), then

    $$\begin{aligned}&{\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}}({\tilde{x}}_{t-d+2:t+1}^k, s_{t+1}^k | h_{t+1}^0) \\ =&\,{\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}}({\tilde{x}}_{t-d+2:t+1}^k | h_{t+1}^0, s_{t+1}^k) \pi _{t+1}^k(s_{t+1}^k). \end{aligned}$$

    We have shown in Lemma 4 that

    $$\begin{aligned}&\qquad \quad {\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}}({\tilde{x}}_{t-d+2:t+1}^k | {\overline{h}}_{t+1}^k)\\&\quad =P_{t+1}^k({\tilde{x}}_{t-d+2:t+1}^k| y_{t-d+2:t}^k, u_{t-d+1:t}, s_{t+1}^k) \end{aligned}$$

    and \((h_{t+1}^0, s_{t+1}^k)\) is a function of \({\overline{h}}_{t+1}^k\). By the law of iterated expectation, we have

    $$\begin{aligned}&\qquad \quad {\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}, {\hat{g}}_{1:t}^{-k}}({\tilde{x}}_{t-d+2:t+1}^k | h_{t+1}^0, s_{t+1}^k)\\&\quad =P_{t+1}^k({\tilde{x}}_{t-d+2:t+1}^k| y_{t-d+2:t}^k, u_{t-d+1:t}, s_{t+1}^k). \end{aligned}$$

We conclude that

$$\begin{aligned}&\qquad \quad {\mathbb {P}}^{{\tilde{g}}_{1:t-1}^k, g_{t}^{*k}}({\tilde{x}}_{t-d+2:t+1}^k, s_{t+1}^k | h_{t+1}^0)\\&\quad =P_{t+1}^k({\tilde{x}}_{t-d+2:t+1}^k| y_{t-d+2:t}^k, u_{t-d+1:t}, s_{t+1}^k) \pi _{t+1}^k(s_{t+1}^k) \end{aligned}$$

for all \(s_{t+1}^k\in {\mathcal {S}}_{t+1}^k\) and all \({\tilde{x}}_{t-d+2:t+1}^k\in {\mathcal {X}}_{t-d+2:t+1}^k\). \(\square \)
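To make the consistency update \(\pi _{t+1}^k = \varUpsilon _t^k / \sum _{{\tilde{s}}_{t+1}^k}\varUpsilon _t^k\) concrete, here is a minimal numerical sketch for \(d=1\) (so \(s_t^k = x_{t-1}^k\) and \(\iota _t^k\) simply returns \(x_t^k\)); the binary spaces, the kernels, and the uniform CIB strategy below are toy assumptions.

```python
import itertools
import numpy as np

# Toy spaces and kernels (assumptions for illustration only).
states = [0, 1]            # X_t^k; for d = 1, s_t^k = x_{t-1}^k
actions = [0, 1]
P_trans = np.array([[0.9, 0.1], [0.2, 0.8]])   # P_t^k reduces to P(x_t | x_{t-1}) when d = 1
P_obs = np.array([[0.7, 0.3], [0.4, 0.6]])     # P(y_t | x_t)

# Prescriptions gamma: X -> U, encoded as tuples (gamma(0), gamma(1)).
prescriptions = list(itertools.product(actions, repeat=len(states)))

def lam(gamma, s):
    """Assumed CIB strategy lambda_t^{*k}(gamma | b_t, s): uniform placeholder."""
    return 1.0 / len(prescriptions)

def upsilon(pi, y, u):
    """Upsilon_t^k(b_t, y, u, .): unnormalized belief over s_{t+1} = x_t."""
    out = np.zeros(len(states))
    for s in states:                       # s_t = x_{t-1}
        for x in states:                   # x_t, which becomes s_{t+1}
            for gamma in prescriptions:
                if gamma[x] != u:          # keep only prescriptions consistent with u_t
                    continue
                out[x] += P_obs[x, y] * lam(gamma, s) * pi[s] * P_trans[s, x]
    return out

pi_t = np.array([0.5, 0.5])
ups = upsilon(pi_t, y=1, u=0)
print(ups / ups.sum())                     # pi_{t+1}^k by the consistency condition
```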

Proof of Lemma 7

Let \(g^{-i}\) denote the behavioral strategy profile of all coordinators other than i generated from the CIB strategy profile \((\lambda ^k, \psi ^k)_{k\in {\mathcal {I}}\backslash \{ i\} }\). Let \(({\overline{h}}_t^i, \gamma _{t}^i)\) be admissible under \(g^{-i}\).

Let \({\tilde{g}}^i\) denote coordinator i’s behavioral coordination strategy. Because of Lemma 3, we have

$$\begin{aligned}&\qquad \quad {\mathbb {P}}^{{\tilde{g}}^i, g^{-i}}(x_{t-d+1:t}, \gamma _{t}^{-i}|{\overline{h}}_t^i, \gamma _t^i)\\&\quad ={\mathbb {P}}^{{\tilde{g}}^i, g^{-i}}(x_{t-d+1:t}, \gamma _{t}^{-i}|h_t^0, x_{1:t-d}^i, \gamma _{1:t}^i)\\&\quad ={\mathbb {P}}^{{\tilde{g}}^i}(x_{t-d+1:t}^i|h_t^0, x_{1:t-d}^i, \gamma _{1:t}^i)\prod _{k\ne i} {\mathbb {P}}^{g^{k}}(x_{t-d+1:t}^k, \gamma _{t}^k|h_t^0). \end{aligned}$$

We know that \(\varvec{\varGamma }_{t}^i\) and \({\mathbf {X}}_{t-d+1:t}^i\) are conditionally independent given \({\overline{H}}_t^i\) since \(\varvec{\varGamma }_{t}^i\) is chosen as a randomized function of \({\overline{H}}_t^i\) at a time when \({\mathbf {X}}_{t-d+1:t}^i\) are already realized. Therefore,

$$\begin{aligned} {\mathbb {P}}^{{\tilde{g}}^i, g^{-i}}(x_{t-d+1:t}^i|h_t^0, x_{1:t-d}^i, \gamma _{1:t}^i)&={\mathbb {P}}^{{\tilde{g}}^i, g^{-i}}(x_{t-d+1:t}^i|h_t^0, x_{1:t-d}^i, \gamma _{1:t-1}^i)\\&=P_t^i(x_{t-d+1:t}^i| y_{t-d+1:t-1}^i, u_{t-d:t-1}, s_{t}^i), \end{aligned}$$

where \(s_t^i=(x_{t-d}^i, \phi _t^i)\) and \(P_t^i\) is the belief function defined in Eq. (1).

We conclude that

$$\begin{aligned}&\qquad \quad {\mathbb {P}}^{g^{-i}}(x_{t-d+1:t}, \gamma _t^{-i}|{\overline{h}}_t^i, \gamma _{t}^i)\nonumber \\&\quad =P_t^i(x_{t-d+1:t}^i| y_{t-d+1:t-1}^i, u_{t-d:t-1}, s_{t}^i) \prod _{k\ne i}{\mathbb {P}}^{g^{k}}(x_{t-d+1:t}^{k}, \gamma _{t}^{k}|h_t^0). \end{aligned}$$
(24)

Since all coordinators other than coordinator i are using the same belief generation systems, we have \(B_t^j=B_t^k\) for \(j, k\ne i\). Denote \(B_t=B_t^k\) for all \(k\in {\mathcal {I}}\backslash \{i \}\). Let \(b_t=\left( \left( \pi _t^{*, l}\right) _{l\in {\mathcal {I}}}, y_{t-d+1:t-1}, u_{t-d:t-1}\right) \) be a realization of \(B_t\). Also define \(\psi ^*=\psi ^k\) for all \(k\ne i\).

Consider \(k\ne i\). Coordinator k’s strategy \(g^{k}\) is a self-consistent CIB strategy. We also have \(h_t^0\) admissible under \(g^{k}\) since \(({\overline{h}}_t^i, \gamma _{t}^i)\) is admissible under \(g^{-i}\). Hence, applying Lemma 6 we have

$$\begin{aligned} {\mathbb {P}}^{g^{k}}( {\tilde{s}}_{t}^k, x_{t-d+1:t}^{k}|h_t^0)&= \pi _t^{*, k}( {\tilde{s}}_{t}^k)P_t^k(x_{t-d+1:t}^k| y_{t-d+1:t-1}^k, u_{t-d:t-1}, {\tilde{s}}_{t}^{k}). \end{aligned}$$

Hence, the second term on the right-hand side of (24) satisfies

$$\begin{aligned}&\qquad \quad {\mathbb {P}}^{g^{k}}(x_{t-d+1:t}^{k}, \gamma _{t}^{k}|h_t^0)= \sum _{ {\tilde{s}}_{t}^{k} } {\mathbb {P}}^{g^{k}}({\tilde{s}}_{t}^{k}, x_{t-d+1:t}^k, \gamma _{t}^{k}|h_t^0) \nonumber \\&\quad = \sum _{ {\tilde{s}}_{t}^{k} } \left[ \pi _t^{*, k}({\tilde{s}}_{t}^k) P_t^k(x_{t-d+1:t}^k| y_{t-d+1:t-1}^k, u_{t-d:t-1}, {\tilde{s}}_{t}^{k}) \lambda _t^k(\gamma _{t}^k| b_t, {\tilde{s}}_{t}^{k}) \right] , \end{aligned}$$
(25)

where \(P_t^k\) is the belief function defined in Eq. (1).

Recall that \(b_t=\left( \left( \pi _t^{*, l}\right) _{l\in {\mathcal {I}}}, y_{t-d+1:t-1}, u_{t-d:t-1}\right) \). From (24) and (25), we conclude that

$$\begin{aligned}&{\mathbb {P}}^{g^{-i}}(x_{t-d+1:t}, \gamma _{t}^{-i}|{\overline{h}}_t^i, \gamma _t^i )=F_t^i(x_{t-d+1:t}, \gamma _{t}^{-i}| b_t, s_{t}^i) \end{aligned}$$
(26)

for some function \(F_t^i\) for all \(({\overline{h}}_t^i, \gamma _t^i)\) admissible under \(g^{-i}\).

Consider the total reward of coordinator i. By the law of iterated expectation, we can write

$$\begin{aligned} J^{i}({\tilde{g}}^i, g^{-i}) ={\mathbb {E}}^{{\tilde{g}}^i, g^{-i}}\left[ \sum _{t\in {\mathcal {T}}}{\mathbb {E}}^{g^{-i}}[r_t^i({\mathbf {X}}_t, {\mathbf {U}}_t)|{\overline{H}}_t^i, \varvec{\varGamma }_t^i] \right] . \end{aligned}$$

For \(({\overline{h}}_t^i, \gamma _{t}^i)\) admissible under \(g^{-i}\),

$$\begin{aligned}&\qquad \quad {\mathbb {E}}^{g^{-i}}[r_t^i({\mathbf {X}}_t, {\mathbf {U}}_t)|{\overline{h}}_t^i, \gamma _t^i]\\&\quad = \sum _{{\tilde{x}}_{t-d+1:t}}\sum _{{\tilde{\gamma }}_t^{-i}} r_t^i({\tilde{x}}_t, (\gamma _t^i({\tilde{x}}_{t-d+1:t}^i), {\tilde{\gamma }}_t^{-i}({\tilde{x}}_{t-d+1:t}^{-i}) )) F_t^i({\tilde{x}}_{t-d+1:t}, {\tilde{\gamma }}_t^{-i}| b_t, s_{t}^i)\\&\quad = {\overline{r}}_t^{i}(b_t, s_{t}^i, \gamma _t^i), \end{aligned}$$

for some function \({\overline{r}}_t^{i}\) that depends on \(g^{-i}\) (specifically, on \(\lambda _t^{-i}\)) but not on \({\tilde{g}}^i\).

We claim that \((B_t, S_{t}^i)\) is a controlled Markov process controlled by coordinator i’s prescriptions, given that other coordinators are using the strategy profile \(g^{-i}\). Let \({\tilde{g}}^i\) denote an arbitrary strategy for coordinator i (not necessarily a CIB strategy). We need to prove that

$$\begin{aligned} {\mathbb {P}}^{{\tilde{g}}^i, g^{-i}}(b_{t+1}, s_{t+1}^i| b_{1:t}, s_{1:t}^i, \gamma _{1:t}^i) =&\varXi _t^i(b_{t+1}, s_{t+1}^i| b_{t}, s_{t}^i, \gamma _{t}^i)\\&\forall (b_{1:t}, s_{1:t}^i, \gamma _{1:t}^i) ~\mathrm {s.t.}~{\mathbb {P}}^{{\tilde{g}}^i, g^{-i}}(b_{1:t}, s_{1:t}^i, \gamma _{1:t}^i) > 0 \end{aligned}$$

for some function \(\varXi _t^i\) independent of \({\tilde{g}}^i\).

We know that

$$\begin{aligned} B_{t+1}&= (\varvec{\varPi }_{t+1}, {\mathbf {Y}}_{t-d+2:t}, {\mathbf {U}}_{t-d+1:t}),\\ \varvec{\varPi }_{t+1}&= \psi _t^*(B_t, {\mathbf {Y}}_{t}, {\mathbf {U}}_t),\\ Y_t^k&= \ell _t^k({\mathbf {X}}_{t}^k, {\mathbf {U}}_t, W_t^{k, Y})\quad \forall k\in {\mathcal {I}}, \\ U_t^{k, j}&= \varGamma _t^{k, j}(X_{t-d+1:t}^{k, j})\quad \forall (k, j)\in {\mathcal {N}},\\ S_{t+1}^{i}&= \iota _t^i(S_t^i, {\mathbf {X}}_{t-d+1}^i, \varvec{\varGamma }_t^i). \end{aligned}$$

Hence, \((B_{t+1}, S_{t+1}^i)\) is a fixed function of \((B_t, S_t^i, {\mathbf {X}}_{t-d+1:t}, \varvec{\varGamma }_t, {\mathbf {W}}_t^Y)\), where \({\mathbf {W}}_t^Y\) is a primitive random vector independent of \((B_{1:t}, S_{1:t}^i, \varvec{\varGamma }_{1:t}^i, {\mathbf {X}}_{t-d+1:t})\). Therefore, it suffices to prove that

$$\begin{aligned} {\mathbb {P}}^{{\tilde{g}}^i, g^{-i}}(x_{t-d+1:t}, \gamma _{t}^{-i}|b_{1:t}, s_{1:t}^i, \gamma _{1:t}^i) = \varXi _t^i(x_{t-d+1:t}, \gamma _{t}^{-i}| b_{t}, s_{t}^i, \gamma _{t}^i) \end{aligned}$$

for some function \(\varXi _t^i\) independent of \({\tilde{g}}^i\).

\((B_{1:t}, S_{1:t}^i, \varvec{\varGamma }_{1:t}^i)\) is a function of \(({\overline{H}}_t^i, \varvec{\varGamma }_{t}^i)\). Therefore, by applying the smoothing property of conditional expectations to both sides of (26), we obtain

$$\begin{aligned} {\mathbb {P}}^{{\tilde{g}}^i, g^{-i}}(x_{t-d+1:t}, \gamma _{t}^{-i}|b_{1:t}, s_{1:t}^i, \gamma _{1:t}^i)=F_t^i(x_{t-d+1:t}, \gamma _{t}^{-i}| b_t, s_{t}^i), \end{aligned}$$

where we know that \(F_t^i\), as defined in (26), is independent of \({\tilde{g}}^i\).

We conclude that coordinator i faces a Markov Decision Problem where the state process is \((B_t, S_{t}^i)\), the control action is \(\varvec{\varGamma }_t^i\), and the total reward is

$${\mathbb {E}}\left[ \sum _{t\in {\mathcal {T}}} {\overline{r}}_t^{i}(B_t, S_{t}^i, \varvec{\varGamma }_t^i)\right] .$$

By standard MDP theory, coordinator i can form a best response by choosing \(\varvec{\varGamma }_t^i\) as a function of \((B_t, S_{t}^i)\). \(\square \)
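To illustrate this last step concretely, the following is a minimal backward-induction sketch for a finite-horizon MDP with abstract state \(z=(b_t, s_t^i)\) and action \(\gamma _t^i\); the state and action counts, the kernel, and the reward are randomly generated stand-ins, not objects from the model.

```python
import numpy as np

T = 5
n_states, n_actions = 4, 3
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[z, a] -> dist over z'
r = rng.random((n_states, n_actions))   # stand-in for rbar_t^i(z, a), time-invariant here

V = np.zeros(n_states)                  # V_{T+1} = 0
policy = np.zeros((T, n_states), dtype=int)
for t in reversed(range(T)):
    Q = r + P @ V                       # Q[z, a] = rbar(z, a) + E[V_{t+1} | z, a]
    policy[t] = Q.argmax(axis=1)        # a deterministic best response suffices
    V = Q.max(axis=1)
print(V)                                # best-response value at t = 1 for each state
```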

Proof of Theorem 2

Let \((\lambda ^*, \psi ^*)\) be a pair that solves the dynamic program defined in the statement of the theorem. Let \(g^{*k}\) denote the behavioral coordination strategy corresponding to \((\lambda ^{*k}, \psi ^*)\) for \(k\in {\mathcal {I}}\). We only need to show the following: if the coordinators other than coordinator i play \(g^{*-i}\), then \(g^{*i}\) is a best response to \(g^{*-i}\).

Let \(h_t^0\in {\mathcal {H}}_t^0\) be admissible under \(g^{*-i}\). Then,

$$\begin{aligned} {\mathbb {P}}^{g^{*k}}(s_{t}^k, x_{t-d+1:t}^k|h_t^0)&=\pi _t^{k}(s_{t}^k) P_t^k(x_{t-d+1:t}^k|y_{t-d+1:t-1}^k, u_{t-d:t-1}, s_{t}^k) \end{aligned}$$
(27)

for all \(k\ne i\) by Lemma 6, where \(\pi _t^{k}\) is the belief generated by \(\psi ^*\) when \(h_t^0\) occurs.

By Lemma 4, we also have

$$\begin{aligned} {\mathbb {P}}({\tilde{s}}_t^i, {\tilde{x}}_{t-d+1:t}^i |h_{t}^0, s_t^i)&=\varvec{1}_{\{{\tilde{s}}_{t}^i=s_{t}^i \} } P_t^i({\tilde{x}}_{t-d+1:t}^i|y_{t-d+1:t-1}^i, u_{t-d:t-1}, {\tilde{s}}_{t}^i). \end{aligned}$$
(28)

Combining (27) and (28), the belief for coordinator i defined in the stage game according to Definition 15 satisfies

$$\begin{aligned}&\qquad \quad \beta _t^i({\tilde{z}}_t|s_{t}^i) \\&\quad = \varvec{1}_{\{{\tilde{s}}_{t}^i=s_{t}^i \} } \prod _{k\ne i} \pi _t^{k}({\tilde{s}}_{t}^k) \left( \prod _{k\in {\mathcal {I}}} P_t^k({\tilde{x}}_{t-d+1:t}^k|y_{t-d+1:t-1}^k, u_{t-d:t-1}, {\tilde{s}}_{t}^k)\right) {\mathbb {P}}({\tilde{w}}_t^{Y})\\&\quad ={\mathbb {P}}({\tilde{s}}_t^i, {\tilde{x}}_{t-d+1:t}^i |h_{t}^0, s_t^i) \left( \prod _{k\ne i} {\mathbb {P}}^{g^{*k}}({\tilde{s}}_t^k, {\tilde{x}}_{t-d+1:t}^k|h_t^0)\right) {\mathbb {P}}({\tilde{w}}_t^{Y})\\&\quad ={\mathbb {P}}^{g^{*-i}}({\tilde{s}}_t ,{\tilde{x}}_{t-d+1:t} |h_{t}^0, s_t^i){\mathbb {P}}({\tilde{w}}_t^{Y}) = {\mathbb {P}}^{g^{*-i}}({\tilde{z}}_t|h_t^0, s_{t}^i) \end{aligned}$$

for all \((h_t^0, s_{t}^i)\) admissible under \(g^{*-i}\), i.e., the belief represents a true conditional distribution. Since \(\beta _t^i(\cdot |s_{t}^i)\) is a fixed function of \((b_t, s_{t}^i)\), by applying the smoothing property to both sides of the above equation we obtain

$$\begin{aligned} \beta _t^i({\tilde{z}}_t|s_{t}^i) = {\mathbb {P}}^{ g^{*-i}}({\tilde{z}}_t|b_t, s_{t}^i). \end{aligned}$$

for all \((b_t, s_{t}^i)\) admissible under \(g^{*-i}\).

Then, the interim expected utility considered in the definition of IBNE correspondences (Definition 16) can be written as

$$\begin{aligned}&\sum _{{\tilde{z}}_t, {\tilde{\gamma }}_t} \eta ({\tilde{\gamma }}_{t}^i) Q_t^i({\tilde{z}}_t, {\tilde{\gamma }}_t) \beta _t^i({\tilde{z}}_t|s_{t}^i)\prod _{k\ne i}\lambda _t^{*k}({\tilde{\gamma }}_t^k|b_t, {\tilde{s}}_{t}^k) \\ =&\sum _{{\tilde{\gamma }}_t^i} \eta ({\tilde{\gamma }}_{t}^i) {\mathbb {E}}^{g_{1:t}^{*-i}}[Q_t^i({\mathbf {Z}}_t, \varvec{\varGamma }_{t})|b_t, s_{t}^i, {\tilde{\gamma }}_{t}^i] \end{aligned}$$

for all \((b_t, s_{t}^i)\) admissible under \(g^{*-i}\).

The condition of Theorem 2 then implies

$$\begin{aligned}&\lambda _t^{*i}(b_t, s_{t}^i) \in \underset{\eta \in \varDelta ({\mathcal {A}}_{t}^i)}{\arg \max } \sum _{{\tilde{\gamma }}_t^i} \eta ({\tilde{\gamma }}_{t}^i) {\mathbb {E}}^{g^{*-i}}\left[ r_t^i({\mathbf {X}}_t, {\mathbf {U}}_t) +V_{t+1}^i(B_{t+1}, S_{t+1}^i)|b_t, s_{t}^i,{\tilde{\gamma }}_{t}^i\right] ; \end{aligned}$$
(29)
$$\begin{aligned}&V_t^i(b_t, s_{t}^i)= \sum _{{\tilde{\gamma }}_t^i} \left[ \lambda _t^{*i}({\tilde{\gamma }}_t^i|b_t, s_{t}^i) {\mathbb {E}}^{g_{1:t}^{*-i}}[r_t^i({\mathbf {X}}_t, {\mathbf {U}}_t) +V_{t+1}^i(B_{t+1}, S_{t+1}^i)|b_t, s_{t}^i, {\tilde{\gamma }}_{t}^i]\right] \end{aligned}$$
(30)

for all \((b_t, s_{t}^i)\) admissible under \(g^{*-i}\).

Recall that in the proof of Lemma 7 we have already shown that, fixing \((\lambda ^{*-i}, \psi ^*)\), \((B_t, S_{t}^i)\) is a controlled Markov process with control \(\varvec{\varGamma }_{t}^i\). Hence, (29) and (30) show that \(\lambda _t^{*i}\) is a dynamic programming solution of the MDP with instantaneous reward

$$\begin{aligned} {\overline{r}}_t^i(B_t, S_{t}^i, \varvec{\varGamma }_{t}^i):= {\mathbb {E}}^{g^{*-i}}[r_t^i({\mathbf {X}}_t, {\mathbf {U}}_t)|B_t, S_{t}^i, \varvec{\varGamma }_{t}^i]. \end{aligned}$$

Therefore, \(\lambda ^{*i}\) maximizes

$$\begin{aligned} {\mathbb {E}}^{\lambda ^i, \lambda ^{*-i}}\left[ \sum _{t\in {\mathcal {T}}} {\overline{r}}_t^i(B_t, S_{t}^i, \varvec{\varGamma }_t^i) \right] \end{aligned}$$

over all \(\lambda ^i=(\lambda _t^i)_{t\in {\mathcal {T}}}, \lambda _t^i: {\mathcal {B}}_t\times {\mathcal {S}}_t^i \mapsto \varDelta ({\mathcal {A}}_t^i)\).

Notice that for any \(\lambda ^i\), if \(g^i\) is the behavioral coordination strategy corresponding to the CIB strategy \((\lambda ^i, \psi ^*)\), then, by the law of iterated expectation,

$$\begin{aligned} {\mathbb {E}}^{\lambda ^i, \lambda ^{*-i}}\left[ \sum _{t\in {\mathcal {T}}} {\overline{r}}_t^i(B_t, S_{t}^i, \varvec{\varGamma }_t^i) \right]&= {\mathbb {E}}^{g^i, g^{*-i}}\left[ \sum _{t\in {\mathcal {T}}} r_t^i({\mathbf {X}}_t, {\mathbf {U}}_t) \right] . \end{aligned}$$

Hence, we know that \(g^{*i}\) maximizes

$$\begin{aligned} {\mathbb {E}}^{g^{i}, g^{*-i}}\left[ \sum _{t\in {\mathcal {T}}} r_t^i({\mathbf {X}}_t, {\mathbf {U}}_t) \right] \end{aligned}$$

over all \(g^i\) generated from a CIB strategy with the belief generation system \(\psi ^*\).

By the closedness property of CIB strategies (Lemma 7), we conclude that \(g^{*i}\) is a best response to \(g^{*-i}\) over all behavioral coordination strategies of coordinator i, proving the result. \(\square \)
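Conditions (29) and (30) can be checked mechanically at each \((b_t, s_{t}^i)\): the support of \(\lambda _t^{*i}\) must lie in the argmax of the interim Q-values, and \(V_t^i\) must equal the induced expectation. A minimal sketch with assumed toy Q-values:

```python
import numpy as np

Q = np.array([1.0, 2.5, 2.5])         # assumed interim Q-values, one per prescription
lam_star = np.array([0.0, 0.4, 0.6])  # candidate equilibrium mixture over prescriptions

condition_29 = bool(np.all(Q[lam_star > 0] == Q.max()))  # support lies in the argmax
V_t = float(lam_star @ Q)                                # condition (30): induced value
print(condition_29, V_t)              # True, 2.5
```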

Proof of Proposition 1

We will characterize all the Bayes–Nash equilibria of Example 1 in terms of individual players’ behavioral strategies. Then, we will show that none of the BNE correspond to a CIB-CNE.

Let \(p=(p_1, p_2)\in [0, 1]^2\) describe Alice’s behavioral strategy: \(p_1\) is the probability that Alice plays \(U_1^A=-1\) given \(X_1^A=-1\); \(p_2\) is the probability that Alice plays \(U_1^A=+1\) given \(X_1^A=+1\). Let \(q=(q_1, q_2)\in [0, 1]^2\) denote Bob’s behavioral strategy: \(q_1\) is the probability that Bob plays \(U_3^B=\mathrm {L}\) when observing \(U_1^A=-1\); \(q_2\) is the probability that Bob plays \(U_3^B=\mathrm {L}\) when observing \(U_1^A=+1\).

Claim

$$\begin{aligned} p^*=\left( \frac{1}{3}, \frac{1}{3}\right) ,\quad q^*=\left( \frac{1}{3}+\varepsilon , \frac{1}{3}-\varepsilon \right) \end{aligned}$$

is the unique BNE of Example 1.

Given the claim, one can conclude that a CIB-CNE does not exist in this game: Suppose that \((\lambda ^*, \psi ^*)\) forms a CIB-CNE. Then, by the definition of CIB strategies, at \(t=1\) the team of Alice chooses a prescription (which maps \({\mathcal {X}}_1^A\) to \({\mathcal {U}}_1^A\)) based on no information. At \(t=3\), the team of Bob chooses a prescription (which is equivalent to an action since Bob has no state) based solely on \(B_3\). Define the induced behavioral strategies of Alice and Bob through

$$\begin{aligned} p_1&= \lambda _1^{*A}(\mathbf {id}|\varnothing ) + \lambda _1^{*A}(\mathbf {cp}_{-1}|\varnothing ),\\ p_2&= \lambda _1^{*A}(\mathbf {id}|\varnothing ) + \lambda _1^{*A}(\mathbf {cp}_{+1}|\varnothing ),\\ q_1&= \lambda _3^{*B}({\mathbf {L}}|b_3[-1]),\\ q_2&= \lambda _3^{*B}({\mathbf {L}}|b_3[+1]), \end{aligned}$$

where \(b_3[u]\) is the CCI under belief generation system \(\psi ^*\) when \(U_1^A=u\). \(\mathbf {id}\) is the prescription that chooses \(U_1^A=X_1^A\); \(\mathbf {cp}_{u}\) is the prescription that chooses \(U_1^A=u\) irrespective of \(X_1^A\); \({\mathbf {L}}\) is Bob’s prescription that chooses \(U_3^B=\mathrm {L}\).

The consistency of \(\psi _1^*\) with respect to \(\lambda _1^*\) implies that

$$\begin{aligned} \varPi _2(-1)&= \dfrac{p_1}{p_1+1-p_2}\quad \text { if }p\ne (0, 1), U_1=-1,\\ \varPi _2(+1)&= \dfrac{p_2}{p_2+1-p_1}\quad \text { if }p\ne (1, 0), U_1=+1. \end{aligned}$$

The consistency of \(\psi _2^*\) with respect to \(\lambda _2^*\) implies that

$$\begin{aligned} \varPi _3(+1)&= \varPi _2(U_1^A). \end{aligned}$$

If a CIB-CNE induces behavioral strategy \(p^*=\left( \frac{1}{3}, \frac{1}{3}\right) \), then the CIB belief \(\varPi _3\in \varDelta ({\mathcal {X}}_2)\) will be the same for both \(U_1=+1\) and \(U_1=-1\) under any consistent belief generation system \(\psi ^*\). Then, \(B_3=(\varPi _3, {\mathbf {U}}_2)\) will be the same for both \(U_1=+1\) and \(U_1=-1\) since \({\mathbf {U}}_2\) only takes one value. Hence, Bob’s induced stage behavioral strategy q must satisfy \(q_1=q_2\). However, \(q^*=\left( \frac{1}{3}+\varepsilon , \frac{1}{3}-\varepsilon \right) \) is such that \(q_1^*\ne q_2^*\); hence, \((p^*, q^*)\) cannot be induced from any CIB-CNE.

Since the induced behavioral strategy of any CIB-CNE should form a BNE in the game among individuals, we conclude that a CIB-CNE does not exist in Example 1.

Proof of Claim

Denote Alice’s total expected payoff by \(J(p, q)\). Then,

$$\begin{aligned}&\qquad \quad J(p, q) \\&\quad = \frac{1}{2}\varepsilon (1-p_1+p_2) + \frac{1}{2} \left( (1-p_1)(1-q_2) + p_1 \cdot 2q_1 \right) \\&\qquad + \frac{1}{2} \left( (1-p_2)(1-q_1) + p_2\cdot 2q_2 \right) \\&\quad =\frac{1}{2}\varepsilon (1-p_1+p_2) + \frac{1}{2} (2 - p_1 - p_2) + \frac{1}{2}(2p_1 + p_2 - 1)q_1 + \frac{1}{2}(2p_2 + p_1 - 1)q_2. \end{aligned}$$

Since this is a zero-sum game, Alice’s expected payoff at equilibrium can be characterized as

$$\begin{aligned} J^* = \max _{p} \min _q J(p, q). \end{aligned}$$

Alice plays p at some equilibrium if and only if \(\min _q J(p, q) = J^*\). Define \(J^*(p) = \min _q J(p, q)\). We compute

$$\begin{aligned} J^*(p)&= \frac{1}{2}\varepsilon (1-p_1+p_2) + \frac{1}{2} (2 - p_1 - p_2) \\&\quad +{\left\{ \begin{array}{ll} \frac{1}{2}(3p_1 + 3p_2) - 1&{} 2p_1 + p_2 \le 1, 2p_2 + p_1 \le 1\\ \frac{1}{2}(2p_2 + p_1 - 1) &{} 2p_1 + p_2> 1, 2p_2 + p_1 \le 1\\ \frac{1}{2}(2p_1 + p_2 - 1) &{} 2p_1 + p_2 \le 1, 2p_2 + p_1> 1\\ 0&{} 2p_1 + p_2> 1, 2p_2 + p_1 > 1 \end{array}\right. } \end{aligned}$$

The set of equilibrium strategies for Alice is the set of maximizers of \(J^*(p)\). Since \(J^*(p)\) is a continuous piecewise linear function, the set of maximizers can be found by comparing the values at the extreme points of the pieces.

We have

$$\begin{aligned} J^*(0, 0)&= \frac{1}{2}\varepsilon + 1 - 1 = \frac{1}{2}\varepsilon ;\\ J^*\left( \frac{1}{2}, 0\right)&= \frac{1}{2}\varepsilon \cdot \frac{1}{2} + \frac{1}{2} \cdot \frac{3}{2} + \frac{1}{2} \cdot \frac{3}{2} - 1= \frac{1}{4}\varepsilon + \frac{1}{2};\\ J^*\left( 0, \frac{1}{2}\right)&= \frac{1}{2}\varepsilon \cdot \frac{3}{2} + \frac{1}{2} \cdot \frac{3}{2} + \frac{1}{2} \cdot \frac{3}{2} - 1= \frac{3}{4}\varepsilon + \frac{1}{2};\\ J^*(1, 0)&= \frac{1}{2}\varepsilon \cdot 0+ \frac{1}{2}\cdot 1 + \frac{1}{2}\cdot 0 = \frac{1}{2};\\ J^*(0, 1)&= \frac{1}{2}\varepsilon \cdot 2 + \frac{1}{2}\cdot 1 + \frac{1}{2} \cdot 0 = \varepsilon + \frac{1}{2};\\ J^*\left( \frac{1}{3}, \frac{1}{3}\right)&= \frac{1}{2}\varepsilon + \frac{1}{2}\cdot \frac{4}{3} + \frac{1}{2} \cdot 0 = \frac{1}{2}\varepsilon + \frac{2}{3};\\ J^*(1, 1)&= \frac{1}{2}\varepsilon + \frac{1}{2}\cdot 0 + 0 = \frac{1}{2}\varepsilon . \end{aligned}$$
Fig. 1: The pieces (polygons) on which \(J^*(p)\) is linear; the extreme points of the pieces are labeled.

Since \(\varepsilon < \frac{1}{3}\), \((\frac{1}{3}, \frac{1}{3})\) is the unique maximizer among the extreme points. Hence, we have \(\arg \max _p J^*(p) = \{(\frac{1}{3}, \frac{1}{3}) \}\), i.e., Alice always plays \(p^*=(\frac{1}{3}, \frac{1}{3})\) in any BNE of the game.

Now, consider Bob’s equilibrium strategy. \(q^*\) is an equilibrium strategy of Bob only if \(p^* \in \arg \max _{p} J(p, q^*)\).

For each q, \(J(p, q)\) is a linear function of p, and

$$\begin{aligned} \nabla _p J(p, q) = \left( -\frac{1}{2}\varepsilon - \frac{1}{2} + q_1 + \frac{1}{2}q_2,\ \frac{1}{2}\varepsilon - \frac{1}{2} + \frac{1}{2}q_1 + q_2 \right) \quad \forall p\in (0, 1)^2. \end{aligned}$$

We need \(\nabla _p J(p, q^*)\Big |_{p=p^*} = (0, 0)\). Hence,

$$\begin{aligned} -\frac{1}{2}\varepsilon - \frac{1}{2} + q_1^* + \frac{1}{2}q_2^*&= 0;\\ \frac{1}{2}\varepsilon - \frac{1}{2} + \frac{1}{2}q_1^* + q_2^*&= 0, \end{aligned}$$

which implies that \(q^*=(\frac{1}{3}+\varepsilon , \frac{1}{3}-\varepsilon )\), proving the claim. \(\square \)
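The claim also lends itself to a quick numerical sanity check. The sketch below grid-searches \(\max _p \min _q J(p, q)\) (using that J is linear in q, so the inner minimum is attained at a corner of \([0,1]^2\)) and solves the first-order conditions for \(q^*\); the value \(\varepsilon =0.1<\frac{1}{3}\) is an assumed choice.

```python
import numpy as np

eps = 0.1  # assumed value; the argument above requires eps < 1/3

def J(p1, p2, q1, q2):
    # Alice's expected payoff, in the simplified form derived above.
    return (0.5 * eps * (1 - p1 + p2) + 0.5 * (2 - p1 - p2)
            + 0.5 * (2 * p1 + p2 - 1) * q1 + 0.5 * (2 * p2 + p1 - 1) * q2)

def J_star(p1, p2):
    # J is linear in q, so the inner minimum is attained at a corner of [0,1]^2.
    return min(J(p1, p2, q1, q2) for q1 in (0.0, 1.0) for q2 in (0.0, 1.0))

grid = np.linspace(0.0, 1.0, 201)
value, p1, p2 = max((J_star(x, y), x, y) for x in grid for y in grid)
print(p1, p2)                  # approximately (1/3, 1/3)

# Bob's strategy from the first-order conditions nabla_p J(p, q*) = 0:
A = np.array([[1.0, 0.5], [0.5, 1.0]])
b = np.array([0.5 + 0.5 * eps, 0.5 - 0.5 * eps])
print(np.linalg.solve(A, b))   # (1/3 + eps, 1/3 - eps)
```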

Proof of Theorem 3

We use Theorem 2 to establish the existence of CIB-CNE: We show that for each t there always exists a pair \((\lambda _t^*, \psi _t^*)\) such that \(\lambda _t^*\) forms an equilibrium at t given \(\psi _t^*\), and \(\psi _t^*\) is consistent with \(\lambda _t^*\). We provide a constructive proof of existence of CIB-CNE by proceeding backwards in time.

Since \(d=1\), we have \(S_t^i = {\mathbf {X}}_{t-1}^i\). The CCI consists of the beliefs along with \({\mathbf {U}}_{t-1}\).

Consider the condensation of the information graph into a directed acyclic graph (DAG) whose nodes are strongly connected components. Each node may contain multiple teams. Consider one topological ordering of this DAG. Denote the nodes by \([1], [2], \cdots \), where [j] is reachable from [k] only if \(k < j\). We use the notation \(X_t^{[k]}, \varPi _t^{[k]}\) to denote the vector of the system variables of the teams in a node. In particular, following Definition 15, we define \({\mathbf {Z}}_t^{[k]} = ({\mathbf {X}}_{t-1:t}^{[k]}, {\mathbf {W}}_t^{[k], Y})\). We also use [1 : k] as a shorthand for the set \([1]\cup [2]\cup \cdots \cup [k]\). Define \(B_t^{[1:k]} = (\varPi _t^{[1:k]}, {\mathbf {U}}_{t-1}^{[1:k]})\). (Note that the usage of superscript here is different from the CCI \(B_t^i\) defined in Definition 12.)
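The condensation and topological ordering used here are standard graph computations. The following sketch (Kosaraju’s algorithm, which discovers the strongly connected components already in a topological order of the condensation, sources first) illustrates them on an assumed toy information graph.

```python
from collections import defaultdict

# Assumed toy information graph on teams 0..4 (edges are illustrative only).
edges = {0: [1], 1: [0, 2], 2: [3], 3: [], 4: [2]}

def scc_topological(adj):
    """Kosaraju: SCCs of adj, listed in a topological order of the condensation."""
    order, seen = [], set()
    def dfs(v):
        seen.add(v)
        for w in adj[v]:
            if w not in seen:
                dfs(w)
        order.append(v)
    for v in adj:
        if v not in seen:
            dfs(v)
    radj = defaultdict(list)                 # reversed graph
    for v in adj:
        for w in adj[v]:
            radj[w].append(v)
    comp, groups = {}, []
    for v in reversed(order):                # decreasing finish time
        if v in comp:
            continue
        groups.append([])
        stack = [v]
        while stack:                         # collect one component in radj
            u = stack.pop()
            if u in comp:
                continue
            comp[u] = len(groups) - 1
            groups[-1].append(u)
            stack.extend(radj[u])
    return groups

print(scc_topological(edges))                # e.g. [[4], [0, 1], [2], [3]]
```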

We construct the solution backwards in time and, within each stage, in the topological order of the nodes. For this purpose, we state an induction invariant on the value functions \(V_t^i\) (as defined in Theorem 2) for the solution we are going to construct.

Induction Invariant: For each time t and each node index k,

  • \(V_t^{i}(b_t, x_{t-1}^{i})\) depends on \(b_t\) only through \((b_t^{[1:k-1]}, u_{t-1}^{i})\) for all teams \(i\in [k]\), if [k] consists of only one team. (With some abuse of notation, we write \(V_t^{i}(b_t, x_{t-1}^{i}) = V_t^{i}(b_t^{[1:k-1]}, u_{t-1}^{i}, x_{t-1}^{i})\) in this case.)

  • \(V_t^{i}(b_t, x_{t-1}^{i})\) depends on \(b_t\) only through \(b_t^{[1:k]}\) for all teams \(i\in [k]\), if [k] consists of multiple public teams. (We write \(V_t^{i}(b_t, x_{t-1}^{i}) = V_t^{i}(b_t^{[1:k]}, x_{t-1}^{i})\) in this case.)

Induction Base: For \(t=T+1\), we have \(V_{T+1}^i(\cdot )\equiv 0\) for all coordinators \(i\in {\mathcal {I}}\); hence the induction invariant holds.

Induction Step: Suppose that the induction invariant is true at time \(t+1\) for all nodes. We construct the solution so that it is also true at time t.

To complete this step, we provide a procedure to solve the stage game. We argue that one can solve a series of optimization problems or finite games following the topological order of the nodes through an inner induction step.

Inner Induction Step: Suppose that the first \(k-1\) nodes have been solved, and the equilibrium strategies \(\lambda _{t}^{*[1:k-1]}\) use only \(b_t^{[1:k-1]}\) along with private information. Suppose that the update rules \(\psi _t^{*,[1:k-1]}\) have also been determined, and they use only \((b_t^{[1:k-1]}, y_t^{[1:k-1]}, u_t^{[1:k-1]})\). We now establish the same property for \((\lambda _{t}^{[k]}, \psi _t^{[k]})\).

  • If the k-th node contains a single coordinator i, the value-to-go is \(V_{t+1}^i(B_{t+1}^{[1:k-1]}, {\mathbf {U}}_t^i, {\mathbf {X}}_t^i)\) by the induction hypothesis. The instantaneous reward for a coordinator i in the k-th node can be expressed as \(r_t^i({\mathbf {X}}_t^{[1:k]}, {\mathbf {U}}_{t}^{[1:k]})\) by the definition of the information graph. In the stage game, coordinator i chooses a prescription to maximize the expected value of

    $$\begin{aligned} Q_t^i(b_{t}^{[1:k-1]}, {\mathbf {Z}}_t^{[1:k]}, \varvec{\varGamma }_t^{[1:k]})&:=r_t^i({\mathbf {X}}_t^{[1:k]}, {\mathbf {U}}_{t}^{[1:k]}) + V_{t+1}^i(B_{t+1}^{[1:k-1]}, {\mathbf {U}}_{t}^i, {\mathbf {X}}_t^i), \end{aligned}$$

    where

    $$\begin{aligned} B_{t+1}^{[1:k-1]}&= (\varPi _{t+1}^{[1:k-1]}, {\mathbf {U}}_t^{[1:k-1]} ),\\ \varPi _{t+1}^{j}&=\psi _t^{*,j}(b_t^{[1:k-1]}, {\mathbf {Y}}_t^{j}, {\mathbf {U}}_t^{[1:k-1]})\quad \forall j\in [1:k-1], \\ {\mathbf {Y}}_{t}^{j}&= \ell _t^j({\mathbf {X}}_{t}^j, {\mathbf {U}}_t^{[1:k-1]}, {\mathbf {W}}_t^{j, Y})\quad \forall j\in [1:k-1],\\ {\mathbf {U}}_{t}^j&= \varvec{\varGamma }_{t}^j({\mathbf {X}}_t^j)\quad \forall j\in [1:k]. \end{aligned}$$

    The expectation is computed using the belief \(\beta _t^i\) (defined through Eq. (4) in Definition 15) along with \(\lambda _{t}^{*[1:k-1]}\) that has already been determined. It can be written as

    $$\begin{aligned}&\sum _{{\tilde{s}}_t, {\tilde{\gamma }}_t^{[1:k-1]}} \beta _t^i({\tilde{s}}_t|x_{t-1}^i) Q_t^i(b_t^{[1:k-1]}, {\tilde{s}}_t^{[1:k]}, ({\tilde{\gamma }}_t^{[1:k-1]}, \gamma _{t}^i) )\\&\qquad \times \prod _{j\in [1:k-1]} \lambda _{t}^j({\tilde{\gamma }}_t^j|b_t^{[1:k-1]}, {\tilde{x}}_{t-1}^j) \\&\qquad =\sum _{{\tilde{s}}_{t}^{[1:k]}, {\tilde{\gamma }}_t^{[1:k-1]}} \varvec{1}_{ \{{\tilde{x}}_{t-1}^i=x_{t-1}^i \} } {\mathbb {P}}({\tilde{w}}_t^{[1:k], Y}) \left( \prod _{j\in [1:k-1]} \pi _t^{j}({\tilde{x}}_{t-1}^j){\mathbb {P}}({\tilde{x}}_t^j|{\tilde{x}}_{t-1}^j, u_{t-1}^{[1:k-1]}) \right) \\&\qquad \times \left( \prod _{j\in [1:k-1]} \lambda _{t}^{*j}({\tilde{\gamma }}_t^j|b_t^{[1:k-1]}, {\tilde{x}}_{t-1}^j) \right) \\&\qquad \times {\mathbb {P}}({\tilde{x}}_t^i|x_{t-1}^i, u_{t-1}^{[1:k]}) Q_t^i(b_t^{[1:k-1]}, {\tilde{s}}_t^{[1:k]}, ({\tilde{\gamma }}_t^{[1:k-1]}, \gamma _{t}^i)). \end{aligned}$$

    Therefore, the expected reward of coordinator i depends on \(b_t\) only through \((b_t^{[1:k-1]}, u_{t-1}^i)\). Coordinator i can choose the optimal prescription based on \((b_t^{[1:k-1]}, u_{t-1}^i, x_{t-1}^i)\), i.e., \(\lambda _t^{*i}(b_t, x_{t-1}^i)=\lambda _t^{*i}(b_t^{[1:k-1]}, u_{t-1}^i, x_{t-1}^i)\). We then have \(V_t^i(b_t, x_{t-1}^i) = V_t^i(b_t^{[1:k-1]}, u_{t-1}^i, x_{t-1}^i)\). The update rule \(\psi _t^{*, [k]} =\psi _t^{*, i}\) is then determined to be an arbitrary update rule consistent with \(\lambda _{t}^{*, i}\), which can be chosen as a function from \({\mathcal {B}}_t^{[1:k]}\times {\mathcal {Y}}_t^{[k]}\times {\mathcal {U}}_t^{[1:k]}\) (instead of \({\mathcal {B}}_t\times {\mathcal {Y}}_t^{[k]}\times {\mathcal {U}}_t\)) to \(\varPi _{t+1}^{[k]}\).

  • If the k-th node contains a group of public teams, then the update rules \({\hat{\psi }}_t^{*, [k]}\) are fixed, irrespective of the stage game strategies, i.e., there exists a unique update rule \({\hat{\psi }}_t^{*, i}\) that is compatible with any \(\lambda _{t}^{*, i}\) for a public team i. This update rule is a map from \({\mathcal {Y}}_t^{[k]}\times {\mathcal {U}}_t^{[1:k]}\) to a vector of delta measures in \(\prod _{i\in [k]} \varDelta ({\mathcal {X}}_{t-1}^i)\), i.e., the map that recovers \({\mathbf {X}}_{t-1}^{[k]}\) from the observations (see Definition 18). The function takes \({\mathbf {U}}_{t}^{[1:k]}\) as its argument due to the fact that the observations of the k-th node depend on \({\mathbf {U}}_t\) only through \({\mathbf {U}}_t^{[1:k]}\).

    The value-to-go for each coordinator i can be expressed as \(V_{t+1}^i(B_{t+1}^{[1:k]}, {\mathbf {X}}_t^i)\) by the induction hypothesis. The instantaneous reward can be written as \(r_t^i({\mathbf {X}}_t^{[1:k]}, {\mathbf {U}}_{t}^{[1:k]})\) by the definition of the information dependency graph.

    In the stage game, coordinator i in the k-th node chooses a distribution \(\eta _{t}^i\) on prescriptions to maximize the expected value of

    $$\begin{aligned} Q_t^i(b_{t}^{[1:k]}, {\mathbf {Z}}_t^{[1:k]}, \varvec{\varGamma }_{t}^{[1:k]}):=r_t^i({\mathbf {X}}_t^{[1:k]}, {\mathbf {U}}_{t}^{[1:k]}) + V_{t+1}^i(B_{t+1}^{[1:k]}, {\mathbf {X}}_t^i), \end{aligned}$$

    where

    $$\begin{aligned} B_{t+1}^{[1:k]}&= (\varPi _{t+1}^{[1:k]}, {\mathbf {U}}_t^{[1:k]} ),\\ \varPi _{t+1}^{j}&=\psi _t^{*, j}(b_t^{[1:k-1]}, {\mathbf {Y}}_t^{j}, {\mathbf {U}}_t^{[1:k-1]})\quad \forall j\in [1:k-1], \\ \varPi _{t+1}^{[k]}&={\hat{\psi }}_t^{*, [k]}(b_{t}^{[1:k]}, {\mathbf {Y}}_{t}^{[1:k]}, {\mathbf {U}}_{t}^{[1:k]} ),\\ {\mathbf {Y}}_{t}^{j}&= \ell _t^j({\mathbf {X}}_{t}^j, {\mathbf {U}}_t^{[1:k]}, {\mathbf {W}}_t^{j, Y})\quad \forall j\in [1:k],\\ {\mathbf {U}}_{t}^j&= \varvec{\varGamma }_{t}^j({\mathbf {X}}_t^j)\quad \forall j\in [1:k]. \end{aligned}$$

    The expectation is taken with respect to the belief \(\beta _t^i\) (defined through Eq. (4) in Definition 15) and the strategy prediction \(\lambda _t^{[1:k]}\). This expectation can be written as

    $$\begin{aligned}&\sum _{{\tilde{s}}_t, {\tilde{\gamma }}_t^{[1:k]}} \beta _t^i({\tilde{s}}_t|x_{t-1}^i) Q_t^i(b_t^{[1:k]}, {\tilde{s}}_t^{[1:k]}, {\tilde{\gamma }}_t^{[1:k]} ) \eta _{t}^{i}({\tilde{\gamma }}_t^i) \times \prod _{\begin{array}{c} j\in [1:k]\\ j\ne i \end{array}} \lambda _{t}^j({\tilde{\gamma }}_t^j|b_t^{[1:k-1]}, {\tilde{x}}_{t-1}^j) \\&\quad =\sum _{{\tilde{s}}_{t}^{[1:k]}, {\tilde{\gamma }}_t^{[1:k]}} \varvec{1}_{ \{{\tilde{x}}_{t-1}^i = x_{t-1}^i \} } {\mathbb {P}}({\tilde{w}}_t^{[1:k], Y}) \\&\qquad \times \left( \prod _{\begin{array}{c} j\in [1:k]\\ j\ne i \end{array}} \pi _{t}^j({\tilde{x}}_{t-1}^j){\mathbb {P}}({\tilde{x}}_t^j|{\tilde{x}}_{t-1}^j, u_{t-1}^{[1:k]})\lambda _{t}^{*j}({\tilde{\gamma }}_t^j|b_t^{[1:k]}, {\tilde{x}}_{t-1}^j) \right) \\&\qquad \times {\mathbb {P}}({\tilde{x}}_t^i|x_{t-1}^i, u_{t-1}^{[1:k]}) \eta _{t}^{i}({\tilde{\gamma }}_t^i) Q_t^i(b_t^{[1:k]}, {\tilde{s}}_t^{[1:k]}, {\tilde{\gamma }}_t^{[1:k]}), \end{aligned}$$

    which depends on \(b_t\) only through \(b_t^{[1:k]}\). Therefore, the stage game defined in Definition 15 induces a finite game between the coordinators in the k-th node (instead of all coordinators) with parameter \((b_t^{[1:k]}, (\psi _t^{*, [1:k-1]}, {\hat{\psi }}_t^{*,[k]}))\) (instead of \((b_t, \psi _t)\)), where \(\lambda _t^{*[1:k-1]}\) has been fixed. Teams in the k-th node play a stage game where the first \(k-1\) nodes act like nature, while the coordinators after the k-th node have no effect on the payoffs of the coordinators in the k-th node. Hence, a coordinator i in the k-th node can base its decision on \((b_t^{[1:k]}, x_{t-1}^i)\), i.e., \(\lambda _t^{*i}(b_t, x_{t-1}^i)=\lambda _t^{*i}(b_t^{[1:k]}, x_{t-1}^i)\). We also have \(V_t^i(b_t, x_{t-1}^i) = V_t^i(b_t^{[1:k]}, x_{t-1}^i)\). The update rule is determined by \(\psi _t^{*,[k]} = {\hat{\psi }}_t^{*,[k]}\), which is guaranteed to be consistent with \(\lambda _t^{*[k]}\).

In summary, we determine \((\lambda _{t}^*, \psi _t^*)\) using a node-by-node approach. If the k-th node consists of one team, then we first determine \(\lambda _{t}^{*[k]}\) from an optimization problem dependent on \((\lambda _{t}^{*[1:k-1]}, \psi _t^{*,[1:k-1]})\) and then determine \(\psi _t^{*,[k]}\). If the k-th node consists of multiple public teams, then we first determine \(\psi _t^{*,[k]}\) and then solve \(\lambda _{t}^{*[k]}\) from a finite game dependent on \((\lambda _{t}^{*[1:k-1]}, \psi _t^{*, [1:k]})\). Hence, we have constructed the solution and established both inner and outer induction steps, proving the theorem.

Proof of Theorem 4

We prove the theorem for \(d=1\); the proof idea for \(d>1\) is similar.

We will prove a stronger result. For each \(\varPi _t^i\in \varDelta ({\mathcal {X}}_{t-1}^i)\), define the corresponding \({\overline{\varPi }}_t^i\in \varDelta ({\mathcal {X}}_t^i)\) by

$$\begin{aligned} {\overline{\varPi }}_t^i(x_t^i):= \sum _{{\tilde{x}}_{t-1}^i}\varPi _t^i({\tilde{x}}_{t-1}^i) {\mathbb {P}}(x_t^i|{\tilde{x}}_{t-1}^i). \end{aligned}$$

Define \({\hat{\psi }}_t^i\) to be the signaling-free update function, i.e., the belief update function such that

$$\begin{aligned} \varPi _{t+1}^i(x_t^i)&={\hat{\psi }}_t^i({\overline{\varPi }}_t^i, {\mathbf {Y}}_t^i)= \dfrac{{\overline{\varPi }}_t^i(x_{t}^i){\mathbb {P}}({\mathbf {Y}}_t^i|x_t^i) }{\sum _{{\tilde{x}}_t^i} {\overline{\varPi }}_t^i({\tilde{x}}_{t}^i){\mathbb {P}}({\mathbf {Y}}_t^i|{\tilde{x}}_t^i)}. \end{aligned}$$
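The two displays above are a prediction step followed by a Bayes correction. A minimal numerical sketch for \(d=1\), with an assumed toy transition kernel and observation kernel:

```python
import numpy as np

P_trans = np.array([[0.9, 0.1], [0.2, 0.8]])   # assumed P(x_t | x_{t-1})
P_obs = np.array([[0.7, 0.3], [0.4, 0.6]])     # assumed P(y_t | x_t)

def predict(pi):
    """Pi_bar_t(x_t) = sum over x of Pi_t(x) P(x_t | x)."""
    return pi @ P_trans

def signaling_free_update(pi_bar, y):
    """Pi_{t+1}(x_t) proportional to Pi_bar_t(x_t) P(y_t | x_t)."""
    post = pi_bar * P_obs[:, y]
    return post / post.sum()

pi_t = np.array([0.5, 0.5])
print(signaling_free_update(predict(pi_t), y=1))
```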

Define open-loop prescriptions as the prescriptions that simply instruct members of a team to take a certain action irrespective of their private information. We will show that there exists an equilibrium where each team plays a common information-based signaling-free (CIBSF) strategy, i.e., the common belief generation system for all coordinators is given by the signaling-free update functions \({\hat{\psi }}\), and coordinator i chooses randomized open-loop prescriptions based on \({\overline{\varvec{\varPi }}}_t=({\overline{\varPi }}_t^i)_{i\in {\mathcal {I}}}\) instead of \((B_t, {\mathbf {X}}_{t-1}^i)\).

Induction Invariant: \(V_t^i(B_t, {\mathbf {X}}_{t-1}^i) = V_t^i({\overline{\varvec{\varPi }}}_t, {\mathbf {X}}_{t-1}^i)\).

Induction Base: The induction invariant is true for \(t=T+1\) since \(V_{T+1}^i(\cdot ) \equiv 0\) for all \(i\in {\mathcal {I}}\).

Induction Step: Suppose that the induction invariant is true for \(t+1\); we prove it for time t.

Let \({\hat{\psi }}_t\) be the signaling-free update rule. We solve the stage game \(G_t(V_{t+1}, {\hat{\psi }}_t, b_t)\). In the stage game, coordinator i chooses a prescription to maximize the expectation of

$$\begin{aligned} r_t^i({\mathbf {X}}_t^{-i}, {\mathbf {U}}_{t}) + V_{t+1}^i({\overline{\varvec{\varPi }}}_{t+1}, {\mathbf {X}}_t^i), \end{aligned}$$

where

$$\begin{aligned} {\overline{\varPi }}_{t+1}^k(x_{t+1}^k)&= \sum _{{\tilde{x}}_t^k} \varPi _{t+1}^k({\tilde{x}}_t^k){\mathbb {P}}(x_{t+1}^k|{\tilde{x}}_t^k)\quad \forall x_{t+1}^k\in {\mathcal {X}}_{t+1}^k, \\ \varPi _{t+1}^k&={\hat{\psi }}_t^k({\overline{\varPi }}_{t}^k, {\mathbf {Y}}_{t}^k)\quad \forall k\in {\mathcal {I}},\\ {\mathbf {Y}}_{t}^{k}&= \ell _t^k({\mathbf {X}}_{t}^k, {\mathbf {W}}_t^{k, Y})\quad \forall k\in {\mathcal {I}},\\ U_{t}^{k, j}&= \varGamma _{t}^{k, j}(X_t^{k, j})\quad \forall (k, j)\in {\mathcal {N}}. \end{aligned}$$

Since \(V_{t+1}^i({\overline{\varvec{\varPi }}}_{t+1}, {\mathbf {X}}_t^i)\) does not depend on coordinator i’s prescriptions, coordinator i only needs to maximize the expectation of \(r_t^i({\mathbf {X}}_t^{-i}, {\mathbf {U}}_{t})\), which is

$$\begin{aligned}&\sum _{{\tilde{x}}_{t-1:t}^{-i}, {\tilde{\gamma }}_t^{-i} } \left( \prod _{j\ne i} \pi _t^{j}({\tilde{x}}_{t-1}^{j}) {\mathbb {P}}({\tilde{x}}_t^{j}|{\tilde{x}}_{t-1}^{j}) \lambda _{t}^{j}({\tilde{\gamma }}_t^{j}|b_t, {\tilde{x}}_{t-1}^{j}) \right) r_t^i({\tilde{x}}_t^{-i}, ({\tilde{\gamma }}_t^{-i}({\tilde{x}}_t^{-i}), \gamma _{t}^i(x_t^i) ) ). \end{aligned}$$

Claim

In the stage game, if all coordinators \(-i\) use CIBSF strategies, then coordinator i can respond with a CIBSF strategy.

Proof of Claim

Let \(\eta _{t}^k: {\overline{\varPi }}_t \mapsto \varDelta ({\mathcal {U}}_t^k)\) be the CIBSF strategy of coordinator \(k\ne i\). Then, coordinator i’s expected payoff given \(\gamma _{t}^i\) can be written as

$$\begin{aligned}&\qquad \quad \sum _{{\tilde{x}}_{t-1:t}^{-i}, {\tilde{u}}_t^{-i} } \left( \prod _{j\ne i} \pi _t^{j}({\tilde{x}}_{t-1}^{j}) {\mathbb {P}}({\tilde{x}}_t^{j}|{\tilde{x}}_{t-1}^{j}) \eta _{t}^{j}({\tilde{u}}_t^{j}|{\overline{\pi }}_t) \right) r_t^i({\tilde{x}}_t^{-i}, ({\tilde{u}}_t^{-i}, \gamma _{t}^i(x_t^i) ) ) \\&\quad =\sum _{{\tilde{x}}_{t}^{-i}, {\tilde{u}}_t^{-i} } \left( \prod _{j\ne i} \left( \sum _{{\tilde{x}}_{t-1}^j}\pi _t^{j}({\tilde{x}}_{t-1}^{j}) {\mathbb {P}}({\tilde{x}}_t^{j}|{\tilde{x}}_{t-1}^{j})\right) \eta _{t}^{j}({\tilde{u}}_t^{j}|{\overline{\pi }}_t) \right) \times r_t^i({\tilde{x}}_t^{-i}, ({\tilde{u}}_t^{-i}, \gamma _{t}^i(x_t^i) ) ) \\&\quad =\sum _{{\tilde{x}}_{t}^{-i}, {\tilde{u}}_t^{-i} } \left( \prod _{j\ne i} {\overline{\pi }}_t^{j}({\tilde{x}}_{t}^{j}) \eta _{t}^{j}({\tilde{u}}_t^{j}|{\overline{\pi }}_t) \right) r_t^i({\tilde{x}}_t^{-i}, ({\tilde{u}}_t^{-i}, \gamma _{t}^i(x_t^i) ) ) \\&\quad =:{\overline{r}}_t^i({\overline{\pi }}_t, \eta _{t}^{-i}, \gamma _{t}^i(x_t^i)). \end{aligned}$$

Hence, coordinator i can respond with a prescription \(\gamma _{t}^i\) such that \(\gamma _{t}^i(x_t^i) = u_t^i\) for all \(x_t^i\), where

$$\begin{aligned} u_t^i \in \arg \max _{{\tilde{u}}_t^i} {\overline{r}}_t^i({\overline{\pi }}_t, \eta _{t}^{-i}, {\tilde{u}}_t^i), \end{aligned}$$

and such a \(u_t^i\) can be chosen based on \(({\overline{\pi }}_t, \eta _{t}^{-i})\) alone, proving the claim. \(\square \)
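A minimal numerical sketch of this best response for two teams (i and one opponent j) with binary states and actions; the belief, the opponent’s randomized open-loop action distribution, and the reward are assumed toy data.

```python
import numpy as np

pi_bar_j = np.array([0.6, 0.4])   # assumed signaling-free belief over x_t^j
eta_j = np.array([0.3, 0.7])      # assumed opponent randomized open-loop action

def r_i(x_j, u_i, u_j):
    """Assumed toy reward r_t^i(x_t^{-i}, u_t); note it ignores i's own state."""
    return float(x_j == u_i) - 0.5 * float(u_i == u_j)

def r_bar(u_i):
    """rbar_t^i(pi_bar, eta^{-i}, u_i): expectation over x_t^j and u_t^j."""
    return sum(pi_bar_j[xj] * eta_j[uj] * r_i(xj, u_i, uj)
               for xj in (0, 1) for uj in (0, 1))

best_u = max((0, 1), key=r_bar)   # open-loop prescription: gamma(x) = best_u for all x
print(best_u, r_bar(best_u))
```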

Given the claim, we conclude that there exists a stage game equilibrium where all coordinators play CIBSF strategies: Define a new stage game where we restrict each coordinator to CIBSF strategies. By the claim, a best response in the restricted stage game is also a best response in the original stage game. The restricted game is a finite game (a game of symmetric information with parameter \({\overline{\pi }}_t\), where coordinator i’s action is \(u_t^i\) and its payoff is a function of \({\overline{\pi }}_t\) and \(u_t\)) and hence always has an equilibrium. The equilibrium strategy will be consistent with \({\hat{\psi }}_t\) due to Lemma 10.

Lemma 10

The signaling-free update rule \({\hat{\psi }}_t^i\) is consistent with any \(\lambda _t^i: {\mathcal {B}}_t\times {\mathcal {X}}_{t-1}^i \mapsto \varDelta ({\mathcal {A}}_{t}^i)\) that corresponds to a CIBSF strategy at time t.

Proof

It follows from standard arguments related to strategy independence of beliefs (see Chapter 6 of Kumar and Varaiya [25]). \(\square \)

Let \(\eta _{t}^{*}=(\eta _{t}^{*j})_{j\in {\mathcal {I}}}, \eta _{t}^{*j}: {\overline{\varPi }}_t \mapsto \varDelta ({\mathcal {U}}_t^j)\) be a CIBSF strategy profile that is a stage game equilibrium. Then, the value function

$$\begin{aligned} V_{t}^i(b_t, x_{t-1}^i)&= \left( \max _{{\tilde{u}}_t^i} {\overline{r}}_t^i({\overline{\pi }}_t, \eta _{t}^{*-i}, {\tilde{u}}_t^i) \right) \\&\quad +\sum _{{\tilde{x}}_t, {\tilde{y}}_t} V_{t+1}^i({\hat{\psi }}_t({\overline{\pi }}_{t}, {\tilde{y}}_t), {\tilde{x}}_{t}^i) {\mathbb {P}}({\tilde{y}}_t|{\tilde{x}}_t){\mathbb {P}}({\tilde{x}}_t^i|x_{t-1}^i){\overline{\pi }}_t^{-i}({\tilde{x}}_t^{-i}) \end{aligned}$$

depends on \((b_t, x_{t-1}^i)\) only through \(({\overline{\pi }}_t, x_{t-1}^i)\), establishing the induction step. \(\square \)

Proof of Lemma 8

In this appendix, when we specify a team’s strategy through a profile of individual strategies, for example \(\varphi ^i=(\varphi ^{i, l})_{(i, l)\in {\mathcal {N}}_i}\), we assume that members of team i apply these strategies independently of their teammates.

We first show three auxiliary results, Lemmas 11–13, which form the basis of our proof of Lemma 8.

Lemma 11

(Conditional independence among teammates) Suppose that members of team i use behavioral strategies \(\varphi ^i=(\varphi ^{i, l})_{(i, l)\in {\mathcal {N}}_i}\) where \(\varphi ^{i, l}=(\varphi _t^{i, l})_{t\in {\mathcal {T}}}, \varphi _t^{i, l}: {\mathcal {H}}_t^{i, l}\mapsto \varDelta ({\mathcal {U}}_t^{i, l})\). Suppose that all teams other than i use a behavioral coordination strategy profile \(g^{-i}\). Then, \(({\mathbf {X}}_{t-d+1:t}^{i, l})_{(i, l)\in {\mathcal {N}}_i}\) are conditionally independent given the common information \(H_t^i\). Furthermore, the conditional distribution of \({\mathbf {X}}_{t-d+1:t}^{i, j}\) given \(H_t^i\) depends on \((\varphi ^i, g^{-i})\) only through \(\varphi ^{i, j}\).

Lemma 12

Let \(\mu ^{i, j}\) be a pure strategy of agent (i, j). Let \(\varphi ^{i, -j}=(\varphi _t^{i, l})_{(i, l)\in {\mathcal {N}}_i\backslash \{(i, j) \}, t\in {\mathcal {T}} }, \varphi _t^{i, l}: {\mathcal {H}}_t^{i, l} \mapsto \varDelta ({\mathcal {U}}_t^{i, l})\) be behavioral strategies of all members of team i except (i, j). Then, there exists a behavioral strategy \({\bar{\varphi }}^{i, j} = ({\bar{\varphi }}_t^{i, j})_{t\in {\mathcal {T}}}, {\bar{\varphi }}_t^{i, j}: {\mathcal {H}}_t^{i}\times {\mathcal {X}}_t^{i, j} \mapsto \varDelta ({\mathcal {U}}_t^{i, j})\) such that \((\mu ^{i, j}, \varphi ^{i, -j})\) is payoff-equivalent to \(({\bar{\varphi }}^{i, j}, \varphi ^{i, -j})\).

Lemma 13

Let \(\mu ^i\) be a pure strategy of team i. There exists a payoff-equivalent behavioral strategy profile \({\bar{g}}^i\) that only assigns simple prescriptions.

Based on Lemmas 11–13, we proceed to complete the proof of Lemma 8 via the following steps.

  1. Let \(\sigma ^i\) be a payoff-equivalent mixed team strategy to \(g^i\) (see Sect. 3).

  2. For each \(\mu ^i\in \mathrm {supp}(\sigma ^i)\), let \({\bar{g}}^{i}[\mu ^i]\) be a payoff-equivalent behavioral strategy profile that only assigns simple prescriptions (Lemma 13).

  3. Let \({\bar{\varsigma }}^i[\mu ^i]\) be a payoff-equivalent mixed coordination strategy of \({\bar{g}}^{i}[\mu ^i]\) constructed from Kuhn’s Theorem [24].

  4. Define a new mixed coordination strategy \({\bar{\varsigma }}^i\) by

    $$\begin{aligned} {\bar{\varsigma }}^i = \sum _{\mu ^i\in \mathrm {supp}(\sigma ^i)} \sigma ^i(\mu ^i) \cdot {\bar{\varsigma }}^i[\mu ^i]. \end{aligned}$$

  5. Let \({\bar{g}}^i\) be a payoff-equivalent behavioral coordination strategy profile to \({\bar{\varsigma }}^i\) constructed from Kuhn’s Theorem [24].

It is clear that \({\bar{g}}^i\) will be payoff-equivalent to \(\sigma ^i\). Furthermore, \({\bar{g}}^i\) always assigns simple prescriptions since the construction in Kuhn’s Theorem does not change the set of possible prescriptions.

Proof of Lemma 11

Assume that \(h_t^i\) is admissible under \(\varphi ^i\). Let \(g^i\) be a behavioral coordination strategy defined by

$$\begin{aligned} g_t^i(\gamma _t^i|{\overline{h}}_t^i) = \prod _{(i, j)\in {\mathcal {N}}_i} \prod _{x_{t-d+1:t}^{i, j}} \varphi _t^{i, j}(\gamma _{t}^{i, j}(x_{t-d+1:t}^{i, j})|h_t^i, x_{t-d+1:t}^{i, j}), \end{aligned}$$

i.e., at time t, the coordinator generates independent prescriptions for each member of the team. If we view the prescription \(\varGamma _t^{i, j}\) as a table of actions, then it is determined as follows: Each entry of the table is determined independently, where the entry corresponding to \(x_{t-d+1:t}^{i, j}\) is randomly drawn with distribution \(\varphi _t^{i, j}(\cdot |h_t^i, x_{t-d+1:t}^{i, j})\).

Using arguments similar to those in the proof of Lemma 1, one can show that \((g^i, g^{-i})\) and \((\varphi ^i, g^{-i})\) generate the same distributions of \(({\mathbf {Y}}_{1:t}, {\mathbf {U}}_{1:t}, {\mathbf {X}}_{1:t})\), hence

$$\begin{aligned} {\mathbb {P}}^{\varphi ^i, g^{-i}}(x_{t-d+1:t}^i|h_t^i) = {\mathbb {P}}^{g^i, g^{-i}}(x_{t-d+1:t}^i|h_t^i). \end{aligned}$$

By Lemma 3, we know that \({\mathbb {P}}^{g}(x_{1:t}^i, \gamma _{1:t}^i|h_t^0)\) does not depend on \(g^{-i}\). As a conditional distribution obtained from \({\mathbb {P}}^{g}(x_{1:t}^i, \gamma _{1:t}^i|h_t^0)\), \({\mathbb {P}}^{g}(x_{t-d+1:t}^i|h_t^i)\) does not depend on \(g^{-i}\) either. Therefore, we have

$$\begin{aligned} {\mathbb {P}}^{g^i, g^{-i}}(x_{t-d+1:t}^i|h_t^i) = {\mathbb {P}}^{g^i, {\hat{g}}^{-i}}(x_{t-d+1:t}^i|h_t^i) \end{aligned}$$

where \({\hat{g}}^{-i}\) is an open-loop strategy profile that always generates the actions \(u_{1:t-1}^{-i}\).

Again, \((g^i, {\hat{g}}^{-i})\) and \((\varphi ^i, {\hat{g}}^{-i})\) generate the same distributions on \(({\mathbf {Y}}_{1:t}, {\mathbf {U}}_{1:t}, {\mathbf {X}}_{1:t})\), hence

$$\begin{aligned} {\mathbb {P}}^{\varphi ^i, {\hat{g}}^{-i}}(x_{t-d+1:t}^i|h_t^i) = {\mathbb {P}}^{g^i, {\hat{g}}^{-i}}(x_{t-d+1:t}^i|h_t^i). \end{aligned}$$

We now have

$$\begin{aligned} {\mathbb {P}}^{\varphi ^i, g^{-i}}(x_{t-d+1:t}^i|h_t^i) = {\mathbb {P}}^{\varphi ^i, {\hat{g}}^{-i}}(x_{t-d+1:t}^i|h_t^i). \end{aligned}$$

Due to Lemma 3, we also have

$$\begin{aligned} {\mathbb {P}}^{\varphi ^i, {\hat{g}}^{-i}}(x_{t-d+1:t}^i|h_t^i) = {\mathbb {P}}^{\varphi ^i, {\hat{g}}^{-i}}(x_{t-d+1:t}^i|h_t^i, x_{t-d:t}^{-i}) \end{aligned}$$

where \(x_{t-d:t}^{-i}\in {\mathcal {X}}_{t-d:t}^{-i}\) is such that \({\mathbb {P}}^{\varphi ^i, {\hat{g}}^{-i}}(x_{t-d:t}^{-i}|h_t^i)>0\).

Let \(\tau =t-d+1\). By Bayes’ rule,

$$\begin{aligned}&\qquad \quad {\mathbb {P}}^{\varphi ^i, {\hat{g}}^{-i}}(x_{\tau :t}^i|h_t^i, x_{\tau -1:t}^{-i})\nonumber \\&\quad =\dfrac{{\mathbb {P}}^{\varphi ^i, {\hat{g}}^{-i}}(x_{\tau :t}, y_{\tau :t-1}, u_{\tau :t-1}|h_\tau ^0, x_{1:\tau -1}, x_{\tau -1}^{-i} )}{\sum _{{\tilde{x}}_{\tau :t}^i } {\mathbb {P}}^{\varphi ^i, {\hat{g}}^{-i}}({\tilde{x}}_{\tau :t}^i, x_{\tau :t}^{-i}, y_{\tau :t-1}, u_{\tau :t-1}|h_\tau ^0, x_{1:\tau -1}, x_{\tau -1}^{-i} )} \end{aligned}$$
(31)

We have

$$\begin{aligned}&\qquad \quad {\mathbb {P}}^{\varphi ^i, {\hat{g}}^{-i}}(x_{\tau :t}, y_{\tau :t-1}, u_{\tau :t-1}|h_\tau ^0, x_{1:\tau -1}, x_{\tau -1}^{-i} )\nonumber \\&\quad =\prod _{l=1}^{d-1} \left[ {\mathbb {P}}^{\varphi ^i, {\hat{g}}^{-i}}(x_{t-l+1}, y_{t-l}|y_{1:t-l-1}, u_{1:t-l} , x_{1:t-l}^i, x_{\tau -1:t-l}^{-i}) \right. \nonumber \\&\qquad ~\times \left. {\mathbb {P}}^{\varphi ^i, {\hat{g}}^{-i}}(u_{t-l}^i|y_{1:t-l-1}, u_{1:t-l-1} , x_{1:t-l}^i, x_{\tau -1:t-l}^{-i})\right] \nonumber \\&\qquad ~\times {\mathbb {P}}^{\varphi ^i, {\hat{g}}^{-i}}(x_\tau |h_\tau ^0, x_{1:\tau -1}, x_{\tau -1}^{-i})\nonumber \\&\quad =\prod _{l=1}^{d-1} \Big [ \Big (\prod _{(i, j)\in {\mathcal {N}}_i}{\mathbb {P}}(x_{t-l+1}^{i, j}|x_{t-l}^{i, j}, u_{t-l}){\mathbb {P}}(y_{t-l}^{i, j}|x_{t-l}^{i, j}, u_{t-l}) \nonumber \\&\qquad ~\times \varphi _{t-l}^{i, j}(u_{t-l}^{i, j}|h_{t-l}^{i, j})\Big ) {\mathbb {P}}(x_{t-l+1}^{-i}|x_{t-l}^{-i}, u_{t-l}){\mathbb {P}}(y_{t-l}^{-i}|x_{t-l}^{-i}, u_{t-l}) \Big ]\nonumber \\&\qquad ~\times \left( \prod _{(i, j)\in {\mathcal {N}}_i}{\mathbb {P}}(x_\tau ^{i, j}|x_{\tau -1}^{i, j}, u_{\tau -1}) \right) {\mathbb {P}}(x_\tau ^{-i}|x_{\tau -1}^{-i}, u_{\tau -1}). \end{aligned}$$
(32)

Substituting (32) into (31), we obtain

$$\begin{aligned} {\mathbb {P}}^{\varphi ^i, {\hat{g}}^{-i}}(x_{\tau :t}^i|h_t^i, x_{\tau -1:t}^{-i})&=\dfrac{\prod _{(i, j)\in {\mathcal {N}}_i} F_t^{i, j}(x_{\tau :t}^{i, j}, h_t^i)}{\sum _{{\tilde{x}}_{\tau :t}^i } \prod _{(i, j)\in {\mathcal {N}}_i} F_t^{i, j}({\tilde{x}}_{\tau :t}^{i, j}, h_t^i)}\\&=\prod _{(i, j)\in {\mathcal {N}}_i} \dfrac{F_t^{i, j}(x_{\tau :t}^{i, j}, h_t^i)}{\sum _{{\tilde{x}}_{\tau :t}^{i, j} } F_t^{i, j}({\tilde{x}}_{\tau :t}^{i, j}, h_t^i)} \end{aligned}$$

where

$$\begin{aligned}&\qquad \quad F_t^{i, j}(x_{\tau :t}^{i, j}, h_t^i) \\&\quad = \prod _{l=1}^{d-1} \Big [ {\mathbb {P}}(x_{t-l+1}^{i, j}|x_{t-l}^{i, j}, u_{t-l}){\mathbb {P}}(y_{t-l}^{i, j}|x_{t-l}^{i, j}, u_{t-l})\varphi _{t-l}^{i, j}(u_{t-l}^{i, j}|h_{t-l}^{i, j}) \Big ] \times {\mathbb {P}}(x_\tau ^{i, j}|x_{\tau -1}^{i, j}, u_{\tau -1}) \end{aligned}$$

is a function that depends on \(\varphi ^{i, j}\) but not on \(\varphi ^{i, -j}\).

Therefore, we have proved that

$$\begin{aligned} {\mathbb {P}}^{\varphi ^i, g^{-i}}(x_{\tau :t}^i|h_t^i)&= \prod _{(i, j)\in {\mathcal {N}}_i} \dfrac{F_t^{i, j}(x_{\tau :t}^{i, j}, h_t^i)}{\sum _{{\tilde{x}}_{\tau :t}^{i, j} } F_t^{i, j}({\tilde{x}}_{\tau :t}^{i, j}, h_t^i)}. \end{aligned}$$
(33)

Marginalizing (33), we obtain

$$\begin{aligned} {\mathbb {P}}^{\varphi ^i, g^{-i}}(x_{\tau :t}^{i, j}|h_t^i)&= \dfrac{F_t^{i, j}(x_{\tau :t}^{i, j}, h_t^i)}{\sum _{{\tilde{x}}_{\tau :t}^{i, j} } F_t^{i, j}({\tilde{x}}_{\tau :t}^{i, j}, h_t^i)} \end{aligned}$$

which depends on \((\varphi ^i, g^{-i})\) only through \(\varphi ^{i, j}\).

Hence, we conclude that

$$\begin{aligned} {\mathbb {P}}^{\varphi ^i, g^{-i}}(x_{t-d+1:t}^i|h_t^i) = \prod _{(i, j)\in {\mathcal {N}}_i} {\mathbb {P}}^{\varphi ^i, g^{-i}}(x_{t-d+1:t}^{i, j}|h_t^i), \end{aligned}$$

and \({\mathbb {P}}^{\varphi ^i, g^{-i}}(x_{t-d+1:t}^{i, j}|h_t^i)\) depends on \((\varphi ^i, g^{-i})\) only through \(\varphi ^{i, j}\).
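
To make the product form above concrete, the following minimal numerical sketch (with hypothetical sizes; not part of the original analysis) checks that when the unnormalized weight of a joint private-state trajectory is a product of per-agent factors \(F_t^{i,j}\), normalizing the joint yields exactly the product of the per-agent normalized marginals, as in (33):

```python
# Numerical check (hypothetical sizes; not from the paper) of the product
# form in (33): normalizing a product-form unnormalized joint weight gives
# the product of the per-agent normalized marginals.
import numpy as np

rng = np.random.default_rng(0)
n = 4                              # values per teammate's trajectory (toy)
F1 = rng.random(n)                 # stands in for F_t^{i,1}(., h_t^i)
F2 = rng.random(n)                 # stands in for F_t^{i,2}(., h_t^i)

joint = np.outer(F1, F2)           # unnormalized joint weight
joint /= joint.sum()               # normalized conditional (LHS of (33))

marg1 = F1 / F1.sum()              # per-agent normalized factors
marg2 = F2 / F2.sum()              # (RHS factors of (33))

assert np.allclose(joint, np.outer(marg1, marg2))
print("conditional independence across teammates verified")
```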

Remark 14

In general, the conditional independence among teammates fails to hold when team members jointly randomize; a toy numerical illustration follows below.

\(\square \)
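
The following toy simulation (an assumed two-teammate setup, not drawn from the model above) illustrates Remark 14: when two teammates act on a shared randomization device, their actions are perfectly correlated, so no product form over teammates can hold.

```python
# Toy illustration of Remark 14 (assumed setup, not from the paper): two
# teammates acting on a shared coin produce perfectly correlated actions,
# so the joint distribution cannot factorize into a product over teammates.
import numpy as np

rng = np.random.default_rng(1)
coin = rng.integers(0, 2, size=100_000)   # joint randomization device
u1, u2 = coin, coin                       # both teammates play the coin

p_joint = np.mean((u1 == 1) & (u2 == 1))  # ~0.5
p_prod = u1.mean() * u2.mean()            # ~0.25
print(p_joint, p_prod)                    # product form fails
```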

Proof of Lemma 12

For notational convenience, define

$$\begin{aligned} H_t = \bigcup _{i\in {\mathcal {I}}} H_t^i = ({\mathbf {Y}}_{1:t-1}, {\mathbf {U}}_{1:t-1}, {\mathbf {X}}_{1:t-d}). \end{aligned}$$

Due to Lemma 11, \({\mathbb {P}}^{\varphi ^i, g^{-i}}({\tilde{x}}_{t-d+1:t-1}^{i, j}|h_t^i, x_t^{i, j})\) depends on the strategy profile only through \(\varphi ^{i, j}\).

Set

$$\begin{aligned} {\overline{\varphi }}_t^{i, j}(u_t^{i, j}|h_t^i, x_t^{i, j}) = \sum _{{\tilde{x}}_{t-d+1:t-1}^{i, j}} \varvec{1}_{\{ u_t^{i, j}= \mu _t^{i, j}(h_t^i, {\tilde{x}}_{t-d+1:t-1}^{i, j}, x_t^{i, j} ) \}} {\mathbb {P}}^{\mu ^{i, j}}({\tilde{x}}_{t-d+1:t-1}^{i, j}|h_t^i, x_t^{i, j}) \end{aligned}$$

for all \((h_t^i, x_t^{i, j})\) admissible under \(\mu ^{i, j}\); for inadmissible pairs, \({\overline{\varphi }}_t^{i, j}(\cdot |h_t^i, x_t^{i, j})\) is defined arbitrarily.
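
As a sanity check on this construction, the following minimal sketch (all names and sizes are hypothetical) computes \({\overline{\varphi }}_t^{i, j}\) by averaging a pure strategy over an assumed conditional belief on the unobserved recent private states:

```python
# Minimal sketch (hypothetical names and sizes) of the compression step:
# a pure strategy mu that uses the recent private states x_{t-d+1:t-1} is
# replaced by a behavioral strategy that averages mu over the conditional
# belief on those states given (h_t^i, x_t^{i,j}).
import numpy as np

N_RECENT, N_ACTIONS = 3, 2

def mu(h, x_recent, x_now):
    # hypothetical pure strategy of agent (i, j)
    return (h + x_recent + x_now) % N_ACTIONS

def bar_phi(h, x_now, belief):
    """belief[k] = P^{mu}(x_recent = k | h, x_now); returns an action law."""
    law = np.zeros(N_ACTIONS)
    for x_recent in range(N_RECENT):
        law[mu(h, x_recent, x_now)] += belief[x_recent]
    return law

belief = np.array([0.5, 0.3, 0.2])           # an assumed conditional belief
print(bar_phi(h=1, x_now=0, belief=belief))  # e.g. [0.3, 0.7]
```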

Let \(\mu ^{-i}\) be a pure team strategy profile for the teams other than team i. Let the superscript \(-(i, j)\) denote all agents (of all teams) other than agent \((i, j)\). We will prove by induction that

$$\begin{aligned} {\mathbb {P}}^{\mu ^{i, j}, \varphi _t^{i, -j}, \mu ^{-i}}(u_{t}, x_t, x_{t-d+1:t-1}^{-(i, j)}, h_t) = {\mathbb {P}}^{{\overline{\varphi }}^{i, j}, \varphi _t^{i, -j}, \mu ^{-i}}(u_{t}, x_t, x_{t-d+1:t-1}^{-(i, j)}, h_t) \end{aligned}$$
(34)

Given (34), the claim can be established using linearity of expectation, as in the proof of Lemma 5.

Induction Base: (34) is true for \(t=1\), since at \(t=1\) there are no past private states to average over and \({\overline{\varphi }}_1^{i, j}\) prescribes the same action as \(\mu _1^{i, j}\).

Induction Step: Suppose that (34) holds at time \(t-1\); we now prove it at time t.

First,

$$\begin{aligned}&\qquad \quad {\mathbb {P}}^{\mu ^{i, j}, \varphi _t^{i, -j}, \mu ^{-i}}(u_{t}, x_t, x_{t-d+1:t-1}^{-(i, j)}, h_t)\\&\quad =\sum _{{\tilde{x}}_{t-d+1:t-1}^{i, j}} {\mathbb {P}}^{\mu ^{i, j}, \varphi _t^{i, -j}, \mu ^{-i}}(u_{t}| x_t, {\tilde{x}}_{t-d+1:t-1}^{i, j}, x_{t-d+1:t-1}^{-(i, j)}, h_t)\\&\qquad \times {\mathbb {P}}^{\mu ^{i, j}, \varphi _t^{i, -j}, \mu ^{-i}}(x_t, {\tilde{x}}_{t-d+1:t-1}^{i, j}, x_{t-d+1:t-1}^{-(i, j)}, h_t)\\&\quad =\sum _{{\tilde{x}}_{t-d+1:t-1}^{i, j}} \varvec{1}_{\{u_t^{i, j} = \mu _t^{i, j}(h_t^i, {\tilde{x}}_{t-d+1:t-1}^{i, j}, x_t^{i, j} ) \} } \left( \prod _{(i, l)\in {\mathcal {N}}_i\backslash \{(i, j)\} } \varphi _t^{i, l}(u_t^{i, l}|h_t^{i, l}) \right) \\&\qquad \times \left( \prod _{(k, j)\in {\mathcal {N}}_{-i}} \varvec{1}_{\{u_t^{k, j} = \mu _t^{k, j}(h_t^{k, j}) \} } \right) {\mathbb {P}}^{\mu ^{i, j}, \varphi _t^{i, -j}, \mu ^{-i}}(x_t, {\tilde{x}}_{t-d+1:t-1}^{i, j}, x_{t-d+1:t-1}^{-(i, j)}, h_t)\\&\quad =G_t^{i,j} \times \left( \prod _{(i, l)\in {\mathcal {N}}_i\backslash \{(i, j)\} } \varphi _t^{i, l}(u_t^{i, l}|h_t^{i, l}) \right) \left( \prod _{(k, j)\in {\mathcal {N}}_{-i}} \varvec{1}_{\{u_t^{k, j} = \mu _t^{k, j}(h_t^{k, j}) \} } \right) \\&\qquad \times {\mathbb {P}}^{\mu ^{i, j}, \varphi _t^{i, -j}, \mu ^{-i}}(x_t, x_{t-d+1:t-1}^{-(i, j)}, h_t) \end{aligned}$$

where

$$\begin{aligned} G_t^{i, j}&:= \sum _{{\tilde{x}}_{t-d+1:t-1}^{i, j}}\left[ \varvec{1}_{\{u_t^{i, j} = \mu _t^{i, j}(h_t^i, {\tilde{x}}_{t-d+1:t-1}^{i, j}, x_t^{i, j} ) \} }\right. \\&\qquad \left. \times {\mathbb {P}}^{\mu ^{i, j}, \varphi _t^{i, -j}, \mu ^{-i}}({\tilde{x}}_{t-d+1:t-1}^{i, j}|x_t, x_{t-d+1:t-1}^{-(i, j)}, h_t)\right] . \end{aligned}$$

From Lemmas 3 and 11, we know that

$$\begin{aligned} {\mathbb {P}}^{\mu ^{i, j}, \varphi _t^{i, -j}, \mu ^{-i}}({\tilde{x}}_{t-d+1:t-1}^{i, j}|x_t, x_{t-d+1:t-1}^{-(i, j)}, h_t) = {\mathbb {P}}^{\mu ^{i, j}}({\tilde{x}}_{t-d+1:t-1}^{i, j}|x_t^{i, j}, h_t^i) \end{aligned}$$

for all \((x_t^{i, j}, h_t^i)\) admissible under \(\mu ^{i, j}\). Note that \({\mathbb {P}}^{\mu ^{i, j}, \varphi _t^{i, -j}, \mu ^{-i}}(x_t, x_{t-d+1:t-1}^{-(i, j)}, h_t) = 0\) for \((x_t^{i, j}, h_t^i)\) not admissible under \(\mu ^{i, j}\).

Hence, we conclude that

$$\begin{aligned} G_t^{i, j}&= \sum _{{\tilde{x}}_{t-d+1:t-1}^{i, j}} \varvec{1}_{\{u_t^{i, j} = \mu _t^{i, j}(h_t^i, {\tilde{x}}_{t-d+1:t-1}^{i, j}, x_t^{i, j} ) \} } {\mathbb {P}}^{\mu ^{i, j}}({\tilde{x}}_{t-d+1:t-1}^{i, j}|x_t^{i, j}, h_t^i)\\&={\bar{\varphi }}_t^{i, j} (u_t^{i, j}|h_t^i, x_t^{i, j}) \end{aligned}$$

and

$$\begin{aligned}&\qquad \quad {\mathbb {P}}^{\mu ^{i, j}, \varphi _t^{i, -j}, \mu ^{-i}}(u_{t}, x_t, x_{t-d+1:t-1}^{-(i, j)}, h_t)\\&\quad ={\bar{\varphi }}_t^{i, j} (u_t^{i, j}|h_t^i, x_t^{i, j}) \left( \prod _{(i, l)\in {\mathcal {N}}_i\backslash \{(i, j)\} } \varphi _t^{i, l}(u_t^{i, l}|h_t^{i, l}) \right) \left( \prod _{(k, j)\in {\mathcal {N}}_{-i}} \varvec{1}_{\{u_t^{k, j} = \mu _t^{k, j}(h_t^{k, j}) \} } \right) \\&\qquad \times {\mathbb {P}}^{\mu ^{i, j}, \varphi _t^{i, -j}, \mu ^{-i}}(x_t, x_{t-d+1:t-1}^{-(i, j)}, h_t). \end{aligned}$$

Similarly, we have

$$\begin{aligned}&\qquad \quad {\mathbb {P}}^{{\bar{\varphi }}^{i, j}, \varphi _t^{i, -j}, \mu ^{-i}}(u_{t}, x_t, x_{t-d+1:t-1}^{-(i, j)}, h_t)\\&\quad ={\bar{\varphi }}_t^{i, j} (u_t^{i, j}|h_t^i, x_t^{i, j}) \left( \prod _{(i, l)\in {\mathcal {N}}_i\backslash \{(i, j)\} } \varphi _t^{i, l}(u_t^{i, l}|h_t^{i, l}) \right) \left( \prod _{(k, j)\in {\mathcal {N}}_{-i}} \varvec{1}_{\{u_t^{k, j} = \mu _t^{k, j}(h_t^{k, j}) \} } \right) \\&\qquad \times {\mathbb {P}}^{{\bar{\varphi }}^{i, j}, \varphi _t^{i, -j}, \mu ^{-i}}(x_t, x_{t-d+1:t-1}^{-(i, j)}, h_t). \end{aligned}$$

Hence, it suffices to prove that

$$\begin{aligned} {\mathbb {P}}^{\mu ^{i, j}, \varphi _t^{i, -j}, \mu ^{-i}}(x_t, x_{t-d+1:t-1}^{-(i, j)}, h_t)={\mathbb {P}}^{{\bar{\varphi }}^{i, j}, \varphi _t^{i, -j}, \mu ^{-i}}(x_t, x_{t-d+1:t-1}^{-(i, j)}, h_t). \end{aligned}$$

Given the induction hypothesis, it suffices to show that

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}^{\mu ^{i, j}, \varphi _t^{i, -j}, \mu ^{-i}}(x_t, x_{t-d+1:t-1}^{-(i, j)}, h_t|u_{t-1}, x_{t-1}, x_{t-d:t-2}^{-(i, j)}, h_{t-1})\\ =&\,{\mathbb {P}}^{{\bar{\varphi }}^{i, j}, \varphi _t^{i, -j}, \mu ^{-i}}(x_t, x_{t-d+1:t-1}^{-(i, j)}, h_t|u_{t-1}, x_{t-1}, x_{t-d:t-2}^{-(i, j)}, h_{t-1}) \end{aligned} \end{aligned}$$
(35)

for all \((x_{t-1}, x_{t-d:t-2}^{-(i, j)}, h_{t-1})\) admissible under \((\mu ^{i, j}, \varphi _t^{i, -j}, \mu ^{-i})\) (or admissible under \(({\bar{\varphi }}^{i, j}, \varphi _t^{i, -j}, \mu ^{-i})\), which is the same condition due to the induction hypothesis).

Since

$$\begin{aligned} {\mathbf {X}}_t^k&= f_{t-1}^k({\mathbf {X}}_{t-1}^k, {\mathbf {U}}_{t-1}, W_{t-1}^{k, X}),\quad k\in {\mathcal {I}},\\ H_t&= (H_{t-1}, {\mathbf {Y}}_{t-1}, {\mathbf {U}}_{t-1}),\\ {\mathbf {Y}}_{t-1}^k&= \ell _{t-1}^k({\mathbf {X}}_{t-1}^k, {\mathbf {U}}_{t-1}, W_{t-1}^{k, Y}),\quad k\in {\mathcal {I}} \end{aligned}$$

\(({\mathbf {X}}_{t}, {\mathbf {X}}_{t-d+1:t-1}^{-(i, j)}, H_t)\) is a strategy-independent function of the random vector \(({\mathbf {U}}_{t-1}, {\mathbf {X}}_{t-1}, {\mathbf {X}}_{t-d:t-2}^{-(i, j)}, H_{t-1}, {\mathbf {W}}_{t-1}^{X}, {\mathbf {W}}_{t-1}^{Y})\), where \(({\mathbf {W}}_{t-1}^{X}, {\mathbf {W}}_{t-1}^{Y})\) is a primitive random vector independent of \(({\mathbf {U}}_{t-1}, {\mathbf {X}}_{t-1}, {\mathbf {X}}_{t-d:t-2}^{-(i, j)}, H_{t-1})\). Therefore, (35) holds, which establishes the induction step. \(\square \)
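
The closing argument of the proof can also be checked by simulation. The sketch below (toy dynamics, assumed purely for illustration) estimates the transition kernel of \(X_t\) empirically and shows that it is the same whether the actions are produced by a feedback strategy or an open-loop one, as the strategy independence of the dynamics requires:

```python
# Simulation sketch (assumed toy dynamics): with X_t = f(X_{t-1}, U_{t-1}, W)
# and primitive noise W independent of the past, the conditional law of X_t
# given (X_{t-1}, U_{t-1}) is the same no matter which strategy chose U_{t-1}.
import numpy as np

rng = np.random.default_rng(2)

def f(x, u, w):                         # hypothetical controlled dynamics
    return (x + u + w) % 3

def empirical_kernel(strategy, n=200_000):
    x = rng.integers(0, 3, n)
    u = strategy(x)
    w = rng.integers(0, 2, n)           # primitive noise
    x_next = f(x, u, w)
    sel = (x == 1) & (u == 1)
    return np.mean(x_next[sel] == 0)    # estimate P(X_t = 0 | x = 1, u = 1)

print(empirical_kernel(lambda x: x % 2))            # feedback strategy
print(empirical_kernel(lambda x: np.ones_like(x)))  # open-loop strategy
```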

Proof of Lemma 13

By iterative application of Lemma 12, we conclude that for every pure strategy \(\mu ^i\) there exists a payoff-equivalent behavioral strategy profile \({\bar{\varphi }}^{i}=({\bar{\varphi }}_t^{i, j})_{(i, j)\in {\mathcal {N}}_i, t\in {\mathcal {T}}}\), where \({\bar{\varphi }}_t^{i, j}: {\mathcal {H}}_t^{i}\times {\mathcal {X}}_t^{i, j} \rightarrow \varDelta ({\mathcal {U}}_t^{i, j})\). Define \({\bar{g}}^i\) by

$$\begin{aligned} {\bar{g}}_t^i(\gamma _t^i| h_t^i) = {\left\{ \begin{array}{ll} \prod _{(i, j)\in {\mathcal {N}}_i} \prod _{x_{t}^{i, j}} {\bar{\varphi }}_t^{i, j}(\gamma _{t}^{i, j}(x_{t}^{i, j})|h_t^i, x_{t}^{i, j} ) &{}\gamma _t^i\in \bar{{\mathcal {A}}}_t^i\\ 0&{}\text {otherwise} \end{array}\right. } \end{aligned}$$

where \(\bar{{\mathcal {A}}}_t^i\subset {\mathcal {A}}_t^i\) is the set of simple prescriptions. Then, using arguments similar to those in the proof of Lemma 1, one can show that \({\bar{g}}^i\) is payoff-equivalent to \({\bar{\varphi }}^i\), and hence payoff-equivalent to \(\mu ^i\). \(\square \)
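
The lift from per-agent behavioral strategies to a coordination strategy over simple prescriptions can be illustrated numerically as well. In the sketch below (hypothetical sizes, not part of the original analysis), each simple prescription receives the product weight from the last display, and these weights sum to one over all simple prescriptions:

```python
# Sketch (hypothetical sizes): a simple prescription gamma assigns each
# private state an action independently, so its weight under bar_g is the
# product of bar_phi probabilities; the weights sum to one.
import itertools
import numpy as np

N_AGENTS, N_X, N_U = 2, 2, 2
rng = np.random.default_rng(3)
# bar_phi[j, x, u] = probability that agent j plays u at private state x
bar_phi = rng.dirichlet(np.ones(N_U), size=(N_AGENTS, N_X))

def bar_g(gamma):
    """gamma[j][x] = action prescribed to agent j at private state x."""
    w = 1.0
    for j in range(N_AGENTS):
        for x in range(N_X):
            w *= bar_phi[j, x, gamma[j][x]]
    return w

prescriptions = itertools.product(
    itertools.product(range(N_U), repeat=N_X), repeat=N_AGENTS
)
print(sum(bar_g(g) for g in prescriptions))  # ~1.0
```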
