1 Introduction

Throughout the paper, we consider a sequential two-player two-stage non-cooperative game \(\varGamma \), also called a Stackelberg game: The players have a continuum of actions, and the player acting in the second stage, henceforth called the follower, makes his choice after having observed the choice of the player acting in the first stage, henceforth called the leader.

We denote by X and \(L\) the set of actions and the payoff function of the leader, respectively, and by Y and \(F\) the set of actions and the payoff function of the follower, respectively, with L and F real-valued functions defined on \(X\times Y\). The solution concept we consider is the subgame perfect Nash equilibrium (SPNE for short), a well-known refinement of the Nash equilibrium widely used in dynamic games ([38]; see also, e.g., [20, 28]). The follower, acting after the leader, chooses a \(y\in Y\) which maximizes the function \(z\mapsto F(x,z)\) once the leader has played \(x\in X\). So, if the leader foresees the conditional optimal reaction \(\bar{\varphi }(x)\in Y\) of the follower for any \(x\in X\), namely if he foresees the follower’s strategy \(\bar{\varphi }\), the leader should open with some \(\bar{x}\in X\) which maximizes the function \(x\mapsto L(x,\bar{\varphi }(x))\). However, this task is hard for the leader since the optimal reaction of the follower to a choice of the leader is not always unique (i.e., the follower’s best reply correspondence is not single-valued) and, consequently, multiple SPNEs could arise.

Hence, we introduce a constructive method in order to select an SPNE by using a learning approach with the following features:

  1. (i)

    it relieves the leader of learning the follower’s best reply correspondence;

  2. (ii)

    it allows one to overcome the difficulties deriving from the possible non-single-valuedness of the follower’s best reply correspondence;

  3. (iii)

    it has a behavioral interpretation linked to the costs that players face when deviating from their current actions.

In fact, we recursively define a sequence \((\varGamma _n)_n\) of perturbed Stackelberg games in which the follower’s best reply correspondence is single-valued (i.e., a sequence of classical Stackelberg games, see  [4, 43]) and a sequence of strategy profiles \((x_n,\varphi _n)_n\) such that \((x_n,\varphi _n)\) is an SPNE of \(\varGamma _n\) for any \(n\in \mathbb {N}\): The payoff functions of both players in \(\varGamma _n\) are obtained by subtracting from the payoff functions of \(\varGamma \) a quadratic term depending on the SPNE reached in \(\varGamma _{n-1}\). Consequently, \((x_n,\varphi _n)\), the SPNE of \(\varGamma _n\), is an update of \((x_{n-1},\varphi _{n-1})\), the SPNE of \(\varGamma _{n-1}\). It will be shown that the limit of such a sequence of SPNEs generates an SPNE of \(\varGamma \).

The quadratic term represents a physical and behavioral cost to move, embedding the idea that in real life changing an action or improving the quality of actions has a cost [2, 3]. The mathematical tools underlying costs to move involve the proximal point methods, a class of optimization techniques based on the Moreau–Yosida regularization ([27, 29, 36], see also [1] and the references therein). Such methods have already been used to construct Nash equilibria in one-stage games (see, e.g., [3, 17, 18, 32]) and to define a new Nash equilibrium refinement for one-stage games when there is uncertainty related to players’ strategies (see [6]).

To the best of our knowledge, a learning method based on costs to move has never been used before to construct an SPNE in Stackelberg games, whereas a first attempt to approach an SPNE in a constructive way in Stackelberg games is due to Morgan and Patrone [31], where Tikhonov regularization [41] has been exploited. However, although such a regularization makes it possible to generate a sequence of games where the follower’s best reply correspondence is single-valued, the method used in Morgan and Patrone [31] does not admit a behavioral interpretation.

We emphasize that the idea we propose in order to approach an equilibrium is in the same spirit as the theory of equilibrium refinements for normal form games based on perturbations of the data of the game (see, e.g., [14, 19, 21, 33, 34, 39, 44]).

The paper is structured as follows. In Sect. 2, the method used to approach an SPNE is formulated and further detailed interpretations are provided. Results about the existence of an SPNE achievable via the above-mentioned method are presented in Sect. 3. Connections with the method proposed in Morgan and Patrone [31] and with other solution concepts for Stackelberg games are provided in Sect. 4. Finally, in Sect. 5 conclusions and possible directions for future research are discussed.

2 Constructive Procedure and Interpretation

Let \(\varGamma \) be a Stackelberg game. We identify \(\varGamma \) with the pair \((L,F)\), since the action sets X and Y are fixed throughout the paper. Let us denote by \(\mathcal {Y}\) the set-valued map that associates with each \(x\in X\) the set \(\mathcal {Y}(x)\) of the follower’s optimal reactions to x, that is,

$$\begin{aligned} \begin{aligned} \mathcal {Y}(x)&{:}{=}\mathop {{{\mathrm{Arg\,max}}}}\limits _{y\in Y}{F(x,y)}\\&= \{ y\in Y \mid F(x,y)\ge F(x,y^\prime ),\,\text {for any}\, y^\prime \in Y \}. \end{aligned} \end{aligned}$$
(1)

The set-valued map \(\mathcal {Y}\) is the so-called follower’s best reply correspondence. When \(\mathcal {Y}\) is a single-valued map, i.e., \(\mathcal {Y}(x)=\{br(x)\}\) for any \(x\in X\), the function br is called follower’s best reply function.
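For intuition, the correspondence (1) can be approximated numerically by maximizing over a grid of Y. The following minimal sketch is our own illustration (not part of the paper’s formal development); it uses the follower’s payoff \(F(x,y)=-xy\) from Example 1 below and shows a point where the best reply is single-valued and a point where it is not:

```python
import numpy as np

def best_reply(F, x, Y_grid, tol=1e-9):
    # approximate Y(x) = Arg max_{y in Y} F(x, y) on a finite grid of Y
    vals = np.array([F(x, y) for y in Y_grid])
    return Y_grid[vals >= vals.max() - tol]

Y_grid = np.linspace(-1.0, 1.0, 201)
F = lambda x, y: -x * y  # follower's payoff of Example 1 below

print(best_reply(F, 0.5, Y_grid))       # single-valued: the best reply is y = -1
print(best_reply(F, 0.0, Y_grid).size)  # at x = 0 every y in Y is optimal: all 201 grid points
```

When the returned set is a singleton for every x, its unique element is the best reply function br introduced above.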

Denoting by \(Y^X{:}{=}\{\varphi \mid \varphi :X\rightarrow Y\}\) the set of follower’s strategies, we recall that a strategy profile \((\bar{x},\bar{\varphi })\in X\times Y^X\) is an SPNE of \(\varGamma \) if the following conditions are satisfied:

(SG1):

for each choice x of the leader, the follower reacts maximizing his payoff function, i.e., for any \(x\in X\):

$$\begin{aligned} \bar{\varphi }(x)\in \mathcal {Y}(x); \end{aligned}$$
(2)
(SG2):

the leader maximizes his payoff function taking into account his hierarchical advantage, i.e.,

$$\begin{aligned} \bar{x}\in \mathop {{{\mathrm{Arg\,max}}}}\limits _{x\in X}{L(x,\bar{\varphi }(x))}. \end{aligned}$$
(3)

Before presenting a constructive procedure in order to select an SPNE when the best reply correspondence \(\mathcal {Y}\) is not known to be single-valued, we define a class of games for which such an SPNE is achievable through this procedure.

Definition 1

A Stackelberg game \(\varGamma =(L,F)\) belongs to the family \(\mathcal {G}\) if the following assumptions are satisfied:

\(({\mathcal {A}})\) :

X and Y are non-empty compact subsets of the Euclidean spaces \(\mathbb {X}\) (with norm \(\Vert \cdot \Vert _{\mathbb {X}}\)) and \(\mathbb {Y}\) (with norm \(\Vert \cdot \Vert _{\mathbb {Y}}\)), respectively, and Y is also convex;

\(({\mathcal {L}})\) :

L is upper semicontinuous on \(X\times Y\) and \(L(x,\cdot )\) is lower semicontinuous on Y, for any \(x\in X\);

\(({\mathcal {F}} 1)\) :

F is upper semicontinuous on \(X\times Y\);

\(({\mathcal {F}} 2)\) :

for any \((x,y)\in X\times Y\) and for any sequence \((x_k)_k\subseteq X\) converging to x there exists a sequence \(({\tilde{y}}_k)_k\subseteq Y\) converging to y such that

$$\begin{aligned} \liminf _{k\rightarrow +\infty }F(x_k,{\tilde{y}}_k)\ge F(x,y); \end{aligned}$$
\(({\mathcal {F}} 3)\) :

\(F(x,\cdot )\) is concave on Y, for any \(x\in X\).

Remark 1

(on discontinuity) Requiring \(({\mathcal {F}} 1)\)–\(({\mathcal {F}} 3)\) is weaker than requiring the continuity of F. Indeed, the function F defined on \(X\times Y\), where \(X=[1,2]\) and \(Y=\overline{B}_{((1,0),1)}\) (i.e., Y is the closed ball in \(\mathbb {R}^2\) centered at (1, 0) with radius 1), by

$$\begin{aligned} F(x,(y_1,y_2))= {\left\{ \begin{array}{ll} -\frac{y_2^2}{2y_1}x, &{} \text{ if } (y_1,y_2)\ne (0,0) \\ 0, &{} \text{ if } (y_1,y_2)=(0,0), \end{array}\right. } \end{aligned}$$

satisfies \(({\mathcal {F}} 1)\)–\(({\mathcal {F}} 3)\), but \(F(x,\cdot )\) is not lower semicontinuous at (0, 0), for any \(x\in [1,2]\).

The main computations for this result are provided in the updated version of CSEF Working Paper #476 at www.csef.it.

Remark 2

(on variational convergences) Assumptions \(({\mathcal {F}} 1)\) and \(({\mathcal {F}} 2)\) have implications in terms of \(\varGamma \)-convergence or epiconvergence (see, e.g., [1, 13]). Indeed, let \(x\in X\), let \((x_k)_k\subseteq X\) be a sequence converging to x, and consider the following real-valued functions defined on Y by

$$\begin{aligned} \begin{aligned} W_k(y)&=F(x_k,y)\text {, for any }k\in \mathbb {N}, \\ W(y)&=F(x,y). \end{aligned} \end{aligned}$$

Then, the sequence of functions \((W_k)_k\) \(\varGamma ^+\)-converges to W (that is, \((-W_k)_k\) epiconverges to \(-W\)).

In the following remark, some properties of the family \(\mathcal {G}\) are stated. The proofs can be obtained by using \(\varGamma \)-convergence results (see, e.g., Proposition 6.21 and Proposition 6.16 in [13]).

Remark 3

(on properties of \(\mathcal {G}\)) Assume \((U,V)\in \mathcal {G}\) and \((\hat{U},\hat{V})\in \mathcal {G}\).

  1. (i)

    The game \((h U,k V)\in \mathcal {G}\) for any \(h,k\ge 0\).

  2. (ii)

    If \(\hat{U}\) and \(\hat{V}\) are continuous, then the game \((U+\hat{U},V+\hat{V})\in \mathcal {G}\).

  3. (iii)

    If \(\varPsi \) and \(\varPhi \) are real-valued functions defined on \(\mathbb {R}\) with \(\varPsi \) continuous and \(\varPhi \) increasing and concave, then the game \((\varPsi \circ U,\varPhi \circ V)\in \mathcal {G}\).

The Costs to Move Procedure (\({\mathcal {CM}}\)) defined below illustrates the learning method that we use to construct recursively a sequence of perturbed games \((\varGamma _n)_n\) and a sequence of strategy profiles \((\bar{x}_n,\varphi _n)_n\).

Procedure (\({\mathcal {CM}}\)). Choose an initial point \((\bar{x}_0,\bar{y}_0)\in X\times Y\), set \(\varphi _0(x){:}{=}\bar{y}_0\) for any \(x\in X\), and fix two divergent sequences \((\beta _n)_n\subseteq \,]0,+\infty [\) and \((\gamma _n)_n\subseteq \,]0,+\infty [\). At step \(({\mathcal {S}}_n)\), with \(n\in \mathbb {N}\), the perturbed game \(\varGamma _n=(L_n,F_n)\) and the strategy profile \((\bar{x}_n,\varphi _n)\) are defined by

$$\begin{aligned} F_n(x,y)&{:}{=}F(x,y)-\frac{1}{2\gamma _{n-1}}\Vert y-\varphi _{n-1}(x)\Vert _\mathbb {Y}^2,\qquad \{\varphi _n(x)\}{:}{=}\mathop {{{\mathrm{Arg\,max}}}}\limits _{y\in Y}F_n(x,y),\\ L_n(x,y)&{:}{=}L(x,y)-\frac{1}{2\beta _{n-1}}\Vert x-\bar{x}_{n-1}\Vert _\mathbb {X}^2,\qquad \bar{x}_n\in \mathop {{{\mathrm{Arg\,max}}}}\limits _{x\in X}L_n(x,\varphi _n(x)). \end{aligned}$$
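For intuition, Procedure (\({\mathcal {CM}}\)) can be sketched numerically. The code below is our own grid-based illustration, not the exact procedure: the Arg max operations are replaced by maximization over finite grids, with the perturbed payoffs obtained by subtracting the quadratic costs to move; it is run on the data of Example 1 in Sect. 3.

```python
import numpy as np

def cm_procedure(L, F, X_grid, Y_grid, x0, y0, betas, gammas, steps):
    # grid-based sketch of Procedure (CM); phi_0 is the constant strategy y0
    phi = np.full(len(X_grid), float(y0))
    x_bar = float(x0)
    for n in range(steps):
        # follower's step: maximize F_n(x, .) = F(x, .) - ||. - phi_{n-1}(x)||^2 / (2 gamma)
        new_phi = np.empty_like(phi)
        for i, x in enumerate(X_grid):
            vals = F(x, Y_grid) - (Y_grid - phi[i]) ** 2 / (2.0 * gammas[n])
            new_phi[i] = Y_grid[np.argmax(vals)]
        phi = new_phi
        # leader's step: maximize L_n(x, phi_n(x)) = L(x, phi_n(x)) - ||x - x_bar||^2 / (2 beta)
        x_bar = X_grid[np.argmax(L(X_grid, phi) - (X_grid - x_bar) ** 2 / (2.0 * betas[n]))]
    return x_bar, phi

# data of Example 1 in Sect. 3: L(x, y) = x, F(x, y) = -x*y on [-1, 1] x [-1, 1]
X_grid = Y_grid = np.linspace(-1.0, 1.0, 401)
betas = gammas = [2.0 ** n for n in range(10)]
x_bar, phi = cm_procedure(lambda x, y: x, lambda x, y: -x * y,
                          X_grid, Y_grid, -0.5, 0.0, betas, gammas, 10)
print(x_bar)  # the leader's iterate reaches 1, in accordance with Example 1
```

The iterates \(\bar{x}_n\) and \(\varphi _n\) produced by this sketch agree with the closed-form expressions computed in Example 1.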

Procedure (\({\mathcal {CM}}\)) is well-defined when \(F_n(x,\cdot )\) has a unique maximizer on Y, for any \(x\in X\) and for any \(n\in \mathbb {N}\), and when \(L_n(\cdot ,\varphi _n(\cdot ))\) admits a maximizer on X, for any \(n\in \mathbb {N}\). For the class of games introduced in Definition 1, such properties are satisfied, as is proved in the next proposition.

Proposition 1

Assume that \(\varGamma \in \mathcal {G}\). Then, Procedure (\({\mathcal {CM}}\)) is well-defined and \(\varphi _n\) is a continuous function on X, for any \(n\in \mathbb {N}\).

Proof

We prove the result by induction on n. Let \(n=1\). By Remark 3(ii), \(\varGamma _1\in \mathcal {G}\). Moreover, \(F_1(x,\cdot )\) is strictly concave for any \(x\in X\); therefore, \(\varphi _1(x)\) is well-defined and the follower’s best reply correspondence in \(\varGamma _1\) is single-valued. Since \(\varGamma _1\in \mathcal {G}\), in particular,

(a\(_1\)):

\(F_1\) is upper semicontinuous on \(X\times Y\),

(b\(_1\)):

for any \((x,y)\in X\times Y\) and for any sequence \((x_k)_k\subseteq X\) converging to x, there exists a sequence \(({\tilde{y}}_k)_k\subseteq Y\) converging to y such that

$$\begin{aligned} \liminf _{k\rightarrow +\infty }F_1(x_k,{\tilde{y}}_k)\ge F_1(x,y). \end{aligned}$$

Conditions (a\(_1\)) and (b\(_1\)) are sufficient to guarantee that \(\lim _{k\rightarrow +\infty }\varphi _1(x_k)=\varphi _1(x)\) for any sequence \((x_k)_k\) converging to x, i.e., that \(\varphi _1\) is continuous (see, e.g., Proposition 5.1 in [30]). This fact and the upper semicontinuity of \(L_1\) ensure that \(\bar{x}_1\) is well-defined. Hence, the base case is proved.

Assume now that the result holds for some \(n\ge 1\), so that the strategy profile \((\bar{x}_n,\varphi _n)\) is well-defined and \(\varphi _n\) is a continuous function. In light of Remark 3(ii), \(\varGamma _{n+1}\in \mathcal {G}\) since \(\varphi _n\) is continuous. Furthermore, \(F_{n+1}(x,\cdot )\) is strictly concave for any \(x\in X\), so \(\varphi _{n+1}(x)\) is well-defined and \(\varphi _{n+1}\) is the follower’s best reply function in \(\varGamma _{n+1}\). Since \(\varGamma _{n+1}\in \mathcal {G}\), in particular,

(a\(_{n+1}\)):

\(F_{n+1}\) is upper semicontinuous on \(X\times Y\),

(b\(_{n+1}\)):

for any \((x,y)\in X\times Y\) and for any sequence \((x_k)_k\subseteq X\) converging to x, there exists a sequence \(({\tilde{y}}_k)_k\subseteq Y\) converging to y such that

$$\begin{aligned} \liminf _{k\rightarrow +\infty }F_{n+1}(x_k,{\tilde{y}}_k)\ge F_{n+1}(x,y). \end{aligned}$$

By (a\(_{n+1}\)) and (b\(_{n+1}\)), it follows that \(\varphi _{n+1}\) is continuous (again in light of, e.g., Proposition 5.1 in [30]). Hence \(\bar{x}_{n+1}\) is well-defined, since \(L_{n+1}\) is upper semicontinuous. So the inductive step is proved and the proof is complete. \(\square \)

Note that the second part of assumption \(({\mathcal {L}})\) in the definition of the family \(\mathcal {G}\) (i.e., the lower semicontinuity of \(L(x,\cdot )\) for any \(x\in X\)) is unnecessary in the proof of Proposition 1. We assumed \(\varGamma \in \mathcal {G}\) in the proposition only for simplicity of exposition.

Interpretation of the procedure At the generic step \(({\mathcal {S}}_n)\) of the procedure, the follower chooses his strategy \(\varphi _n\) taking into account his previous strategy \(\varphi _{n-1}\). In making such a choice, he picks an action that strikes a compromise between maximizing \(F(x,\cdot )\) and staying near \(\varphi _{n-1}(x)\), for any \(x\in X\). The latter purpose is motivated by an anchoring effect:

agents have a (local) vision of their environment which depends on their current actions. Each action is anchored to the preceding one, which means that the perception the agents have of the quality of their subsequent actions depends on the current ones. In economics and management, one may think of actions as routines, ways of doing, while costs to change reflect the difficulty of quitting a routine or entering another one or reacting quickly [3, p. 1066].

Such an anchoring effect is formulated by subtracting a slight quadratic cost to move that reflects the difficulty of changing the previous action. The coefficient \(\gamma _{n-1}\) is linked to the follower’s per-unit-of-distance cost to move, and it acts as a trade-off parameter between maximizing \(F(x,\cdot )\) and minimizing the distance from \(\varphi _{n-1}(x)\). Since the same argument applies to all the preceding steps back to step \(({\mathcal {S}}_1)\), both \(\varphi _n(x)\) and the limit of \(\varphi _n(x)\) embed the willingness to be near \(\bar{y}_0\). Analogous observations hold for the leader, who chooses an action having in mind to be near his previous choices, and therefore ultimately near \(\bar{x}_0\).

The use of proximal point methods, underlying costs to move, also has the advantage of providing a regularization even in situations where the functions are possibly non-smooth and extended real-valued (for a more detailed discussion on proximal point methods and their interpretations, see [35]).
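As a small illustration of this point (ours, not from the paper): in maximization form, the proximal step applied to the non-smooth concave function \(g(y)=-|y|\) has the well-known closed form of soft-thresholding, \(v\mapsto \mathrm {sign}(v)\max \{|v|-\lambda ,0\}\), which a simple grid maximization recovers:

```python
import numpy as np

def prox_max(g, v, lam, grid):
    # grid approximation of Arg max_y [ g(y) - ||y - v||^2 / (2*lam) ],
    # the maximization form of the proximal step
    return grid[np.argmax(g(grid) - (grid - v) ** 2 / (2.0 * lam))]

grid = np.linspace(-3.0, 3.0, 6001)
g = lambda y: -np.abs(y)  # concave but non-smooth at 0

for v in [-2.0, -0.3, 0.5, 2.5]:
    exact = np.sign(v) * max(abs(v) - 1.0, 0.0)  # soft-thresholding with lam = 1
    print(v, prox_max(g, v, 1.0, grid), exact)
```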

In the proof of Proposition 1, we showed that the follower’s best reply correspondence in \(\varGamma _n\) is single-valued, i.e., \(\varGamma _n\) is a classical Stackelberg game. Moreover, the follower’s best reply function \(\varphi _n\) in \(\varGamma _n\) is continuous and the strategy profile \((\bar{x}_n,\varphi _n)\) is an SPNE of \(\varGamma _n\), for any \(n\in \mathbb {N}\). Hence, Procedure (\({\mathcal {CM}}\)) makes it possible to define a perturbation of the game \(\varGamma \), consisting of the sequence of classical Stackelberg games \((\varGamma _n)_n\), and to construct a sequence of SPNEs related to such a perturbation.

In the next proposition, we prove that the limit of the sequence \((\varphi _n)_n\) is a selection of the follower’s best reply correspondence. The pointwise convergence of \((\varphi _n)_n\) is obtained by adapting to a parametric optimization context a classical result about the convergence of proximal point methods. Before showing the result, we state the following lemma.

Lemma 1

(on parametric proximal point methods) Let G be a real-valued function defined on \(X\times Y\) and \(\bar{G}\) be the extended real-valued function defined on \(X\times \mathbb {Y}\) by

$$\begin{aligned} \bar{G}(x,y)={\left\{ \begin{array}{ll} G(x,y), &{} \text{ if } y\in Y \\ {-}\infty , &{} \text{ if } y\notin Y. \end{array}\right. } \end{aligned}$$
(4)

Let \(x\in X\). If the function \(G(x,\cdot )\) is upper semicontinuous and concave on Y, then

  1. (i)

    the function \(\bar{G}(x,\cdot )\) is upper semicontinuous and concave on \(\mathbb {Y};\)

  2. (ii)

    \({{\mathrm{Arg\,max}}}_{y\in Y}G(x,y)={{\mathrm{Arg\,max}}}_{y\in \mathbb {Y}}\bar{G}(x,y);\)

  3. (iii)

    \({{\mathrm{Arg\,max}}}_{y\in Y}\left[ G(x,y)-\frac{1}{2\lambda }\Vert y-v\Vert _\mathbb {Y}^2\right] ={{\mathrm{Arg\,max}}}_{y\in \mathbb {Y}}\left[ \bar{G}(x,y)-\frac{1}{2\lambda }\Vert y-v\Vert _\mathbb {Y}^2\right] \), for any \(\lambda >0\) and \(v\in \mathbb {Y};\)

  4. (iv)

    \(\varphi ^*(x)\in {{\mathrm{Arg\,max}}}_{y\in Y}G(x,y)\Longleftrightarrow \{\varphi ^*(x)\}={{\mathrm{\mathbf {prox}}}}{}_{\lambda , G(x,\cdot )}(\varphi ^*(x))\), for any \(\lambda >0\), where \({{\mathrm{\mathbf {prox}}}}{}_{\lambda , G(x,\cdot )}(v){:}{=}{{\mathrm{Arg\,max}}}_{y\in Y}\left[ G(x,y)-\frac{1}{2\lambda }\Vert y-v\Vert _\mathbb {Y}^2\right] \), for any \(v\in \mathbb {Y}\).

Proof

Claims (i)–(iii) are immediate; the proof of claim (iv) is analogous to the one, for example, at the beginning of Section 2.3 in Parikh and Boyd [35], taking into account claims (i)–(iii). \(\square \)

Proposition 2

Assume that \(({\mathcal {A}})\), \(({\mathcal {F}} 1)\) and \(({\mathcal {F}} 3)\) hold. Then, the sequence \((\varphi _n)_n\) pointwise converges to a function \(\varphi \in Y^X\) and \(\varphi (x)\in \mathcal {Y}(x)\) for any \(x\in X\), where \(\mathcal {Y}(x)={{\mathrm{Arg\,max}}}_{y\in Y}{F(x,y)}\).

Proof

Let \(x\in X\). By assumptions \(({\mathcal {F}} 1)\) and \(({\mathcal {F}} 3)\) and Lemma 1(i), the function \(-\bar{F}(x,\cdot )\), where \(\bar{F}\) is defined on \(X\times \mathbb {Y}\) by

$$\begin{aligned} \bar{F}(x,y)={\left\{ \begin{array}{ll} F(x,y), &{} \text{ if } y\in Y \\ -\infty , &{} \text{ if } y\notin Y, \end{array}\right. } \end{aligned}$$

is lower semicontinuous and convex, is not identically \(+\infty \) and does not assume the value \(-\infty \) (i.e., \(-\bar{F}(x,\cdot )\) is a proper lower semicontinuous convex function). Moreover, in light of Lemma 1(ii), the compactness of Y and assumption \(({\mathcal {F}} 1)\),

$$\begin{aligned} \mathop {{{\mathrm{Arg\,min}}}}\limits _{y\in \mathbb {Y}}-\bar{F}(x,y)=\mathop {{{\mathrm{Arg\,max}}}}\limits _{y\in \mathbb {Y}}\bar{F}(x,y)\ne \emptyset . \end{aligned}$$

Given the above, and since \(\lim _{n\rightarrow +\infty }\gamma _n=+\infty \) with \((\gamma _n)_n\subseteq ]0,+\infty [\), the function \(-\bar{F}(x,\cdot )\) satisfies the hypotheses for the convergence of proximal point methods stated in [5], Theorem 27.1. Then, the sequence \((z_n)_n\) defined by

$$\begin{aligned} \{z_{n}\}{:}{=}\mathop {{{\mathrm{Arg\,min}}}}\limits _{y\in \mathbb {Y}}{-\bar{F}(x,y)+\frac{1}{2\gamma _{n-1}}\Vert y-z_{n-1}\Vert ^2_{\mathbb {Y}}}\quad \text {for any }n\in \mathbb {N}, \end{aligned}$$

where \(z_0{:}{=}\bar{y}_0\), converges to a point in \({{\mathrm{Arg\,min}}}_{y\in \mathbb {Y}}{-\bar{F}(x,y)}\). So, equivalently,

$$\begin{aligned} \{z_{n}\}=\mathop {{{\mathrm{Arg\,max}}}}\limits _{y\in \mathbb {Y}}{\bar{F}(x,y)-\frac{1}{2\gamma _{n-1}}\Vert y-z_{n-1}\Vert ^2_{\mathbb {Y}}}\quad \text {for any }n\in \mathbb {N}, \end{aligned}$$

and \((z_n)_n\) converges to a point in \({{\mathrm{Arg\,max}}}_{y\in \mathbb {Y}}{\bar{F}(x,y)}\). Since the unique maximizer of \(\bar{F}(x,\cdot )-\frac{1}{2\gamma _{n-1}}\Vert \,\cdot \,-\varphi _{n-1}(x)\Vert ^2_\mathbb {Y}\) over \(\mathbb {Y}\) coincides with the unique maximizer of \(F(x,\cdot )-\frac{1}{2\gamma _{n-1}}\Vert \,\cdot \,-\varphi _{n-1}(x)\Vert ^2_\mathbb {Y}\) over Y in light of Lemma 1(iii), we have \(z_n=\varphi _n(x)\) for any \(n\in \mathbb {N}\). Furthermore, since the set of maximizers of \(\bar{F}(x,\cdot )\) over \(\mathbb {Y}\) coincides with the set of maximizers of \(F(x,\cdot )\) over Y in light of Lemma 1(ii), the sequence \((\varphi _n(x))_n\) converges to a maximizer of \(F(x,\cdot )\) over Y. Hence, the function \(\varphi \) that associates with each \(x\in X\) the point \(\varphi (x){:}{=}\lim _{n\rightarrow +\infty }\varphi _n(x)\in Y\) is well-defined and \(\varphi (x)\in \mathcal {Y}(x)\) for any \(x\in X\). \(\square \)
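The selection mechanism of Proposition 2 can be seen concretely with a numerical check of ours, using \(F(x,y)=-xy\) on \(Y=[-1,1]\) (the follower’s payoff of Example 1 below) and \(\gamma _n=2^n\). Here the proximal step has the closed form \(z_n=\mathrm {clip}(z_{n-1}-\gamma _{n-1}x,-1,1)\): for \(x\ne 0\) the iterates reach the unique maximizer, while for \(x=0\) every \(y\) is optimal and, by Lemma 1(iv), the iterates never leave \(\bar{y}_0\).

```python
import numpy as np

def follower_iterates(x, y0, gammas):
    # proximal iterates for F(x, y) = -x*y on Y = [-1, 1]:
    # z_n = argmax_{y in [-1,1]} [ -x*y - (y - z_{n-1})^2 / (2*gamma_{n-1}) ],
    # whose closed form is z_n = clip(z_{n-1} - gamma_{n-1} * x, -1, 1)
    z = y0
    for g in gammas:
        z = float(np.clip(z - g * x, -1.0, 1.0))
    return z

gammas = [2.0 ** n for n in range(30)]
print(follower_iterates(0.5, 0.2, gammas))  # -1.0: the unique maximizer of F(0.5, .)
print(follower_iterates(0.0, 0.2, gammas))  # 0.2: every y is optimal, so z_n stays at y0
```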

3 SPNE Existence Result

The next theorem provides an existence result of an SPNE achievable via Procedure (\({\mathcal {CM}}\)) for \(\varGamma =(L,F)\in \mathcal {G}\). Recall that \((\bar{x}_n,\varphi _n)_n\) is the sequence of strategy profiles generated by Procedure (\({\mathcal {CM}}\)), which is well-defined in light of Proposition 1.

Theorem 1

Assume that \(\varGamma \in \mathcal {G}\) and that the sequence of action profiles \((\bar{x}_n,\varphi _n(\bar{x}_n))_n\subseteq X\times Y\) converges to \((\bar{x},\bar{y})\in X\times Y\). Then, the strategy profile \((\bar{x}, \bar{\varphi })\in X\times Y^X\), where

$$\begin{aligned} \bar{\varphi }(x){:}{=}{\left\{ \begin{array}{ll} \bar{y}, &{} \text{ if } x=\bar{x} \\ \lim _{n\rightarrow +\infty }\varphi _n(x), &{} \text{ if } x\ne \bar{x}, \end{array}\right. } \end{aligned}$$

is a subgame perfect Nash equilibrium of \(\varGamma \).

Proof

We first prove (SG1). Let \(x\in X\) and let \(\varphi (x)=\lim _{n\rightarrow +\infty }\varphi _n(x)\), as defined in Proposition 2. If \(x\ne \bar{x}\), Proposition 2 ensures that \(\bar{\varphi }(x)=\varphi (x)\in \mathcal {Y}(x)\). If \(x=\bar{x}\), pick \(y\in Y\). By assumption \(({\mathcal {F}} 2)\), there exists a sequence \((\tilde{y}_n)_n\) converging to y such that

$$\begin{aligned} \liminf _{n\rightarrow +\infty }F(\bar{x}_n,\tilde{y}_n)\ge F(\bar{x},y). \end{aligned}$$
(5)

By \(({\mathcal {F}} 1)\) we have:

$$\begin{aligned} \begin{aligned} F(\bar{x},\bar{y})&\ge \limsup _{n\rightarrow +\infty }{F(\bar{x}_n,\varphi _n(\bar{x}_n))} \\&= \limsup _{n\rightarrow +\infty }\left[ {F(\bar{x}_n,\varphi _n(\bar{x}_n))-\frac{1}{2\gamma _{n-1}}\Vert \varphi _n(\bar{x}_n)-\varphi _{n-1}(\bar{x}_n)\Vert _\mathbb {Y}^2}\right] \\&= \limsup _{n\rightarrow +\infty } F_n(\bar{x}_n,\varphi _n(\bar{x}_n)), \end{aligned} \end{aligned}$$
(6)

where the first equality holds since the second addend in the \(\limsup \) converges to 0, as \((\gamma _n)_n\) is a divergent sequence of positive real numbers and Y is a compact set, and the second equality comes from the definition of \(F_n\) in Procedure (\({\mathcal {CM}}\)). By the definition of \(\varphi _n(\bar{x}_n)\), we get

$$\begin{aligned} \begin{aligned} \limsup _{n\rightarrow +\infty }{}&{F_n(\bar{x}_n,\varphi _n(\bar{x}_n))} \ge \limsup _{n\rightarrow +\infty } F_n(\bar{x}_n,\tilde{y}_n) \\&= \limsup _{n\rightarrow +\infty }\left[ {F(\bar{x}_n,\tilde{y}_n)-\frac{1}{2\gamma _{n-1}}\Vert \tilde{y}_n-\varphi _{n-1}(\bar{x}_n)\Vert _\mathbb {Y}^2}\right] . \end{aligned} \end{aligned}$$
(7)

Recalling the properties of \((\gamma _n)_n\) and the compactness of Y, by (5)–(7) we have

$$\begin{aligned} F(\bar{x},\bar{y})&\ge \limsup _{n\rightarrow +\infty }\left[ {F(\bar{x}_n,\tilde{y}_n)-\frac{1}{2\gamma _{n-1}}\Vert \tilde{y}_n-\varphi _{n-1} (\bar{x}_n)\Vert _\mathbb {Y}^2}\right] \\&= \limsup _{n\rightarrow +\infty }F(\bar{x}_n,\tilde{y}_n) \ge \liminf _{n\rightarrow +\infty }F(\bar{x}_n,\tilde{y}_n) \ge F(\bar{x},y). \end{aligned}$$

Hence, \(\bar{y}\in \mathcal {Y}(\bar{x})\) and (SG1) is satisfied.

In order to prove condition (SG2), we have to show that \(L(\bar{x}, \bar{y})\ge L(x, \bar{\varphi }(x))\) for any \(x\in X\); the inequality is trivial for \(x=\bar{x}\), since \(\bar{\varphi }(\bar{x})=\bar{y}\). So, let \(x\in X{\setminus }\{\bar{x}\}\). In light of \(({\mathcal {L}})\), we get

$$\begin{aligned} L(\bar{x},\bar{y})&\ge \limsup _{n\rightarrow +\infty }{L(\bar{x}_n,\varphi _n(\bar{x}_n))}\\&= \limsup _{n\rightarrow +\infty }{\left[ L(\bar{x}_n,\varphi _n(\bar{x}_n))-\frac{1}{2\beta _{n-1}}\Vert \bar{x}_n-\bar{x}_{n-1}\Vert _\mathbb {X}^2\right] }\\&\ge \limsup _{n\rightarrow +\infty }{\left[ L( x,\varphi _n( x))-\frac{1}{2\beta _{n-1}}\Vert x-\bar{x}_{n-1}\Vert _\mathbb {X}^2\right] }\\&\ge \liminf _{n\rightarrow +\infty }{\left[ L( x,\varphi _n( x))-\frac{1}{2\beta _{n-1}}\Vert x-\bar{x}_{n-1}\Vert _\mathbb {X}^2\right] }\\&= \liminf _{n\rightarrow +\infty }{L(x,\varphi _n( x))}\ge L(x, \varphi (x)), \end{aligned}$$

where the first (respectively, second) equality holds since the second addend in the \(\limsup \) (resp. \(\liminf \)) converges to 0, as \((\beta _n)_n\) is a divergent sequence of positive real numbers and X is a compact set; the second inequality comes from the definition of \(\bar{x}_n\) in Procedure (\({\mathcal {CM}}\)); and the last inequality follows from \(({\mathcal {L}})\). As \(x\in X{\setminus }\{\bar{x}\}\), we have \(L(x, \varphi (x))=L(x,\bar{\varphi }(x))\) and, therefore, \(L(\bar{x},\bar{y})\ge L(x,\bar{\varphi }(x))\). Hence (SG2) holds, and the proof is complete. \(\square \)

Remark 4

(on the dependence on \((\bar{x}_0,\bar{y}_0)\)) The SPNE selected according to Theorem 1 is affected, in general, by the choice of the initial point \((\bar{x}_0,\bar{y}_0)\) in Procedure (\({\mathcal {CM}}\)): In fact, such an SPNE reflects both the leader’s willingness of being near to \(\bar{x}_0\) and the follower’s willingness of being near to \(\bar{y}_0\), as discussed in the interpretation of the procedure in Sect. 2.

The next trivial example, whose main computations are provided in the updated version of CSEF Working Paper #476 at www.csef.it, emphasizes this dependence especially from the follower’s perspective; in Examples 2 and 3, the dependence is evident also from the leader’s point of view.

Example 1

Let \(X=Y=[-1,1]\) and \(\varGamma =(L,F)\) where

$$\begin{aligned} L(x,y)=x, \qquad F(x,y)=-xy. \end{aligned}$$

The follower’s best reply correspondence \(\mathcal {Y}\) is defined on \([-1,1]\) by

$$\begin{aligned} \mathcal {Y}(x)= {\left\{ \begin{array}{ll} \{1\}, &{} \text{ if } x \in [-1,0[ \\ {[}-1,1], &{} \text{ if } x=0 \\ \{-1\}, &{} \text{ if } x \in ]0,1]. \end{array}\right. } \end{aligned}$$
(8)

Let \((\bar{x}_0,\bar{y}_0)\in [-1,1]\times [-1,1]\) be the initial point of the procedure and let \(\beta _n=\gamma _n={2^n}\) for any \(n\in \mathbb {N}\cup \{0\}\). Then, Procedure (\({\mathcal {CM}}\)) generates the following sequence \((\bar{x}_n,\varphi _n)_{n}\) of strategy profiles:

$$\begin{aligned} \bar{x}_n={\left\{ \begin{array}{ll} \min \{1+\bar{x}_0,1\}, &{} \text{ if } n=1 \\ 1, &{} \text{ if } n\ge 2, \end{array}\right. }\qquad {\varphi }_n(x)={\left\{ \begin{array}{ll} 1, &{} \text{ if } x\in \big [-1,\frac{\bar{y}_0-1}{a_n}\big [ \\ \bar{y}_0-a_nx, &{} \text{ if } x\in \left[ \frac{\bar{y}_0-1}{a_n},\frac{\bar{y}_0+1}{a_n}\right] \\ -1, &{} \text{ if } x\in \big ]\frac{\bar{y}_0+1}{a_n},1\big ], \end{array}\right. } \end{aligned}$$
(9)

where the sequence \((a_n)_n\) is recursively defined by

$$\begin{aligned} {\left\{ \begin{array}{ll} a_1=1 \\ a_{n+1}=a_n+2^n \quad \text {for any } n\ge 1. \end{array}\right. } \end{aligned}$$

The SPNE of \(\varGamma \) selected according to Theorem 1 is \((\bar{x},\bar{\varphi })\), where

$$\begin{aligned} \bar{x}=1,\qquad \bar{\varphi }(x)={\left\{ \begin{array}{ll} 1, &{} \text{ if } x \in [-1,0[ \\ \bar{y}_0, &{} \text{ if } x=0\\ -1, &{} \text{ if } x\in ]0,1]. \end{array}\right. } \end{aligned}$$
(10)

Let us note that all the SPNEs of \(\varGamma \) are obtained by varying \(\bar{y}_0\in [-1,1]\) in (10). Hence, among all the follower’s strategies that are part of an SPNE, \(\bar{\varphi }\) is the one such that \(\bar{\varphi }(x)\) minimizes the distance from the follower’s initial point \(\bar{y}_0\), for any \(x\in [-1,1]\). Therefore, the SPNE constructed by our method is the SPNE nearest to the initial point \((\bar{x}_0,\bar{y}_0)\), in the sense illustrated in Sect. 2 (Interpretation of the procedure).
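The closed form (9) can be checked against the recursion generating \(\varphi _n\) (a quick numerical confirmation of ours, using the clipped closed form of the follower’s proximal step for \(F(x,y)=-xy\) and \(\gamma _n=2^n\)):

```python
import numpy as np

# check the follower's strategies in (9): phi_n(x) = clip(y0 - a_n * x, -1, 1),
# with a_1 = 1 and a_{n+1} = a_n + 2^n, against the proximal recursion
# phi_n(x) = clip(phi_{n-1}(x) - 2^{n-1} * x, -1, 1)
y0 = 0.3
xs = np.linspace(-1.0, 1.0, 201)
phi = np.full_like(xs, y0)
a = 0.0
for n in range(1, 8):
    phi = np.clip(phi - 2.0 ** (n - 1) * xs, -1.0, 1.0)
    a += 2.0 ** (n - 1)  # a_n = 1 + 2 + ... + 2^{n-1} = 2^n - 1
    assert np.allclose(phi, np.clip(y0 - a * xs, -1.0, 1.0))
print("closed form (9) confirmed for n = 1,...,7; a_7 =", a)  # a_7 = 127.0
```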

Remark 5

(on the pointwise limit of \((\varphi _n)_n\)) The follower’s strategy \(\bar{\varphi }\) in the SPNE defined according to Theorem 1 differs from the pointwise limit \(\varphi \) of sequence \((\varphi _n)_n\) at most in one point. In fact, if the two limits

$$\begin{aligned} \lim _{n\rightarrow +\infty }{\varphi _n(\bar{x}_n)}\quad \text {and}\quad \lim _{n\rightarrow +\infty }{\varphi _n(\bar{x})}, \end{aligned}$$
(11)

where \(\bar{x}=\lim _{n\rightarrow +\infty }\bar{x}_n\), coincide, then \(\bar{\varphi }(x)=\varphi (x)\) for any \(x\in X\) and the strategy profile \((\bar{x},\varphi )\) is an SPNE of \(\varGamma \) in light of Theorem 1. Instead, if the two limits in (11) do not coincide, then \(\bar{\varphi }(\bar{x})\ne \varphi (\bar{x})\) and the strategy profile \((\bar{x},\varphi )\) may fail to be an SPNE of \(\varGamma \); hence, we need the follower’s strategy \(\bar{\varphi }\) as in the statement of Theorem 1 in order to get an SPNE. The following two examples illustrate the two cases described above: In the first one the two limits in (11) are equal, whereas in the second one they are different.

The main computations of both examples are provided in the updated version of CSEF Working Paper #476 at www.csef.it.

Example 2

Let \(X=Y=[-1,1]\) and \(\varGamma =(L,F)\) where

$$\begin{aligned} L(x,y)=y, \qquad F(x,y)=-xy. \end{aligned}$$

The follower’s best reply correspondence \(\mathcal {Y}\) is defined on \([-1,1]\) by

$$\begin{aligned} \mathcal {Y}(x)= {\left\{ \begin{array}{ll} \{1\}, &{} \text{ if } x \in [-1,0[ \\ {[}-1,1], &{} \text{ if } x=0 \\ \{-1\}, &{} \text{ if } x \in ]0,1]. \end{array}\right. } \end{aligned}$$
(12)

Let \((\bar{x}_0,\bar{y}_0)=(1,1)\) be the initial point of the procedure and let \(\beta _n=\gamma _n={2^n}\) for any \(n\in \mathbb {N}\cup \{0\}\). Then Procedure (\({\mathcal {CM}}\)) generates the following sequence \((\bar{x}_n,\varphi _n)_{n}\) of strategy profiles:

$$\begin{aligned} \bar{x}_n=0,\qquad {\varphi }_n(x)={\left\{ \begin{array}{ll} 1, &{} \text{ if } x\in [-1,0[ \\ 1-a_nx, &{} \text{ if } x\in \left[ 0,{2}/{a_n}\right] \\ -1, &{} \text{ if } x\in ]{2}/{a_n},1], \end{array}\right. } \end{aligned}$$
(13)

where the sequence \((a_n)_n\) is recursively defined by

$$\begin{aligned} {\left\{ \begin{array}{ll} a_1=1 \\ a_{n+1}=a_n+2^n \quad \text {for any } n\ge 1. \end{array}\right. } \end{aligned}$$

Hence, the SPNE of \(\varGamma \) selected according to Theorem 1 is \((\bar{x},\bar{\varphi })\), where

$$\begin{aligned} \bar{x}=0,\qquad \bar{\varphi }(x)={\left\{ \begin{array}{ll} 1, &{} \text{ if } x \in [-1,0] \\ -1, &{} \text{ if } x\in ]0,1]. \end{array}\right. } \end{aligned}$$

In this case, \(\bar{\varphi }\) coincides with the pointwise limit of \((\varphi _n)_n\) since \(\lim _n{\varphi _n(\bar{x}_n)}=1=\lim _n{\varphi _n(\lim _n\bar{x}_n)}\).

Let us note that \(\varGamma \) has infinitely many SPNEs. In fact, denoting by \(\hat{\varphi }^\alpha \) the function defined on \([-1,1]\) by

$$\begin{aligned} \hat{\varphi }^\alpha (x){:}{=}{\left\{ \begin{array}{ll} 1, &{} \text{ if } x\in [-1,0[ \\ \alpha , &{} \text{ if } x=0 \\ -1, &{} \text{ if } x\in ]0,1], \end{array}\right. } \end{aligned}$$

the set of SPNEs of \(\varGamma \) is \(\{(\hat{x},\hat{\varphi }^\alpha )\mid \hat{x}\in [-1,0[,\,\alpha \in [-1,1]\}\cup \{(0,\hat{\varphi }^1)\}\), only one of which is obtained via our method. Hence, the selection method defined by means of Procedure (\({\mathcal {CM}}\)) is effective.

Moreover, \(\bar{x}=0\) is the leader’s action nearest to \(\bar{x}_0=1\) among all the actions the leader can take in an SPNE, and analogously, among all the follower’s strategies that are part of an SPNE, \(\bar{\varphi }\) is the one such that \(\bar{\varphi }(x)\) minimizes the distance from the follower’s initial point \(\bar{y}_0\), for any \(x\in [-1,1]\). So the insights illustrated in Remark 4 fit this case.
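A quick numerical confirmation of ours that the leader’s iterate in (13) indeed stays at 0 (using grid maximization for the leader and the clipped closed form of the follower’s proximal step for \(F(x,y)=-xy\)):

```python
import numpy as np

# Example 2 data: L(x, y) = y, F(x, y) = -x*y, (x0, y0) = (1, 1), beta_n = gamma_n = 2^n
X = np.linspace(-1.0, 1.0, 401)
phi = np.full_like(X, 1.0)  # phi_0 is the constant strategy y0 = 1
x_bar = 1.0
for n in range(8):
    gamma = beta = 2.0 ** n
    phi = np.clip(phi - gamma * X, -1.0, 1.0)  # follower's proximal step in closed form
    x_bar = X[np.argmax(phi - (X - x_bar) ** 2 / (2.0 * beta))]  # leader's step with L(x, y) = y
print(x_bar)  # stays at 0.0 at every step, in accordance with (13)
```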

Example 3

Let \(X=[1/2,2]\), \(Y=[-1,1]\) and \(\varGamma =(L,F)\) where

$$\begin{aligned} L(x,y)=-x-y,\qquad F(x,y)={\left\{ \begin{array}{ll} 0, &{} \text{ if } x \in [1/2,1] \\ (1-x)y, &{} \text{ if } x\in ]1,2]. \end{array}\right. } \end{aligned}$$

The follower’s best reply correspondence \(\mathcal {Y}\) is given by

$$\begin{aligned} \mathcal {Y}(x)= {\left\{ \begin{array}{ll} [-1,1], &{} \text{ if } x \in [1/2,1] \\ \{-1\}, &{} \text{ if } x \in ]1,2]. \end{array}\right. } \end{aligned}$$
(14)

Let \((\bar{x}_0,\bar{y}_0)=(1,1)\) and \(\beta _n=\gamma _n=n+1\) for any \(n\in \mathbb {N}\cup \{0\}\). Then, Procedure (\({\mathcal {CM}}\)) generates the following sequence \((\bar{x}_n,\varphi _n)_{n}\) of strategy profiles:

$$\begin{aligned} \bar{x}_n={\left\{ \begin{array}{ll} {1}/{2}, &{} \text{ if } n=1 \\ 1+{2}/{a_n}, &{} \text{ if } n\ge 2, \end{array}\right. }\quad {\varphi }_n(x)={\left\{ \begin{array}{ll} 1, &{} \text{ if } x\in \left[ {1}/{2},1\right] \\ a_n+1-a_nx, &{} \text{ if } x\in ]1,1+{2}/{a_n}] \\ -1, &{} \text{ if } x\in ]1+{2}/{a_n},2], \end{array}\right. } \end{aligned}$$
(15)

where the sequence \((a_n)_n\) is recursively defined by

$$\begin{aligned} {\left\{ \begin{array}{ll} a_1=1 \\ a_{n+1}=a_n+n+1 \quad \text {for any } n\ge 1. \end{array}\right. } \end{aligned}$$

Hence, the SPNE of \(\varGamma \) selected according to Theorem 1 is \((\bar{x},\bar{\varphi })\), where

$$\begin{aligned} \bar{x}=1,\qquad \bar{\varphi }(x)={\left\{ \begin{array}{ll} 1, &{} \text{ if } x \in [{1}/{2},1[ \\ -1, &{} \text{ if } x\in [1,2]. \end{array}\right. } \end{aligned}$$
(16)

As mentioned in Remark 5, in this case

$$\begin{aligned} \lim _{n}{\varphi _n(\bar{x}_n)}=-1\ne 1=\lim _{n}{\varphi _n(\lim _{n}{\bar{x}_n})} \end{aligned}$$

and, furthermore, the strategy profile \((1,\varphi )\), where \(\varphi \) is the pointwise limit of \((\varphi _n)_n\), is not an SPNE of \(\varGamma \) since \({{\mathrm{Arg\,max}}}_{x\in [1/2,2]}{L(x,\varphi (x))}=\emptyset \).
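This discontinuity in the limit can be verified directly. The following Python sketch (names ours) uses the closed form \(a_n = n(n+1)/2\) implied by the recursion above:

```python
# A numerical check of the discontinuity in Example 3, based on (15) and the
# recursion a_{n+1} = a_n + n + 1 given in the text.

def a(n):
    """a_1 = 1, a_{n+1} = a_n + n + 1, i.e. a_n = n(n+1)/2."""
    return n * (n + 1) // 2

def phi(n, x):
    """The follower's strategy phi_n from (15)."""
    an = a(n)
    if 0.5 <= x <= 1:
        return 1.0
    if 1 < x <= 1 + 2 / an:
        return an + 1 - an * x
    return -1.0

def x_bar(n):
    """The leader's action x_bar_n from (15), for n >= 2."""
    return 1 + 2 / a(n)

# phi_n(x_bar_n) = -1 for all n >= 2, while phi_n(lim_n x_bar_n) = phi_n(1) = 1.
for n in range(2, 20):
    assert abs(phi(n, x_bar(n)) - (-1.0)) < 1e-9
    assert phi(n, 1.0) == 1.0
```

Indeed, at \(x=1+2/a_n\) the ramp piece of (15) evaluates to \(a_n+1-a_n(1+2/a_n)=-1\), while \(\varphi _n(1)=1\) for every n.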

Finally, it is worth noting that in this example too the SPNE obtained is the SPNE nearest to the initial point \((\bar{x}_0,\bar{y}_0)=(1,1)\), in the sense described in Remark 4.

Remark 6

(on the implementation of the method) The method based on Procedure (\({\mathcal {CM}}\)) can clearly be implemented in any finite game in mixed strategies and in any game where the players have a continuum of actions and the functions \(\varphi _n\) can be determined analytically for any \(n\in \mathbb {N}\).

Remark 7

(on lower semicontinuity of the correspondence \(\mathcal {Y}\)) If the sequence \((\bar{x}_n,\varphi _n(\bar{x}_n))_n\) in the statement of Theorem 1 does not converge, the conclusion of Theorem 1 still holds after replacing \((\bar{x},\bar{y})\) with the limit of a convergent subsequence \((\bar{x}_{n_k},\varphi _{n_k}(\bar{x}_{n_k}))_{k}\), whose existence is guaranteed by the compactness of X and Y. Therefore, the assumption \(\varGamma \in \mathcal {G}\) ensures the existence of SPNEs regardless of the lower semicontinuity of the follower’s best reply correspondence. Indeed, in the examples above, the follower’s best reply correspondences in (8), (12) and (14) are not lower semicontinuous set-valued maps.

Remark 8

(on leader’s costs to move) An existence result for SPNEs analogous to Theorem 1 can be obtained if the leader’s payoff function is not modified in Procedure (\({\mathcal {CM}}\)), that is, if the learning approach via costs to move only concerns the follower stage (i.e., \(L_n=L\), for any \(n\in \mathbb {N}\)).

The definition of \((\varphi _n)_n\) in Procedure (\({\mathcal {CM}}\)) is based on a parametric proximal point method. Since proximal point methods require an initial point to be fixed, in Procedure (\({\mathcal {CM}}\)) we have taken as the follower’s initial point the constant function \(\varphi _0\in Y^X\) defined by \(\varphi _0(x)=\bar{y}_0\). However, Procedure (\({\mathcal {CM}}\)) could also be defined by choosing any continuous function \(\varphi _0\in Y^X\) as the follower’s initial point, and all the results of Sects. 2 and 3 would still be valid (in particular, Propositions 1 and 2 and Theorem 1).

The next two propositions state some further properties of our constructive method when in Procedure (\({\mathcal {CM}}\)) the initial constant function defined by \(\bar{y}_0\) is replaced with a continuous function \(\varphi _0\in Y^X\). For the sake of simplicity, we continue to refer to \((\varphi _n)_n\) as the sequence generated by this modified procedure.

Proposition 3

Let \(\varGamma \in \mathcal {G}\) and let the follower’s initial point \(\varphi _0\in Y^X\) be a continuous function. Assume that \(\varphi _0(x)\in \mathcal {Y}(x)\) for any \(x\in X\). Then \(\varphi _n=\varphi _0\) for any \(n\in \mathbb {N}\). Moreover, \(\varphi _0\) is the strategy chosen by the follower in the SPNE selected according to Theorem 1.

Proof

We prove the first part of the result by induction. Firstly, note that the function F satisfies the assumptions of Lemma 1 as \(\varGamma \in \mathcal {G}\).

Let \(n=1\). Since \(\varphi _0(x)\in \mathcal {Y}(x)\) for any \(x\in X\), in light of Lemma 1(iv) and the definition of \(\varphi _1\), we have

$$\begin{aligned} \{\varphi _0(x)\}=\mathop {{{\mathrm{Arg\,max}}}}\limits _{y\in Y}{F(x,y)-\frac{1}{2\gamma _0}\Vert y-\varphi _0(x)\Vert _\mathbb {Y}^2}=\{\varphi _1(x)\}\text {, for any }x\in X, \end{aligned}$$

so, the base case is satisfied. Now let \(n\ge 1\) and suppose that \(\varphi _n=\varphi _0\). Then \(\varphi _n(x)\in \mathcal {Y}(x)\) for any \(x\in X\) and, by Lemma 1(iv) and the definition of \(\varphi _{n+1}\), we get

$$\begin{aligned} \{\varphi _{n}(x)\}=\mathop {{{\mathrm{Arg\,max}}}}\limits _{y\in Y}{F(x,y)-\frac{1}{2\gamma _n}\Vert y-\varphi _n(x)\Vert _\mathbb {Y}^2}=\{\varphi _{n+1}(x)\}\text {, for any }x\in X, \end{aligned}$$

and thus, the inductive step is proved. Hence, \(\varphi _n=\varphi _0\) for any \(n\in \mathbb {N}\) and the first part of the proof is complete.

Since \(\varphi _n=\varphi _0\) for any \(n\in \mathbb {N}\) and \(\varphi _0\) is continuous, for any sequence \((x_n)_n\subseteq X\) converging to \(x\in X\) the sequence \((\varphi _n(x_n))_n\) converges to \(\varphi _0(x)\). So, \(\varphi _0\) is the follower’s strategy in the SPNE selected according to Theorem 1. \(\square \)
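Proposition 3 can be illustrated numerically on the game of Example 3. In the sketch below (the grid approximation and the names `proximal_step`, `phi0` are ours) the constant selection \(\varphi _0(x)=-1\) is continuous and satisfies \(\varphi _0(x)\in \mathcal {Y}(x)\) for every x, so one proximal step leaves it unchanged:

```python
# A sketch of Proposition 3 on the game of Example 3: one proximal step
# applied to the continuous selection phi_0 = -1 returns phi_0 itself.
import numpy as np

def F(x, y):
    """Follower's payoff from Example 3."""
    return 0.0 if x <= 1 else (1 - x) * y

def proximal_step(x, y_prev, gamma, grid):
    """Grid approximation of the argmax over Y = [-1,1] of
    F(x,y) - (1/(2*gamma)) * (y - y_prev)^2."""
    vals = [F(x, y) - (y - y_prev) ** 2 / (2 * gamma) for y in grid]
    return grid[int(np.argmax(vals))]

grid = np.linspace(-1.0, 1.0, 2001)
phi0 = lambda x: -1.0  # a continuous selection of the best reply map (14)

# phi_1(x) = phi_0(x) at every sampled x: phi_0 is a fixed point of the step.
for x in np.linspace(0.5, 2.0, 16):
    assert proximal_step(x, phi0(x), gamma=1.0, grid=grid) == -1.0
```

For \(x\le 1\) the payoff is flat, so the proximal penalty keeps \(y=-1\); for \(x>1\) the unconstrained maximizer lies below \(-1\) and is clipped back to it.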

Proposition 4

Let \(\varGamma \in \mathcal {G}\) and let the follower’s initial point \(\varphi _0\in Y^X\) be a continuous function. Assume that there exists \( \nu \in \mathbb {N}\) such that \(\varphi _{ \nu }=\varphi _{\nu -1}\). Then \(\varphi _{\nu }(x)\in \mathcal {Y}(x)\) for any \(x\in X\) and \(\varphi _n=\varphi _{\nu }\) for any \(n>\nu \). Moreover, \(\varphi _{\nu }\) is the strategy chosen by the follower in the SPNE selected according to Theorem 1.

Proof

By the definition of \(\varphi _{\nu }\) and since \(\varphi _{\nu }=\varphi _{\nu -1}\), we have

$$\begin{aligned} \begin{aligned} \{\varphi _{\nu }(x)\}=&\mathop {{{\mathrm{Arg\,max}}}}\limits _{y\in Y}{F(x,y)-\frac{1}{2\gamma _{\nu -1}}\Vert y-\varphi _{\nu -1}(x)\Vert _\mathbb {Y}^2} \\ =&\mathop {{{\mathrm{Arg\,max}}}}\limits _{y\in Y}{F(x,y)-\frac{1}{2\gamma _{\nu -1}}\Vert y-\varphi _{\nu }(x)\Vert _\mathbb {Y}^2}\text {, for any } x\in X. \end{aligned} \end{aligned}$$

Then, in light of Lemma 1(iv) we get \(\varphi _{\nu }(x)\in \mathcal {Y}(x)\) for any \(x\in X\).

Consider the new constructive procedure whose follower’s initial point is the continuous function \(\varphi _{\nu }\) and which uses \((\gamma _{\nu +n})_{n\in \mathbb {N}\cup \{0\}}\) instead of \((\gamma _n)_{n\in \mathbb {N}\cup \{0\}}\) (such a procedure is nothing but the original one with the first \(\nu -1\) steps removed). Applying Proposition 3, we have \(\varphi _n=\varphi _{\nu }\) for any \(n>\nu \). By the above and the continuity of \(\varphi _{\nu }\), arguing as in the last part of the proof of Proposition 3, it follows that \(\varphi _{\nu }\) is the strategy chosen by the follower in the SPNE selected according to Theorem 1. \(\square \)

4 Connections with Another Constructive Method and Other Solution Concepts

In this section, firstly we analyze the relation between our learning method based on costs to move and the method proposed in Morgan and Patrone [31], and then we compare the SPNE achievable via Theorem 1 with the SPNEs obtainable through the weak Stackelberg equilibrium and the strong Stackelberg equilibrium. We investigate only the connections with the three methods mentioned above since, to our knowledge, they alone provide a construction of an SPNE in games of perfect information where the players have a continuum of actions, and hence also in Stackelberg games.

Before addressing such issues, we discuss whether the results in Attouch et al. [3], where an alternating algorithm involving costs to move is introduced in simultaneous one-stage games, can be used in a Stackelberg-type framework to construct SPNEs. Given an initial point \((\hat{x}_0,\hat{y}_0)\in X\times Y\), the method proposed in Attouch et al. [3] generates a sequence \((\hat{x}_n,\hat{y}_n)_n\subseteq X\times Y\) defined by

$$\begin{aligned} {\left\{ \begin{array}{ll} \,\,\hat{y}_n\in {{\mathrm{Arg\,max}}}_{y\in Y}{F(\hat{x}_{n-1},y)-\frac{1}{2\gamma _{n-1}}\Vert y-\hat{y}_{n-1}\Vert ^2}, \\ \,\,\hat{x}_n\in {{\mathrm{Arg\,max}}}_{x\in X}{L(x,\hat{y}_n)-\frac{1}{2\beta _{n-1}}\Vert x-\hat{x}_{n-1}\Vert ^2}, \end{array}\right. } \end{aligned}$$
(17)

for any \(n\in \mathbb {N}\). Recall that in Procedure (\({\mathcal {CM}}\)) each step defines a strategy profile \((\bar{x}_n,\varphi _n)\in X\times Y^X\), made up of a leader’s action and a follower’s strategy. By contrast, the algorithm schematized in (17) constructs at each step an action profile \((\hat{x}_n,\hat{y}_n)\in X\times Y\), composed of a leader’s action and a follower’s action; moreover, the limit \((\hat{x},\hat{y})\) is, in general, not connected to an SPNE of the Stackelberg game \(\varGamma =(L,F)\), as highlighted in the following example.

Example 4

Let \(X=Y=[-1,1]\) and \(\varGamma =(L,F)\) where

$$\begin{aligned} L(x,y)=x+y, \qquad F(x,y)=-xy. \end{aligned}$$

The follower’s best reply correspondence \(\mathcal {Y}\) is defined on \([-1,1]\) by

$$\begin{aligned} \mathcal {Y}(x)= {\left\{ \begin{array}{ll} \{1\}, &{} \text{ if } x \in [-1,0[ \\ {[}-1,1], &{} \text{ if } x=0 \\ \{-1\}, &{} \text{ if } x \in ]0,1]. \end{array}\right. } \end{aligned}$$

The game \(\varGamma \) has a unique SPNE, namely \((\bar{x},\bar{\varphi })\), where

$$\begin{aligned} \bar{x}=0,\qquad \bar{\varphi }(x)={\left\{ \begin{array}{ll} 1, &{} \text{ if } x \in [-1,0] \\ -1, &{} \text{ if } x\in ]0,1]. \end{array}\right. } \end{aligned}$$

Let \((\hat{x}_0,\hat{y}_0)=(0,0)\) and \(\beta _n=\gamma _n={2^n}\) for any \(n\in \mathbb {N}\cup \{0\}\). Then, the sequence defined in (17) converges to \((\hat{x},\hat{y})=(1,-1)\), which is not related to the SPNE of \(\varGamma \) (since \(x=1\) is not the action chosen by the leader in the SPNE).
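For this game the proximal updates in (17) have closed forms, so the claimed limit can be checked directly. In the Python sketch below (the names `clip` and `alternating` are ours), each update is the unconstrained first-order maximizer clipped to \([-1,1]\):

```python
# A sketch of the alternating algorithm (17) on Example 4
# (L(x,y) = x + y, F(x,y) = -xy, X = Y = [-1,1]).

def clip(v):
    """Project onto [-1, 1]."""
    return max(-1.0, min(1.0, v))

def alternating(x0, y0, steps):
    x, y = x0, y0
    for n in range(1, steps + 1):
        gamma = beta = 2.0 ** (n - 1)  # beta_n = gamma_n = 2^n, shifted index
        # y-update: argmax_y -x*y - (y - y_prev)^2/(2*gamma)  ->  y_prev - gamma*x
        y = clip(y - gamma * x)
        # x-update: argmax_x  x + y - (x - x_prev)^2/(2*beta)  ->  x_prev + beta
        x = clip(x + beta)
    return x, y

# From (x0, y0) = (0, 0) the iterates reach (1, -1) after a few steps,
# which is not the action profile of the unique SPNE.
assert alternating(0.0, 0.0, 20) == (1.0, -1.0)
```

The first step moves the leader to \(x=1\); from then on the follower's best proximal response is pushed to \(-1\) and both coordinates stay fixed.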

4.1 Connections with Morgan and Patrone [31]

In Morgan and Patrone [31], a constructive method based on Tikhonov regularization is used in order to approach an SPNE in Stackelberg games. More precisely, the authors consider the following regularized second-level problem

$$\begin{aligned} P_{\alpha _n}(x):\quad \min _{y\in Y}F(x,y)+\alpha _n\Vert y\Vert ^2, \end{aligned}$$

where \(x\in X\) and \((\alpha _n)_n\) is a decreasing sequence of positive real numbers such that \(\lim _{n\rightarrow +\infty }\alpha _n=0\). Denoting by \(\bar{\rho }_n(x)\) the unique solution to \(P_{\alpha _n}(x)\) and by \(\hat{\rho }(x)\) the unique minimum-norm solution to the problem

$$\begin{aligned} P(x):\quad \min _{y\in Y}F(x,y), \end{aligned}$$

classical results on Tikhonov regularization [41] ensure that the sequence \((\bar{\rho }_n(x))_n\) converges to \(\hat{\rho }(x)\). Let \(\bar{x}_n\) be a solution to the regularized problem

$$\begin{aligned} S_{\alpha _n}:\quad \min _{x\in X}L(x,\bar{\rho }_n(x)) \end{aligned}$$

and assume that the sequence \((\bar{x}_n,\bar{\rho }_n(\bar{x}_n))_n\) converges to \((\bar{x},\bar{y})\). Then, under suitable assumptions, the strategy profile \((\bar{x},{\tilde{\rho }})\in X\times Y^X\), where

$$\begin{aligned} {\tilde{\rho }}(x)={\left\{ \begin{array}{ll} \bar{y}, &{} \text{ if } x=\bar{x} \\ \hat{\rho }(x), &{} \text{ if } x\ne \bar{x} \end{array}\right. } \end{aligned}$$

is an SPNE of the initial game (see [31, Theorem 3.1]).

We note that the way in which the SPNE is constructed via the method described above does not involve any step-by-step learning. Indeed, \(P_{\alpha _n}(x)\) is not recursively defined; at a given step n, the follower’s strategy \(\bar{\rho }_n\) is not an update of his previous strategy \(\bar{\rho }_{n-1}\), nor is \(\bar{x}_n\) an update of \(\bar{x}_{n-1}\). Hence, the anchoring effects arising in Procedure (\({\mathcal {CM}}\)), as well as other kinds of behavioral motivation, do not appear in this framework. As a matter of fact, Procedure (\({\mathcal {CM}}\)) and the procedure in Morgan and Patrone [31] (adapted to maximization frameworks) do not, in general, generate the same SPNE, as shown in the next example.

Example 5

Let \(\varGamma \) be the game defined in Example 3. The SPNE constructed by using the approach in Morgan and Patrone [31] is \((1,{\tilde{\rho }})\), where

$$\begin{aligned} {\tilde{\rho }}(x)={\left\{ \begin{array}{ll} 0, &{} \text{ if } x\in [1/2,1[ \\ -1, &{} \text{ if } x\in [1,2], \end{array}\right. } \end{aligned}$$

which does not coincide with the SPNE found in (16).
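The computation behind this example can be reproduced numerically. The grid-based sketch below (the names and the illustrative choice \(\alpha _n=1/2^n\) are ours) adapts the regularized second-level problem to the maximization framework used here:

```python
# A grid-based sketch of the Tikhonov-regularized construction on the game of
# Example 3, adapted to maximization, with the illustrative choice
# alpha_n = 1/2^n.
import numpy as np

X = np.linspace(0.5, 2.0, 3001)
Y = np.linspace(-1.0, 1.0, 2001)

def F(x, y):
    """Follower's payoff from Example 3 (vectorized in y)."""
    return np.zeros_like(y) if x <= 1 else (1 - x) * y

def rho_bar(x, alpha):
    """Grid approximation of the unique maximizer of F(x,.) - alpha*|y|^2."""
    return Y[np.argmax(F(x, Y) - alpha * Y ** 2)]

for n in range(3, 10):
    alpha = 0.5 ** n
    rho = np.array([rho_bar(x, alpha) for x in X])
    x_n = X[np.argmax(-X - rho)]  # maximize L(x, rho_bar_n(x)) = -x - rho
    assert abs(x_n - (1 + 2 * alpha)) < 0.01  # so x_bar_n -> 1 as alpha_n -> 0

# The limit strategy agrees with rho_tilde above: 0 on [1/2,1[ and -1 on ]1,2].
assert abs(rho_bar(0.75, 0.5 ** 9)) < 1e-6
assert rho_bar(1.5, 0.5 ** 9) == -1.0
```

Numerically, the leader's regularized maximizer is \(\bar{x}_n=1+2\alpha _n\rightarrow 1\), and the regularized follower strategies converge to \({\tilde{\rho }}\) as displayed above.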

4.2 Connections with Weak and Strong Stackelberg Equilibria

In Stackelberg games where the follower’s best reply correspondence is not always single-valued, two extreme behaviors of the leader could arise regarding his beliefs about how the follower chooses inside his own set of optimal actions in response to each action chosen by the leader. In the first case, the leader is optimistic and believes that the follower chooses the best action for the leader; whereas in the second one, the leader is pessimistic and believes that the follower could choose the worst action for the leader. These behaviors lead to two widely investigated problems (originally named generalized Stackelberg problems, see [22]): the strong Stackelberg, also called optimistic Stackelberg (see, e.g., [8, 12, 15, 24, 42], and references therein), and the weak Stackelberg, also called pessimistic Stackelberg (see, e.g., [16, 23, 25, 26, 30, 45], and references therein) problems, respectively, described as follows:

$$\begin{aligned} \begin{array}{cc} \text{(s-S) }\,{\left\{ \begin{array}{ll} \max _{x\in X}\max _{y\in \mathcal {Y}(x)} L(x,y)\\ \text{ where } \mathcal {Y}(x) \text{ is } \text{ defined } \text{ in } (1), \end{array}\right. } &{} \text{(w-S) }\,{\left\{ \begin{array}{ll} \max _{x\in X}\min _{y\in \mathcal {Y}(x)} L(x,y)\\ \text{ where } \mathcal {Y}(x) \text{ is } \text{ defined } \text{ in } (1). \end{array}\right. } \end{array} \end{aligned}$$

An action profile \((x^*,y^*)\in X\times Y\) is said to be

  1. (i)

    strong Stackelberg equilibrium (or optimistic equilibrium) if

    $$\begin{aligned} x^*\in \mathop {{{\mathrm{Arg\,max}}}}\limits _{x\in X}\max _{y\in \mathcal {Y}(x)} L(x,y)\text { and }y^*\in \mathop {{{\mathrm{Arg\,max}}}}\limits _{y\in \mathcal {Y}(x^*)}L(x^*,y), \end{aligned}$$
  2. (ii)

    weak Stackelberg equilibrium (or pessimistic equilibrium) if

    $$\begin{aligned} x^*\in \mathop {{{\mathrm{Arg\,max}}}}\limits _{x\in X}\min _{y\in \mathcal {Y}(x)} L(x,y)\text { and }y^*\in \mathcal {Y}(x^*). \end{aligned}$$

Starting from a strong or a weak Stackelberg equilibrium, one could derive an SPNE according to the two different behaviors of the leader. In fact,

  1. (i)

    if the action profile \((x^*,y^*)\) is a strong Stackelberg equilibrium, then the strategy profile \((x^*,\varphi ^*)\) is an SPNE when \(\varphi ^*(x)\in {{\mathrm{Arg\,max}}}_{y\in \mathcal {Y}(x)}L(x,y)\) for any \(x\in X\);

  2. (ii)

    if the action profile \((x^*,y^*)\) is a weak Stackelberg equilibrium, then the strategy profile \((x^*,\varphi ^*)\) is an SPNE when \(\varphi ^*(x)\in {{\mathrm{Arg\,min}}}_{y\in \mathcal {Y}(x)}L(x,y)\) for any \(x\in X\).

Nevertheless, in the optimistic (resp. pessimistic) situation, the computation of strong (resp. weak) Stackelberg equilibria, and related SPNEs, would require the leader to know the best reply correspondence of the follower. Instead, an SPNE obtainable through the learning approach with costs to move described in Procedure (\({\mathcal {CM}}\)) relieves the leader of knowing the follower’s best reply correspondence. Moreover, let us note that the SPNE obtained via Procedure (\({\mathcal {CM}}\)) does not coincide, in general, with the SPNEs associated with optimistic or pessimistic equilibria. To show this fact, it is sufficient to check whether the limit \((\bar{x},\bar{y})\) of the sequence of actions \((\bar{x}_n,\varphi _n(\bar{x}_n))_n\) obtained through Procedure (\({\mathcal {CM}}\)) is a strong or a weak Stackelberg equilibrium. This lack of connection is exhibited in the following example.

Example 6

Let \(\varGamma \) be the game defined in Example 3. The follower’s best reply correspondence \(\mathcal {Y}\) is given in (14). Since for any \(x\in [{1}/{2},2]\)

$$\begin{aligned} \max _{y\in \mathcal {Y}(x)}{L(x,y)}=-x+1, \qquad \min _{y\in \mathcal {Y}(x)}{L(x,y)}={\left\{ \begin{array}{ll} -x-1, &{} \text{ if } x\in [1/2,1] \\ -x+1, &{} \text{ if } x\in ]1,2], \end{array}\right. } \end{aligned}$$

then

$$\begin{aligned} \mathop {{{\mathrm{Arg\,max}}}}\limits _{x\in [1/2,2]}{\max _{y\in \mathcal {Y}(x)}{L(x,y)}}=\{1/2\}, \qquad \mathop {{{\mathrm{Arg\,max}}}}\limits _{x\in [1/2,2]}{\min _{y\in \mathcal {Y}(x)}{L(x,y)}}=\emptyset . \end{aligned}$$

Hence, the strong Stackelberg equilibrium is the action profile \((1/2,-1)\) as \(\{-1\}={{\mathrm{Arg\,max}}}_{y\in \mathcal {Y}(1/2)}{L(1/2,y)}\). Instead, the weak Stackelberg equilibrium does not exist.

Procedure (\({\mathcal {CM}}\)) generates the sequence \((\bar{x}_n,\varphi _n)_n\) defined in (15). The sequence of actions \((\bar{x}_n,\varphi _n(\bar{x}_n))_{n\ge 2}=(1+{2}/{a_n},-1)_{n\ge 2}\) converges to \((1,-1)\), which is neither a strong nor a weak Stackelberg equilibrium.
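The strong and weak values above can also be approximated on a grid. In the sketch below (names ours), `best_replies` approximates the set \(\mathcal {Y}(x)\) with a small tolerance:

```python
# A grid-based sketch of the strong and weak Stackelberg values in Example 6.
import numpy as np

X = np.linspace(0.5, 2.0, 1501)
Y = np.linspace(-1.0, 1.0, 201)

def F(x, y):
    """Follower's payoff from Example 3 (vectorized in y)."""
    return np.zeros_like(y) if x <= 1 else (1 - x) * y

def L(x, y):
    """Leader's payoff from Example 3."""
    return -x - y

def best_replies(x, tol=1e-9):
    """Grid approximation of the best reply set Y(x) in (14)."""
    vals = F(x, Y)
    return Y[vals >= vals.max() - tol]

strong_vals = np.array([L(x, best_replies(x)).max() for x in X])
weak_vals = np.array([L(x, best_replies(x)).min() for x in X])

# Strong problem: value 1/2, attained at x = 1/2 with follower action -1.
i = int(np.argmax(strong_vals))
br = best_replies(X[i])
assert abs(X[i] - 0.5) < 1e-9 and abs(strong_vals[i] - 0.5) < 1e-9
assert br[np.argmax(L(X[i], br))] == -1.0

# Weak problem: the grid supremum is negative and approached only as x -> 1+,
# consistent with the Arg max above being empty.
j = int(np.argmax(weak_vals))
assert X[j] > 1.0 and weak_vals[j] < 0.0
```

On any finite grid the weak value attains a maximum just to the right of \(x=1\), but refining the grid pushes that maximum toward the unattained supremum 0, mirroring the emptiness of the pessimistic Arg max.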

5 Conclusion

In this paper, we presented a theoretical method to construct a subgame perfect Nash equilibrium of a one-leader one-follower two-stage game by using a learning approach via costs to move. The method is based on a procedure that makes it possible to overcome the difficulties occurring when the follower’s best reply correspondence is not single-valued. In fact, we recursively constructed a sequence of SPNEs of classical Stackelberg games whose payoff functions are obtained by subtracting from the payoff functions of the initial game a cost-to-move term depending on the SPNE reached at the previous step. Hence, we showed the existence of an SPNE achievable via this learning method under mild assumptions on the data of the game.

The analysis of one-leader two-follower two-stage games is currently in progress. In this case, the non-uniqueness of the parametric Nash equilibria obtained as the optimal reactions of the followers will possibly be overcome by applying a learning method based on costs to move together with known results on the uniqueness of Nash equilibria, such as Rosen [37], Ceparano and Quartieri [10] or Caruso et al. [9].

Another direction for future research is the extension of our learning method to Stackelberg differential games. In fact, starting from Chen and Cruz [11] and Simaan and Cruz [40], the literature on Stackelberg differential games has dealt essentially with situations where, for any control path chosen by the leader, the follower’s optimal control path is unique. Using a generalization of the proposed constructive procedure with costs to move, we aim to approach an SPNE even in Stackelberg differential games whose follower’s optimal control path is not uniquely determined.

Furthermore, we propose to adapt the method presented in this paper to semivectorial bilevel optimal control problems [7], which are differential games with hierarchical play where one leader in the first stage faces a scalar optimal control problem and several followers in the second stage solve a cooperative differential game. In fact, our learning approach via costs to move could be useful to construct SPNEs when the followers’ Pareto control path is not unique, requiring only convexity assumptions, whereas in Bonnel and Morgan [7] the non-single-valuedness of the followers’ best reply correspondence is overcome in the optimistic and pessimistic situations associated with the problem by means of strict convexity assumptions.