Uniform value for recursive games with compact action sets
Introduction
Zero-sum stochastic games were introduced by [10], and the model is as follows: $\Omega$ is the finite state space, $I$ (resp. $J$) is the action set for player 1 (resp. player 2), $g:\Omega\times I\times J\to\mathbb{R}$ is the stage payoff function and $q:\Omega\times I\times J\to\Delta(\Omega)$ is a probability transition function ($\Delta(S)$ stands for the set of probabilities on a measurable set $S$). We assume throughout that $I$ and $J$ are compact metric sets and that both $g$ and $q$ are separately continuous on $I\times J$ (this implies their measurability, cf. I.1.Ex.7a in [8]).
The game is played as follows. Let $\omega_1\in\Omega$ be the initial state. At each stage $m\ge 1$, after observing the $m$-stage history $h_m=(\omega_1,i_1,j_1,\dots,\omega_m)$, player 1 chooses an action $i_m\in I$ and player 2 chooses an action $j_m\in J$. This profile induces a current stage payoff $g_m=g(\omega_m,i_m,j_m)$ and a probability $q(\omega_m,i_m,j_m)\in\Delta(\Omega)$ which is the law of $\omega_{m+1}$, the state at stage $m+1$.
Let $H_m=(\Omega\times I\times J)^{m-1}\times\Omega$ be the set of $m$-stage histories for each $m\ge 1$, and $H_\infty=(\Omega\times I\times J)^{\infty}$ be the set of infinite histories. We endow $H_m$ with the product sigma-algebra (discrete on $\Omega$, Borel on $I$ and $J$) and endow $H_\infty$ with the product sigma-algebra spanned by $(H_m)_{m\ge1}$. A behavior strategy for player 1 is a sequence $\sigma=(\sigma_m)_{m\ge1}$ where for each $m\ge1$, $\sigma_m$ is a measurable map from $H_m$ to $\Delta(I)$. A behavior strategy $\tau=(\tau_m)_{m\ge1}$ for player 2 is defined analogously. The set of player 1's (resp. player 2's) behavior strategies is denoted by $\Sigma$ (resp. $\mathcal{T}$).
Given an initial state $\omega_1$, any strategy profile $(\sigma,\tau)\in\Sigma\times\mathcal{T}$ induces a unique probability distribution $\mathbb{P}^{\omega_1}_{\sigma,\tau}$ over the histories $H_\infty$. The corresponding expectation is denoted by $\mathbb{E}^{\omega_1}_{\sigma,\tau}$.
In a $\lambda$-discounted game $\Gamma_\lambda(\omega_1)$ (for $\lambda\in(0,1]$), the (global) payoff is defined as $\gamma_\lambda(\omega_1;\sigma,\tau)=\mathbb{E}^{\omega_1}_{\sigma,\tau}\big[\sum_{m\ge1}\lambda(1-\lambda)^{m-1}g_m\big]$ and the corresponding (minmax) value is $v_\lambda(\omega_1)=\sup_{\sigma\in\Sigma}\inf_{\tau\in\mathcal{T}}\gamma_\lambda(\omega_1;\sigma,\tau)=\inf_{\tau\in\mathcal{T}}\sup_{\sigma\in\Sigma}\gamma_\lambda(\omega_1;\sigma,\tau)$. The $n$-stage game $\Gamma_n(\omega_1)$ (for $n\ge 1$) is defined analogously by taking the $n$-stage averaged payoff $\gamma_n(\omega_1;\sigma,\tau)=\mathbb{E}^{\omega_1}_{\sigma,\tau}\big[\frac1n\sum_{m=1}^{n}g_m\big]$, and its value is denoted by $v_n(\omega_1)$.
In the finite actions setup, [10] introduced the operator
$$\Phi(\lambda,f)(\omega)=\operatorname{val}_{(x,y)\in\Delta(I)\times\Delta(J)}\big[\lambda\, g(\omega,x,y)+(1-\lambda)\,\mathbb{E}_{q(\omega,x,y)}[f]\big],\quad\forall\omega\in\Omega,\qquad (1)$$
where $g(\omega,x,y)$ and $q(\omega,x,y)$ are the bilinear extensions of $g(\omega,\cdot,\cdot)$ and $q(\omega,\cdot,\cdot)$, with $\operatorname{val}$ the value of the associated one-shot zero-sum game. He proved that
$$v_\lambda=\Phi(\lambda,v_\lambda),\quad\forall\lambda\in(0,1],\qquad (2)$$
and moreover that stationary optimal strategies exist for each $\Gamma_\lambda$, i.e. strategies depending at each stage only on the current state $\omega_m$. These results extend to the current framework (cf. VII.1.a in [8]).
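To make the operator concrete, here is a minimal sketch of value iteration with $\Phi(\lambda,\cdot)$ on a toy example. For simplicity, player 2's action set is taken to be a singleton, so $\operatorname{val}$ reduces to a maximum over a discretized action grid for player 1; the states, payoffs and transitions below are illustrative assumptions, not taken from the paper.

```python
# Value iteration with the Shapley operator Phi(lambda, .) of Eq. (1),
# on an assumed toy game (player 2 is trivial, so "val" is a max).

LAMBDA = 0.2  # discount parameter lambda

# Two absorbing payoffs (1 and 0) and one nonabsorbing state with stage
# payoff 0. Choosing a in [0, 1/2]: absorb at payoff 1 w.p. a, absorb at
# payoff 0 w.p. a^2, stay in the nonabsorbing state w.p. 1 - a - a^2.
ACTIONS = [k / 100 for k in range(51)]  # grid on [0, 0.5]

def shapley(f):
    """One application of Phi(lambda, .); f is the value at the nonabsorbing state."""
    return max(
        LAMBDA * 0.0                              # stage payoff is 0 here
        + (1 - LAMBDA) * (a * 1.0                 # absorb at payoff 1
                          + a * a * 0.0           # absorb at payoff 0
                          + (1 - a - a * a) * f)  # stay
        for a in ACTIONS
    )

# Phi(lambda, .) is a (1 - lambda)-contraction in sup norm,
# so iterating it converges to the unique fixed point v_lambda of Eq. (2).
v = 0.0
for _ in range(200):
    v = shapley(v)

print(round(v, 4))
assert abs(shapley(v) - v) < 1e-9  # fixed point: v_lambda = Phi(lambda, v_lambda)
```

For this toy game the maximum is attained at $a=1/2$ and the fixed point is $v_\lambda=1/2$ when $\lambda=0.2$, which the iteration recovers.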
We are interested in long-run properties of the values. A first notion corresponds to the existence of an asymptotic value: convergence of $v_\lambda$ as $\lambda$ tends to zero and convergence of $v_n$ as $n$ tends to infinity, to the same limit. Moreover, one can ask for the existence of $\varepsilon$-optimal strategies for both players that guarantee the asymptotic value in all sufficiently long games, explicitly:
Definition 1.1 Let $v:\Omega\to\mathbb{R}$. Player 1 can guarantee $v$ in $\Gamma$ if, for any $\varepsilon>0$ and for every $\omega\in\Omega$, there exist $N\in\mathbb{N}$ and a behavior strategy $\sigma\in\Sigma$ for player 1 s.t.:
(i) $\gamma_n(\omega;\sigma,\tau)\ge v(\omega)-\varepsilon,\quad\forall\tau\in\mathcal{T},\ \forall n\ge N$;
(ii) $\mathbb{E}^{\omega}_{\sigma,\tau}\big[\liminf_{n\to\infty}\frac1n\sum_{m=1}^{n}g_m\big]\ge v(\omega)-\varepsilon,\quad\forall\tau\in\mathcal{T}$.
A similar definition holds for player 2 (with reversed inequalities). $v$ is the uniform value of $\Gamma$ if both players can guarantee it.
Remark The existence of a uniform value implies the existence of an asymptotic value, equal to $v$. For stochastic games with a finite state space and finite action sets, [1] proved the convergence of $v_\lambda$ as $\lambda$ tends to zero (and later deduced the convergence of $v_n$ as $n$ tends to infinity to the same limit), relying on an algebraic argument. Using the property that the function $\lambda\mapsto v_\lambda$ has bounded variation, which follows from [1], [6] proved the existence of a uniform value. Actually, Mertens and Neyman's main result is applicable to a stochastic game with compact action sets under the following form:
Theorem 1.2 [6] Let $w$ be a function defined on $(0,1]\times\Omega$. Player 1 can guarantee $\lim_{\lambda\to 0}w(\lambda,\cdot)$ in the stochastic game if $w$ satisfies:
(i) for some integrable function $\varphi:(0,1]\to\mathbb{R}_+$, $\|w(\lambda,\cdot)-w(\lambda',\cdot)\|_\infty\le\int_{\lambda'}^{\lambda}\varphi(s)\,ds$, $\forall\, 0<\lambda'\le\lambda\le 1$;
(ii) for every $\lambda$ sufficiently small, $\Phi(\lambda,w(\lambda,\cdot))\ge w(\lambda,\cdot)$.
Remark In the construction of an $\varepsilon$-optimal strategy in [6], $w(\lambda,\cdot)$ is taken to be $v_\lambda$, so condition (i) is implied by the bounded variation property of $\lambda\mapsto v_\lambda$ and condition (ii) is implied by Eq. (2).
Below we focus on two important classes of stochastic games: absorbing games and recursive games.
An absorbing state is a state that, once reached, is never left: the probability of leaving it is zero. Without loss of generality, one assumes that the payoff at any absorbing state is constant (equal to the value of the static game played after absorption), provided that both players are informed of the current state.
A stochastic game is an absorbing game if all states but one are absorbing. [7] used Theorem 1.2 to prove the existence of a uniform value for absorbing games with a finite state space and compact action sets, extending a result of [4] for the finite actions case.
Recursive games, introduced by [3], are stochastic games where the stage payoff is zero in all nonabsorbing states.
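A minimal numeric illustration of this payoff structure, under assumed data (not taken from the paper): a one-player game with a single nonabsorbing state of stage payoff 0, in which choosing $a\in(0,1/2]$ absorbs at payoff 1 with probability $a$, at payoff 0 with probability $a^2$, and stays with probability $1-a-a^2$. For a fixed stationary action $a$, the discounted value solves the linear equation $v=(1-\lambda)\big(a+(1-a-a^2)v\big)$, and maximizing over $a$ yields $v_\lambda$.

```python
# Discounted values of an assumed toy one-player recursive game:
# stage payoff 0 before absorption; action a in (0, 1/2]; absorb at payoff 1
# w.p. a, at payoff 0 w.p. a^2, stay w.p. 1 - a - a^2.

def v_lambda(lam, grid_size=10_000):
    """Maximize, over a grid of stationary actions a, the stationary value
    v(a) = (1 - lam) * a / (lam + (1 - lam) * (a + a^2)),
    obtained by solving v = (1 - lam) * (a + (1 - a - a^2) * v)."""
    best = 0.0
    for k in range(1, grid_size + 1):
        a = 0.5 * k / grid_size
        best = max(best, (1 - lam) * a / (lam + (1 - lam) * (a + a * a)))
    return best

for lam in (0.1, 0.01, 0.001, 0.0001):
    print(lam, round(v_lambda(lam), 4))
```

Here $v_\lambda$ increases to 1 as $\lambda\to 0$, yet no stationary strategy guarantees 1 exactly: a fixed $a>0$ absorbs at payoff 1 with probability $a/(a+a^2)=1/(1+a)<1$, while $a=0$ never absorbs and yields 0. Only $\varepsilon$-optimal strategies exist, which is typical of recursive games.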
This note proves the existence of a uniform value for recursive games with a finite state space and compact action sets, using an approach analogous to that of [7] for absorbing games. Moreover, due to the specific payoff structure, we show that $\varepsilon$-optimal strategies in recursive games can be taken stationary. This is not the case for a general stochastic game, in which an $\varepsilon$-optimal strategy usually has to depend on the whole past history, even in the finite actions case (cf. [2] for the "Big match" as an example).
[3] proved the existence of stationary $\varepsilon$-optimal strategies for the "limiting-average value" (the expected-liminf guarantee in Definition 1.1). As our proof relies on his characterization of this value (and on its existence), we recall the result here.
Given a subset $E$ of a Euclidean space, let $\overline{E}$ denote its closure.
Let $\Omega_0\subseteq\Omega$ be the set of nonabsorbing states.
$\Psi$ refers to the operator $\Phi(0,\cdot)$, i.e. the operator obtained by taking $\lambda=0$ in Eq. (1).
When working with the operator $\Phi$ or $\Psi$, it is sufficient to consider those vectors $u\in\mathbb{R}^{\Omega}$ identical to the absorbing payoffs on the absorbing states $\Omega\setminus\Omega_0$. Whenever no confusion is caused, we identify such a $u$ with its projection on $\mathbb{R}^{\Omega_0}$.
Theorem 1.3 [3] A recursive game has a limiting-average value $v$, and both players have stationary $\varepsilon$-optimal strategies, in the sense that for every $\varepsilon>0$, there are stationary strategies $x_\varepsilon$ and $y_\varepsilon$ s.t.: for any $\omega\in\Omega$, $\sigma\in\Sigma$ and $\tau\in\mathcal{T}$,
$$\mathbb{E}^{\omega}_{x_\varepsilon,\tau}\Big[\liminf_{n\to\infty}\tfrac1n\sum_{m=1}^{n}g_m\Big]\ge v(\omega)-\varepsilon \quad\text{and}\quad \mathbb{E}^{\omega}_{\sigma,y_\varepsilon}\Big[\limsup_{n\to\infty}\tfrac1n\sum_{m=1}^{n}g_m\Big]\le v(\omega)+\varepsilon.$$
Moreover, the limiting-average value is characterized by $\{v\}=\overline{W^+}\cap\overline{W^-}$, where
$$W^+=\big\{u\in\mathbb{R}^{\Omega_0}:\ \forall\omega\in\Omega_0,\ \Psi(u)(\omega)>u(\omega)\ \text{if}\ u(\omega)>0,\ \text{and}\ \Psi(u)(\omega)\ge u(\omega)\ \text{if}\ u(\omega)\le 0\big\},$$
$$W^-=\big\{u\in\mathbb{R}^{\Omega_0}:\ \forall\omega\in\Omega_0,\ \Psi(u)(\omega)<u(\omega)\ \text{if}\ u(\omega)<0,\ \text{and}\ \Psi(u)(\omega)\le u(\omega)\ \text{if}\ u(\omega)\ge 0\big\}.$$
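This characterization can be checked numerically on a simple example. Consider a one-player toy recursive game with assumed data (not from the paper): a single nonabsorbing state with stage payoff 0 and actions $a\in[0,1/2]$, absorbing at payoff 1 w.p. $a$, at payoff 0 w.p. $a^2$, and staying w.p. $1-a-a^2$. Its limiting-average value is 1, and the sketch below verifies that every $u\in(0,1)$ lies in $W^+$ while $u=1$ lies in $W^-$, so the closures of the two regions meet exactly at $u=1$.

```python
# Numerical check of Everett's characterization on an assumed toy recursive
# game: one nonabsorbing state, actions a in [0, 1/2], absorption at payoff 1
# w.p. a, at payoff 0 w.p. a^2, stay w.p. 1 - a - a^2.

def psi(u):
    """Psi(u) = max over a of [a*1 + a^2*0 + (1 - a - a^2)*u]
    (i.e. lambda = 0 in Eq. (1); player 2 is trivial here).
    The maximand is quadratic and concave in a when u > 0, so we maximize
    it in closed form: the unconstrained maximizer is (1 - u) / (2u)."""
    a_star = min(max((1 - u) / (2 * u), 0.0), 0.5) if u > 0 else 0.5
    candidates = (0.0, a_star, 0.5)
    return max(a + (1 - a - a * a) * u for a in candidates)

# Every u with 0 < u < 1 lies in W+: Psi(u) > u wherever u > 0.
for k in range(1, 100):
    u = k / 100
    assert psi(u) > u

# u = 1 lies in W-: Psi(1) <= 1 (u >= 0, so no strict condition is required).
assert psi(1.0) <= 1.0

print(psi(1.0))
```

Hence $\overline{W^+}\supseteq[0,1]$ and $\overline{W^-}\supseteq[1,\infty)$ intersect in $\{1\}$, matching the limiting-average value $v=1$ of this game.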
Main results and the proof
We prove that $v$ (as characterized in Theorem 1.3) is also the uniform value of $\Gamma$, and that both players have stationary $\varepsilon$-optimal strategies.
Theorem 2.1 A recursive game has a uniform value. Moreover, both players can guarantee the uniform value in stationary strategies.
Remark We emphasize that our definition of uniform value includes the limiting-average guarantee; thus our results extend [3] to a much stronger set-up.
Proof We first prove that $v$ is the uniform value of $\Gamma$ using Theorem 1.2. Let $u$ be any vector in $W^+$. …
Concluding remarks
- 1. To better understand the existence of stationary $\varepsilon$-optimal strategies in recursive games, one can compare our construction to the one in [7] for absorbing games. Indeed, they have also chosen $w(\lambda,\cdot)$ to be some constant function $u$. However, no such equality $\Psi(u)=u$ holds for absorbing games, so the optimal stage strategy depends on $\lambda$, which is updated along the play, hence on the whole history. On the other hand, choosing $w(\lambda,\cdot)$ as $v_\lambda$ (like in [13]) will induce strategies …
Acknowledgments
The authors are grateful to Guillaume Vigeral for helpful comments. Part of Xiaoxi Li's research was done while he was an ATER (teaching and research fellow) at THEMA, Université Cergy-Pontoise, during the academic year 2015–2016.
References (14)
- [1] T. Bewley, E. Kohlberg, The asymptotic theory of stochastic games, Math. Oper. Res. (1976).
- [2] D. Blackwell, T.S. Ferguson, The big match, Ann. Math. Statist. (1968).
- [3] H. Everett, Recursive games, in: Contributions to the Theory of Games, vol. III (1957).
- [4] E. Kohlberg, Repeated games with absorbing states, Ann. Statist. (1974).
- [5] X. Li, X. Venel, Recursive games: uniform value, Tauberian theorem and the Mertens conjecture "$\operatorname{maxmin}=\lim v_n=\lim v_\lambda$", Internat. J. Game Theory (2016).
- [6] J.-F. Mertens, A. Neyman, Stochastic games, Internat. J. Game Theory (1981).
- [7] J.-F. Mertens, A. Neyman, D. Rosenberg, Absorbing games with compact action spaces, Math. Oper. Res. (2009).