
Operations Research Letters

Volume 44, Issue 5, September 2016, Pages 575-577

Uniform value for recursive games with compact action sets

https://doi.org/10.1016/j.orl.2016.06.005

Abstract

Mertens, Neyman and Rosenberg (Mertens et al., 2009) used the Mertens and Neyman theorem (Mertens and Neyman, 1981) to prove the existence of a uniform value for absorbing games with a finite state space and compact action sets. We provide an analogous proof for another class of stochastic games, recursive games with a finite state space and compact action sets. Moreover, both players have stationary ε-optimal strategies.

Introduction

Zero-sum stochastic games were introduced by [10], and the model $\Gamma=(K,I,J,g,q)$ is as follows: $K$ is the finite state space, $I$ (resp. $J$) is the action set for player 1 (resp. player 2), $g:K\times I\times J\to\mathbb{R}$ is the stage payoff function and $q:K\times I\times J\to\Delta(K)$ is the probability transition function ($\Delta(S)$ stands for the set of probabilities on a measurable set $S$). We assume throughout that $I$ and $J$ are compact metric sets and that both $g$ and $q$ are separately continuous on $I\times J$ (this implies their measurability, cf. I.1.Ex.7a in [8]).

The game is played as follows. Let $k_1\in K$ be the initial state. At each stage $t\ge 1$, after observing the $t$-stage history $h_t=(k_1,i_1,j_1,\dots,k_{t-1},i_{t-1},j_{t-1},k_t)$, player 1 chooses an action $i_t\in I$ and player 2 chooses an action $j_t\in J$. This profile $(k_t,i_t,j_t)$ induces a current stage payoff $g_t:=g(k_t,i_t,j_t)$ and a probability $q(k_t,i_t,j_t)$, which is the law of $k_{t+1}$, the state at stage $t+1$.

Let $H_t=K\times(I\times J\times K)^{t-1}$ be the set of $t$-stage histories for each $t\ge 1$, and let $H_\infty=(K\times I\times J)^{\infty}$ be the set of infinite histories. We endow $H_t$ with the product sigma-algebra $\mathcal{H}_t$ (discrete on $K$, Borel on $I$ and $J$) and endow $H_\infty$ with the product sigma-algebra $\mathcal{H}_\infty$ spanned by $\bigcup_{t\ge 1}\mathcal{H}_t$. A behavior strategy $\sigma$ for player 1 is a sequence $\sigma=(\sigma_t)_{t\ge 1}$ where, for each $t\ge 1$, $\sigma_t$ is a measurable map from $(H_t,\mathcal{H}_t)$ to $\Delta(I)$. A behavior strategy $\tau$ for player 2 is defined analogously. The set of player 1’s (resp. player 2’s) behavior strategies is denoted by $\Sigma$ (resp. $\mathcal{T}$).

Given an initial state $k_1$, any strategy profile $(\sigma,\tau)\in\Sigma\times\mathcal{T}$ induces a unique probability distribution $\mathbf{P}^{k_1}_{\sigma,\tau}$ over the histories $(H_\infty,\mathcal{H}_\infty)$. The corresponding expectation is denoted by $\mathbf{E}^{k_1}_{\sigma,\tau}$.

In the $\lambda$-discounted game $\Gamma_\lambda$ (for $\lambda\in(0,1]$), the (global) payoff is defined as $\gamma_\lambda(k_1,\sigma,\tau)=\mathbf{E}^{k_1}_{\sigma,\tau}\big[\sum_{t\ge 1}\lambda(1-\lambda)^{t-1}g(k_t,i_t,j_t)\big]$ and the corresponding (minmax) value is $v_\lambda$. The $n$-stage game $\Gamma_n$ (for $n\ge 1$) is defined analogously by taking the $n$-stage averaged payoff, and its value is denoted by $v_n$.

In the finite actions setup, [10] introduced the operator $\Phi$ where, for $f\in\mathbb{R}^{|K|}$,
$$\Phi(\lambda,f)(k)=\operatorname*{val}_{x\in\Delta(I),\,y\in\Delta(J)}\ \mathbf{E}^{k}_{x,y}\Big[\lambda\,g(k,i,j)+(1-\lambda)f(\tilde{k})\Big],\tag{1}$$
with $\operatorname*{val}_{x\in\Delta(I),\,y\in\Delta(J)}=\max_{x\in\Delta(I)}\min_{y\in\Delta(J)}=\min_{y\in\Delta(J)}\max_{x\in\Delta(I)}$, where, under $\mathbf{E}^{k}_{x,y}$, the actions $(i,j)$ are drawn according to $x\otimes y$ and $\tilde{k}$ denotes the next state, of law $q(k,i,j)$. He proved that
$$v_\lambda=\Phi(\lambda,v_\lambda)\tag{2}$$
and moreover that stationary optimal strategies exist for each $\lambda$, i.e. strategies depending at each stage $t$ only on the current state $k_t$. These results extend to the current framework (cf. VII.1.a in [8]).
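For concreteness, here is a minimal numerical sketch (in Python, for the finite-actions case) of the operator (1) and of the fixed-point iteration behind Eq. (2). The value of each auxiliary matrix game is computed by linear programming via scipy.optimize.linprog; all names (matrix_game_value, shapley_operator, discounted_value, g, q) are illustrative choices, not notation from the text.

    import numpy as np
    from scipy.optimize import linprog

    def matrix_game_value(A):
        """Value of the zero-sum matrix game with payoff matrix A (player 1 maximizes)."""
        m, n = A.shape
        # variables z = (x_1, ..., x_m, v); maximize v  <=>  minimize -v
        c = np.zeros(m + 1)
        c[-1] = -1.0
        # for every column j of A:  v - sum_i x_i * A[i, j] <= 0
        A_ub = np.hstack([-A.T, np.ones((n, 1))])
        b_ub = np.zeros(n)
        # x is a probability vector: sum_i x_i = 1, x_i >= 0, v free
        A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
        b_eq = np.array([1.0])
        bounds = [(0, None)] * m + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        return res.x[-1]

    def shapley_operator(lam, f, g, q):
        """Phi(lam, f)(k): value of the auxiliary game lam*g(k,i,j) + (1-lam)*E[f(next state)]."""
        K = g.shape[0]
        out = np.zeros(K)
        for k in range(K):
            expected_f = np.tensordot(q[k], f, axes=([2], [0]))   # shape (|I|, |J|)
            out[k] = matrix_game_value(lam * g[k] + (1 - lam) * expected_f)
        return out

    def discounted_value(lam, g, q, tol=1e-10):
        """Fixed point v_lam = Phi(lam, v_lam), reached by iterating the (1-lam)-contraction."""
        f = np.zeros(g.shape[0])
        while True:
            f_next = shapley_operator(lam, f, g, q)
            if np.max(np.abs(f_next - f)) < tol:
                return f_next
            f = f_next

    # expected shapes: g[k, i, j] stage payoffs, q[k, i, j, k'] transition probabilities

Since $\Phi(\lambda,\cdot)$ is a $(1-\lambda)$-contraction for the sup norm, the iteration in discounted_value converges to $v_\lambda$; this is only an illustration of Eqs. (1)–(2), not the construction used in the proofs below.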

We are interested in long-run properties of $\Gamma$. A first notion corresponds to the existence of an asymptotic value: convergence of $v_\lambda$ as $\lambda$ tends to zero and convergence of $v_n$ as $n$ tends to infinity, to the same limit. Moreover, one can ask for the existence of $\varepsilon$-optimal strategies for both players that guarantee the asymptotic value in all sufficiently long games, explicitly:

Definition 1.1

Let $w\in\mathbb{R}^{|K|}$. Player 1 can guarantee $w$ in $\Gamma$ if, for any $\varepsilon>0$ and for every $k_1\in K$, there exist $N(\varepsilon)\in\mathbb{N}$ and a behavior strategy $\sigma\in\Sigma$ for player 1 such that, for all $\tau\in\mathcal{T}$:
$$\text{(A)}\quad \frac{1}{n}\,\mathbf{E}^{k_1}_{\sigma,\tau}\Big[\sum_{t=1}^{n}g_t\Big]\ \ge\ w(k_1)-\varepsilon,\qquad \forall n\ge N(\varepsilon),$$
$$\text{(B)}\quad \mathbf{E}^{k_1}_{\sigma,\tau}\Big[\liminf_{n\to\infty}\frac{1}{n}\sum_{t=1}^{n}g_t\Big]\ \ge\ w(k_1)-\varepsilon.$$
A similar definition holds for player 2.

$v$ is the uniform value of $\Gamma$ if both players can guarantee it.

Remark

The existence of a uniform value v implies the existence of an asymptotic value, equal to v.

For stochastic games with a finite state space and finite action sets, [1] proved the convergence of $v_\lambda$ as $\lambda$ tends to zero (and later deduced the convergence of $v_n$ as $n$ tends to infinity to the same limit), relying on an algebraic argument. Using the fact that the function $\lambda\mapsto v_\lambda$ has bounded variation, which follows from [1], [6] proved the existence of a uniform value. In fact, Mertens and Neyman’s main result applies to a stochastic game $\Gamma$ with compact action sets in the following form:

Theorem 1.2 [6]

Let $\lambda\mapsto w_\lambda\in\mathbb{R}^{|K|}$ be a function defined on $(0,1]$. Player 1 can guarantee $\limsup_{\lambda\to 0^+}w_\lambda$ in the stochastic game $\Gamma$ if $w_\lambda$ satisfies:

  • (i)

    for some integrable function $\phi:(0,1]\to\mathbb{R}_+$, $\ \lVert w_\lambda-w_{\lambda'}\rVert\le\int_{\lambda'}^{\lambda}\phi(x)\,dx$ for all $\lambda,\lambda'\in(0,1)$ with $\lambda'<\lambda$;

  • (ii)

    for every $\lambda\in(0,1)$ sufficiently small, $\Phi(\lambda,w_\lambda)\ge w_\lambda$.

Remark

In the construction of an $\varepsilon$-optimal strategy in [6], $w_\lambda$ is taken to be $v_\lambda$, so condition (i) is implied by the bounded variation property of $v_\lambda$ and condition (ii) is implied by Eq. (2).

Below we focus on two important classes of stochastic games: absorbing games and recursive games.

An absorbing state is a state from which the probability of leaving is zero. Without loss of generality, one assumes that at any absorbing state the payoff is constant (equal to the value of the static game to be played after absorption), provided both players are informed of the current state.

A stochastic game Γ is an absorbing game if all states but one are absorbing.  [7] used Theorem 1.2 to prove the existence of a uniform value for absorbing games with a finite state space and compact action sets, extending a result of  [4] for the finite actions case.

Recursive games, introduced by  [3], are stochastic games where the stage payoff is zero in all nonabsorbing states.

This note proves the existence of a uniform value for recursive games with a finite state space and compact action sets, using an approach analogous to that of [7] for absorbing games. Moreover, due to the specific payoff structure, we show that $\varepsilon$-optimal strategies in recursive games can be taken stationary. This is not the case for general stochastic games, in which an $\varepsilon$-optimal strategy usually has to depend on the whole past history, even in the finite actions case (cf. the “Big Match” of [2] for an example).

[3] proved the existence of stationary $\varepsilon$-optimal strategies for the “limiting-average value” (property (B) in Definition 1.1). As our proof relies on his characterization of this value (and on its existence), we state the result here.

Given $S\subseteq\mathbb{R}^d$, let $\overline{S}$ denote its closure.

Let $K_0\subseteq K$ be the set of nonabsorbing states.

$\Phi(0,\cdot)$ refers to the operator $\Phi(\lambda,\cdot)$ with $\lambda=0$ in Eq. (1).

When working with the operator $\Phi(0,\cdot)$ or $\Phi(\lambda,\cdot)$, it is sufficient to consider those vectors $u\in\mathbb{R}^{|K|}$ identical to the absorbing payoffs on the absorbing states $K\setminus K_0$. Whenever no confusion is caused, we identify $u$ with its projection on $\mathbb{R}^{|K_0|}$.

Theorem 1.3 [3]

A recursive game $\Gamma$ has a limiting-average value $v$ and both players have stationary $\varepsilon$-optimal strategies, in the sense that for every $\varepsilon>0$ there are stationary strategies $(\sigma^{*},\tau^{*})\in\Sigma\times\mathcal{T}$ such that, for any $(\sigma,\tau)\in\Sigma\times\mathcal{T}$,
$$\mathbf{E}^{k_1}_{\sigma^{*},\tau}\Big[\liminf_{n\to\infty}\frac{1}{n}\sum_{t=1}^{n}g_t\Big]\ \ge\ v(k_1)-\varepsilon \qquad\text{and}\qquad \mathbf{E}^{k_1}_{\sigma,\tau^{*}}\Big[\limsup_{n\to\infty}\frac{1}{n}\sum_{t=1}^{n}g_t\Big]\ \le\ v(k_1)+\varepsilon.$$
Moreover, the limiting-average value $v$ is characterized by $\{v\}=\overline{\xi^{+}}\cap\overline{\xi^{-}}$, where
$$\xi^{+}=\big\{u\in\mathbb{R}^{|K_0|}:\ \Phi(0,u)\ge u,\ \text{and}\ \Phi(0,u)(k)>u(k)\ \text{whenever}\ u(k)>0\big\},$$
$$\xi^{-}=\big\{u\in\mathbb{R}^{|K_0|}:\ \Phi(0,u)\le u,\ \text{and}\ \Phi(0,u)(k)<u(k)\ \text{whenever}\ u(k)<0\big\}.$$

[3]’s proof of the above result consists of the following two arguments: first, any vector $u\in\xi^{+}$ (resp. $u\in\xi^{-}$) can be guaranteed by player 1 (resp. player 2); second, the intersection of $\overline{\xi^{+}}$ and $\overline{\xi^{-}}$ is nonempty.
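As a purely illustrative complement, Everett’s conditions defining $\xi^{+}$ can be tested numerically for a small finite recursive game: at a nonabsorbing state the stage payoff vanishes, so $\Phi(0,u)(k)$ is simply the value of the matrix game whose entries are the expectations of $u$ (extended by the absorbing payoffs) under the transitions. The sketch below reuses the helper matrix_game_value from the earlier sketch; the names q0 and r and the tolerance-based strictness test are illustrative choices, not part of the article.

    import numpy as np

    def phi0(u, q0, r):
        """Phi(0, .) of a recursive game at the nonabsorbing states K0:
        Phi(0, u)(k) = val_k E[ w(next state) ], with w = (u on K0, r on absorbing states).
        q0[k, i, j, :] is the transition law from nonabsorbing state k
        (states ordered: K0 first, then the absorbing ones); r holds the absorbing payoffs."""
        w = np.concatenate([u, r])            # extend u by the absorbing payoffs
        K0 = q0.shape[0]
        return np.array([matrix_game_value(np.tensordot(q0[k], w, axes=([2], [0])))
                         for k in range(K0)])

    def in_xi_plus(u, q0, r, tol=1e-9):
        """Numerical test of Everett's conditions for xi^+:
        Phi(0, u) >= u, with strict inequality wherever u(k) > 0 (up to a tolerance)."""
        Pu = phi0(u, q0, r)
        weak = np.all(Pu >= u - tol)
        strict = np.all(Pu[u > tol] > u[u > tol] + tol)
        return weak and strict

The membership test in $\xi^{-}$ is symmetric (reverse the inequalities and the sign condition); of course such a tolerance-based check is only a numerical heuristic for the exact conditions of Theorem 1.3.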


Main results and the proof

We prove that v (as characterized in Theorem 1.3) is also the uniform value of Γ, and that players have stationary ε-optimal strategies.

Theorem 2.1

A recursive game has a uniform value. Moreover, both players can guarantee the uniform value in stationary strategies.

Remark

We emphasize that our definition of the uniform value subsumes that of the limiting-average value (property (B) in Definition 1.1); thus our result extends [3] to a stronger notion of value.

Proof

We first prove that $v$ is the uniform value of $\Gamma$ using Theorem 1.2. Let $u$ be any vector in $\xi^{+}$. An

Concluding remarks

  • 1.

    To better understand the existence of stationary $\varepsilon$-optimal strategies in recursive games, one can compare our construction to the one in [7] for absorbing games. There too, $w_\lambda$ is chosen to be some constant vector $u$. However, the equality $\Phi(\lambda,u)=(1-\lambda)\Phi(0,u)$ does not hold for absorbing games (for recursive games, a one-line verification is sketched below), so the optimal strategy $x_{\lambda_t}(w_{\lambda_t},k_t)$ used at stage $t$ for $\Phi(\lambda_t,u)$ depends on $\lambda_t$, hence on the whole history. On the other hand, choosing $w_\lambda$ to be $v_\lambda$ (as in [13]) will induce strategies
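For completeness, here is the one-line computation behind the identity invoked above, restricted to the nonabsorbing states $k\in K_0$ of a recursive game (a reconstruction from Eq. (1), with $u$ extended by the absorbing payoffs as in Section 1): since the stage payoff vanishes at such states and a positive constant factors out of the value operator,
$$\Phi(\lambda,u)(k)=\operatorname*{val}_{x\in\Delta(I),\,y\in\Delta(J)}\mathbf{E}^{k}_{x,y}\big[\lambda\cdot 0+(1-\lambda)\,u(\tilde{k})\big]=(1-\lambda)\operatorname*{val}_{x\in\Delta(I),\,y\in\Delta(J)}\mathbf{E}^{k}_{x,y}\big[u(\tilde{k})\big]=(1-\lambda)\,\Phi(0,u)(k),$$
so an optimal one-shot strategy for $\Phi(\lambda,u)(k)$ does not depend on $\lambda$, which is what allows the $\varepsilon$-optimal strategy to be taken stationary.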

Acknowledgments

The authors are grateful to Guillaume Vigeral for helpful comments. Part of Xiaoxi Li’s research was done while he was an ATER (teaching and research fellow) at THEMA, Université Cergy-Pontoise, during the academic year 2015–2016.

References (14)

  • T. Bewley et al., The asymptotic theory of stochastic games, Math. Oper. Res. (1976)
  • D. Blackwell et al., The big match, Ann. Math. Statist. (1968)
  • H. Everett, Recursive games
  • E. Kohlberg, Repeated games with absorbing states, Ann. Statist. (1974)
  • X. Li et al., Recursive games: uniform value, Tauberian theorem and the Mertens conjecture “$\operatorname{Maxmin}=\lim_{n\to\infty}v_n=\lim_{\lambda\to 0}v_\lambda$”, Internat. J. Game Theory (2016)
  • J.-F. Mertens et al., Stochastic games, Internat. J. Game Theory (1981)
  • J.-F. Mertens et al., Absorbing games with compact action spaces, Math. Oper. Res. (2009)
