Expert Systems with Applications

Volume 46, 15 March 2016, Pages 474-484

Convergence analysis for pure stationary strategies in repeated potential games: Nash, Lyapunov and correlated equilibria

https://doi.org/10.1016/j.eswa.2015.11.006

Highlights

  • We formally introduce Lyapunov games for Markov chains.

  • We provide the convergence analysis for pure stationary strategies in Lyapunov games.

  • We provide an algorithm for the numerical realization of the best-reply strategy.

  • We prove under mild assumptions that the Nash, Lyapunov and Correlated equilibria coincide.

Abstract

In game theory the interaction among players obligates each player to develop a belief about the possible strategies of the other players, to choose a best reply given those beliefs, and to adjust both the best reply and the beliefs through a learning mechanism until an equilibrium point is reached. Usually, when such best-reply strategies are applied, the behavior of an individual cost-function turns out to be non-monotonic, and concluding that such strategies lead to some equilibrium point is a non-trivial task. Even in repeated games, convergence to a stationary equilibrium is not always guaranteed. The best-reply strategies analyzed in this paper represent the type of behavior most frequently applied in practice to problems of bounded rationality of agents considered within the Artificial Intelligence research area. They are naturally related to the so-called fixed-local-optimal actions or, in other words, to the one-step-ahead optimization algorithms widely used in modern Intelligent Systems theory.

This paper shows that for an ergodic class of finite controllable Markov games the best-reply strategies necessarily lead to a Lyapunov/Nash equilibrium point. One of the most interesting properties of this approach is that an expedient (or absolutely expedient) behavior of an ergodic system (repeated game) can be represented by a Lyapunov-like function that is non-increasing in time. We present a method for constructing a Lyapunov-like function: the Lyapunov-like function replaces the recursive mechanism with the elements of the ergodic system that model how players are likely to behave in one-shot games. To establish our statement, we first propose a non-converging state-value function that fluctuates (increases and decreases) between states of the Markov game. Then, we prove that this function can be represented in a recursive format using a one-step-ahead fixed-local-optimal strategy. As a result, we prove that a Lyapunov-like function can be built from the previous recursive expression for the Markov game, i.e., the resulting Lyapunov-like function is a monotonic function which can only decrease (or remain the same) over time, whatever the initial distribution of probabilities. On this basis, a new concept called Lyapunov games is suggested for a class of repeated games. Lyapunov games allow one to conclude during the game whether the applied strategy provides convergence to an equilibrium point or not. The time for constructing a potential (Lyapunov-like) function is exponential. Our algorithm tractably computes the Nash, Lyapunov and correlated equilibria: a Lyapunov equilibrium is a Nash equilibrium, and it is also a correlated equilibrium. The validity of the proposed method is demonstrated both theoretically and practically by a simulated experiment related to the Duel game.

Introduction

An agent blindly takes directives without thinking: it is an entity capable of executing particular tasks without explicit instruction. An agent is considered intelligent if it can be autonomous, flexible, and social. To behave intelligently an agent requires decision making. In general, artificial intelligence (AI) emphasizes methods related to machine learning, knowledge representation and reasoning, decision making under uncertainty, planning, and other well-studied areas. Game theory attempts to model the principles of rational interaction among players. This is also the goal of modern AI research, which focuses on studying multiagent intelligent systems and on how to represent intelligent behavior. In repeated games the interaction among players obligates each agent to adjust the best-reply strategies and the beliefs using a learning mechanism until an equilibrium point is reached. The best-reply strategy approach is frequently applied, for example, in repeated games related to intelligent systems such as the “Nim game”, “Three Cards two persons game”, “Red-Black card”, “Russian Roulette”, “A Pursuit game”, “Fighter-Bomber Duel”, “Simplified 2-Person Poker”, “United Nations Security Council”, “Bargaining game”, “Battle of the Sexes” (see (Jones, 1980)), and the “Duoligopolistic Market”, “Taxation and Provision of Government Service”, “Oil Price Arrangement”, “Capitalist Worker Treatment”, “Consumption Stock Pollution Equilibrium”, “Innovation Production Dilemma (R&D competition)”, “Nonrenewable Resources: The Doomsday Problem” (see (Dockner, Jorgensen, Van Long, & Sorger, 2000)). The fundamental problem facing best-reply dynamics is that they do not predict how players arrive at an equilibrium point.

The process of finding an equilibrium point can be justified as a mathematical shortcut representing the result of a learning algorithm (Fudenberg, Levine, 1999, Poznyak, Najim, Gomez-Ramirez, 2000) or of an evolutionary process. But the learning or evolutionary justifications logically imply that beliefs and choices will not be consistent if players do not have time to learn or evolve.

A realization of any rational (expedient) strategy in a conflict situation (or game) is naturally related to its execution by a computer algorithm, which is at the heart of the Artificial Intelligence area. What is Artificial Intelligence? It is the search for a way to map intelligence into mechanical hardware and to enable a structure within that intelligent system to formalize thought. Following Russell and Norvig (1995), Artificial Intelligence is the study of human intelligence and actions replicated artificially, such that the result bears to its design a reasonable level of rationality. The best-reply strategies analyzed in this paper represent the type of artificial intelligence algorithm most frequently applied in practice and realized within bounded rationality.

The best-reply dynamics result in a natural implementation of the behavior of a Lyapunov-like function. The dynamics begin by choosing an arbitrary strategy profile of the players (Myerson, 1978, Nash, 1951, Nash, 1996, Nash, 2002, Selten, 1975). Then, in each step of the process some player changes his strategy to be a best reply to the current strategies of the other players. A Lyapunov-like function monotonically decreases, and this results in the elimination of a strictly dominated strategy from the strategy space. As a consequence, the complexity of the problem is reduced. In the next step, the strategies that survived the first elimination round and are not a best reply to some strategy profile are eliminated, and so forth. This process ends when the best-reply (Lyapunov-like function) converges to a Lyapunov equilibrium point. Therefore, a Lyapunov game also has the benefit that it is common knowledge among the players that only best replies are chosen. By the natural evolution of a Lyapunov-like function, a strategy played once is never played again.
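The round-by-round best-reply updating described above can be sketched on a toy two-player game. The cost matrices here are hypothetical illustrative values, not the paper's Markov-game construction; the loop simply iterates until no player wants to switch.

```python
# Sketch of best-reply dynamics on a small two-player game with
# hypothetical costs; cost[player][(a0, a1)] for action profiles in {0, 1}^2.
cost = [
    {(0, 0): 3, (0, 1): 2, (1, 0): 1, (1, 1): 2},  # player 0
    {(0, 0): 3, (0, 1): 1, (1, 0): 2, (1, 1): 2},  # player 1
]

def best_reply(player, profile):
    """Action minimizing the player's cost, the other's action held fixed."""
    def with_action(a):
        p = list(profile)
        p[player] = a
        return tuple(p)
    return min((0, 1), key=lambda a: cost[player][with_action(a)])

profile = (0, 0)                 # arbitrary initial strategy profile
while True:
    updated = profile
    for player in (0, 1):        # each player deviates to a best reply in turn
        p = list(updated)
        p[player] = best_reply(player, tuple(p))
        updated = tuple(p)
    if updated == profile:       # no profitable deviation: equilibrium reached
        break
    profile = updated
print(profile)  # -> (1, 0), a pure-strategy equilibrium of this toy game
```

In this toy game the dynamics settle after one round; in general, convergence of such loops is exactly what the potential/Lyapunov-function machinery of the paper is needed to guarantee.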

The best-known solution concept of the best-reply dynamics is the Nash equilibrium (Nash, 1951, Nash, 1996, Nash, 2002), in which each player chooses a randomized strategy and no player is able to increase her/his expected utility by unilaterally deviating to a different strategy. The correlated equilibrium (Aumann, 1974, Aumann, 1987) is an alternative solution concept. While in a Nash equilibrium players randomize independently, in a correlated equilibrium players are allowed to coordinate their behavior “based on signals” from an intermediary. Applied mathematicians, operations researchers, electrical engineers and mathematical economists have studied the computation of solution concepts since the early days of game theory (Goldberg, Papadimitriou, 2006, Govindan, Wilson, 2003, Jiang, Leyton-Brown, 2015, Jiang, Leyton-Brown, Bhat, 2011, van der Laan, Talman, van der Heyden, 1987, Lemke, Howson, 1964, von Neumann, Morgenstern, 1944, Papadimitriou, 2005, Papadimitriou, Roughgarden, 2008, Papadimitriou, Roughgarden, Scarf, 1967).

Potential games were introduced by Monderer and Shapley (1996), and several definitions of potential games have since been introduced in the literature. Voorneveld (2000) suggested best-reply potential games, allowing infinite improvement paths by imposing restrictions only on paths in which players that can improve actually deviate to a best reply. Dubey, Haimanko, and Zapechelnyuk (2006) presented the notion of pseudo-potential games. All these classes of potential games start with an arbitrary strategy profile and, using a single real-valued function on the strategy space, a player that can improve deviates to a better strategy. The iteration process converges to a Nash equilibrium point. Potential games embrace many practical application domains including dominance-solvable games, routing games and shortest-path games (Engelberg, Schapira, 2011, Fabrikant, Jaggard, Schapira, 2013, Fabrikant, Papadimitriou, 2008). In general, all the classes of potential games reported in the literature are contained in the definition of Lyapunov games.

In this paper we show that for an ergodic class of finite controllable Markov chain games the best-reply strategies necessarily lead to one of the Lyapunov/Nash equilibrium points. We also show that the Lyapunov/Nash equilibrium point solution is a correlated equilibrium. This conclusion follows from the Lyapunov games concept, which is based on the design of an individual Lyapunov function (related to an individual cost function) that monotonically decreases (does not increase) during the game.

In Lyapunov games (Clempner, 2006, Clempner, Poznyak, 2011, Clempner, Poznyak, 2015) the existence of an equilibrium point is naturally ensured by definition. Clempner (2015) suggested that the stability conditions and the equilibrium point properties of Cournot and Lyapunov meet in potential games. In general, convergence to an equilibrium point is also guaranteed. A Lyapunov-like function monotonically decreases and converges to a Lyapunov equilibrium point, tracking the state space in a forward direction. The best-reply dynamics result in a natural implementation of the behavior of a Lyapunov-like function. As a result, a Lyapunov game also has the benefit that it is common knowledge among the players that only best replies are chosen. In addition, a Lyapunov equilibrium point exhibits stability properties that a Nash equilibrium point does not necessarily exhibit.

A game is said to be stable with respect to a set of strategies if the iterated process of strategy selection (Guesnerie, 1996, Hofbauer, Sandholm, 2009, Pearce, 1984, Tan, Costa Da Werlang, 1988) (in our case, the best-reply dynamics) converges to an equilibrium point, regardless of the initial strategies the players start with. To converge to an equilibrium point, every player selects his/her strategies by optimizing his/her individual cost function given the available strategies of the other players (Börgers, 1993, Hilas, Jansen, Potters, Vermeulen, 2003, Osborne, Rubinstein, 1994). Any deviation from such an equilibrium point returns back to the same equilibrium point, because the natural evolution of the iterated process of strategy selection tries to follow the optimal strategies and rectifies the trajectory to reach a stable equilibrium point (this is the case when the equilibrium point is unique) (Bernheim, 1984, Moulin, 1984, Osborne, Rubinstein, 1994, Pearce, 1984). In this sense, a Lyapunov equilibrium point is a strategy profile such that, once the strategy choices are in the stable state, it is in no player’s interest to unilaterally change strategy. An important advantage of Lyapunov games is that every ergodic system can be represented by a Lyapunov-like function. For a repeated (ergodic) game a recursive mechanism is implemented to justify an equilibrium play (Clempner, Poznyak, 2011, Clempner, Poznyak, 2013). If the ergodic process of the stochastic game converges, then we have reached an equilibrium point and, moreover, a highly justifiable one (Poznyak et al., 2000).

We present a method for the construction of a Lyapunov-like function (with a monotonic behavior) that has a one-to-one relationship with a given cost-function. Being bounded from below, a decreasing Lyapunov-like function provides the existence of an equilibrium point for the applied pure and stationary local-optimal strategies (Gimbert, Zielonka, 2009, Gimbert, Zielonka, 2012) and, besides, ensures the convergence of the cost-function to a minimal value (Clempner & Poznyak, 2011). The resulting vector Lyapunov-like function is a monotonic function whose components can only decrease over time. As a result, a repeated game may be represented by a one-shot game. It is important to note that in our case the problem becomes more complicated to justify, because repeated games are transformed into one-shot games by replacing the recursive mechanism with a Lyapunov-like function.
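As a toy illustration of pairing a fluctuating cost sequence with a monotone companion (a deliberately simplified stand-in for the paper's construction, with hypothetical cost values), a running minimum yields a non-increasing, bounded-from-below Lyapunov-like envelope:

```python
# Toy illustration (not the paper's actual construction): a fluctuating,
# hypothetical cost sequence and its running-minimum envelope, which is
# monotonically non-increasing and bounded from below.
costs = [5.0, 3.2, 4.1, 2.7, 3.0, 2.5, 2.6, 2.5]

lyapunov = []
current = float("inf")
for v in costs:
    current = min(current, v)   # can only decrease or stay the same
    lyapunov.append(current)

print(lyapunov)  # -> [5.0, 3.2, 3.2, 2.7, 2.7, 2.5, 2.5, 2.5]
```

When the underlying cost sequence converges, the envelope converges to the same limit, which is the monotone-decrease property the paper's construction exploits to conclude convergence during play.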

The Lyapunov-like functions are used as forward trajectory-tracking functions. Each applied local-optimal action produces monotonic progress toward the equilibrium point. Tracking the state space in a forward direction allows the decision maker to avoid the invalid states that occur in the space generated by a backward search. In most cases (when probabilistic characteristics are unknown or incomplete (Poznyak et al., 2000)), the forward search appears more useful than the backward search, because in the backward direction, when incomplete final states arise, invalid states appear, which cause obvious problems. Certainly, a feed-forward strategy cannot guarantee that the global minimum is achieved: it usually leads to a locally optimal solution. But in many practical situations (such as the weight-adjustment process in Neural Networks (Poznyak, Sanchez, & Yu, 2001) or in Petri nets (Murata, 1989)) such strategies significantly improve the behavior of the controlled Markov process.

We investigate the class of the so-called pure and stationary local-optimal policies (strategies). Such strategies realize a local (one-step-ahead) predicted optimization assuming that the past history Fn (states s and actions a) cannot be changed anymore: a policy {dn}n ≥ 0 is said to be local-optimal if it minimizes the conditional mathematical expectation of the cost-function V(sn+1), i.e., dn ∈ arg min E{Vl(sn+1) | Fn}. The behavior of an individual cost-function, when such strategies are applied, turns out to be non-monotonic and, as a result, concluding that such strategies lead to some equilibrium point (usually, the Nash equilibrium (Goemans, Mirrokni, & Vetta, 2005)) is a hard task requiring special additional analysis. Even in repeated games, convergence to a stationary equilibrium is not always guaranteed (see (Chen, Deng, 2006, Daskalakis, Goldberg, Papadimitriou, 2006)).
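The one-step-ahead rule dn ∈ arg min E{Vl(sn+1) | Fn} can be sketched as a greedy minimization over actions. The transition matrices P[a] and the state-value vector V below are hypothetical illustrative values, not taken from the paper:

```python
# Minimal sketch of a pure stationary local-optimal (one-step-ahead) policy:
# in each state, pick the action minimizing E{V(s_{n+1}) | s_n, a}.
import numpy as np

# P[a] is the (hypothetical) transition matrix under action a; rows = states.
P = {
    0: np.array([[0.9, 0.1], [0.5, 0.5]]),
    1: np.array([[0.2, 0.8], [0.1, 0.9]]),
}
V = np.array([1.0, 4.0])  # state-value function to be minimized one step ahead

def local_optimal_action(state):
    """arg min over actions of the conditional expectation E{V(s_{n+1})}."""
    return min(P, key=lambda a: P[a][state] @ V)

policy = [local_optimal_action(s) for s in range(2)]
print(policy)  # -> [0, 0]: the greedy action per state under these values
```

The policy is stationary (it depends only on the current state) and pure (a single action per state), matching the class analyzed in the paper.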

In summary, this paper makes the following contributions:

  • 1.

    we show that the cost sequence corresponding to the local-optimal (best-reply) strategy has a non-monotonic character that does not permit a direct proof of the existence of a limit point;

  • 2.

    we suggest a “one-to-one” mapping between the current cost-function and a new “energy function” (Lyapunov-like function) which is monotonically non-increasing on the trajectories of the system under the local-optimal (best-reply) strategy application;

  • 3.

    we recast the classical behavior of a repeated game as a potential game in terms of Lyapunov theory;

  • 4.

    we show that a Lyapunov equilibrium point is a Nash equilibrium point, but in addition it also presents several advantages: (a) a natural existence of the equilibrium point is ensured by definition, (b) a Lyapunov-like function can be constructed to respect the constraints imposed by the Markov game, (c) a Lyapunov-like function definitely converges to a Lyapunov equilibrium point, and (d) a Lyapunov equilibrium point presents properties of stability;

  • 5.

    we also prove that the Lyapunov equilibrium point is a correlated equilibrium;

  • 6.

    the convergence of the pure and stationary local-optimal (best-reply) strategy is also obtained for a class of ergodic controllable finite Markov chains;

  • 7.

    we provide an algorithm in terms of an analytical formula for the numerical realization of the local-optimal (best-reply) strategy and we also analyze the complexity of the algorithm.

The paper is structured as follows. The next section introduces the mathematical background and terminology needed to understand the rest of the paper. Section 3 presents the formulation of the decision model, where all the structural assumptions are introduced, giving a detailed analysis of the game. A method for the construction of a Lyapunov-like function, as well as the analysis of its convergence, is described in Section 4, which contains the main result of this paper. Section 5 presents the proofs of coincidence of the Lyapunov equilibrium with the Nash and correlated equilibria. Section 6 presents a simulated experiment related to the repeated Duel game. Finally, Section 7 outlines some concluding remarks and future work.

Section snippets

Markov chains games

As usual, let the set of real numbers be denoted by R and the set of non-negative integers by N. The inner product of two vectors u, v ∈ Rn is denoted by ⟨u, v⟩ = vTu. Let S be a finite set, called the state space, consisting of N ∈ N states {s(1), …, s(N)}. A stationary Markov chain (Clempner & Poznyak, 2014) is a sequence of S-valued random variables sn, n ∈ N, satisfying the Markov condition: P(sn+1 = s(j) | sn = s(i), sn−1 = s(in−1), …, s1 = s(i1)) = P(sn+1 = s(j) | sn = s(i))
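The ergodicity assumption used throughout the paper can be illustrated numerically: for an ergodic finite chain, iterating the distribution converges to a unique stationary distribution from any start. The transition matrix below is a hypothetical example, not one from the paper:

```python
# Numerical sketch of ergodicity for a finite Markov chain: the iterated
# distribution pi_{n+1} = pi_n P converges to the unique stationary
# distribution regardless of the initial distribution.
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])    # hypothetical row-stochastic transition matrix

pi = np.array([1.0, 0.0])     # an arbitrary initial distribution
for _ in range(200):
    pi = pi @ P               # pi_{n+1} = pi_n P

print(pi)  # approaches (4/7, 3/7), the unique stationary distribution
```

Starting instead from (0, 1) yields the same limit, which is the sense in which the long-run behavior of an ergodic game is independent of the initial distribution of probabilities.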

Problem formulation

To tackle this problem we propose representing the state-value function V using a model that is linear with respect to the control d ∈ Δ. After that we obtain the policy d that yields the minimum trajectory value. Finally, we present V in a recursive matrix format.

Lyapunov-like function analysis and design

The aim of this section is to associate to any cost function Vnl, governed by (17), a Lyapunov-like function which monotonically decreases (non-increases) on the trajectories of the given system.

Nash, Lyapunov and correlated equilibria

Definition 13

A Lyapunov game is a tuple G = ⟨N, S, (Δl)l∈N, (Υl)l∈N, Π, (Vl)l∈N⟩, where Vl is a Lyapunov-like function (monotonically decreasing in time).

Theorem 14

Let G = ⟨N, S, (Δl)l∈N, (Υl)l∈N, Π, (Vl)l∈N⟩ be a Lyapunov game. Suppose the players make their decisions given any individually rational strategy. Then, there exists a Lyapunov strategy that is a Nash equilibrium.

Proof

Let us suppose that d* is a Lyapunov equilibrium point. It can be shown that in this Lyapunov equilibrium, the payoff for player l would be less than or equal to V(i1

Numerical examples

In this example we consider the repeated “Duel game” where Player I and Player II each have a gun loaded with exactly one bullet and stand 10 steps apart. Starting with Player I, they take turns deciding whether to fire or not. Each time a player chooses not to fire, the other player takes one step forward before choosing whether to fire in turn. In other words, they start 10 steps apart facing each other and Player I decides whether to take a shot at Player II. If Player I does not, Player II
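A backward-induction sketch of this duel can be computed under a hypothetical linear hit-probability model p(d) = (10 − d)/10 for both players (the paper's payoff specification may differ):

```python
# Backward-induction sketch of the Duel game described above, assuming a
# hypothetical hit probability p(d) = (10 - d) / 10 for both players.
def p(d):
    return (10 - d) / 10.0

# W[d]: win probability of the player about to move at distance d, both
# bullets unfired. Firing now wins with p(d) (after a miss, the opponent
# can close to point-blank range and fire with certainty). Waiting swaps
# roles at distance d - 1.
W = {0: 1.0}
fire_at = {0: True}
for d in range(1, 11):
    fire_value = p(d)
    wait_value = 1.0 - W[d - 1]
    fire_at[d] = fire_value >= wait_value
    W[d] = max(fire_value, wait_value)

first_firing_distance = max(d for d in fire_at if fire_at[d])
print(first_firing_distance, W[10])  # -> 5 0.5
```

Under this symmetric model both players wait until 5 steps apart, where the mover fires with a 50% chance of winning; such one-step-ahead reasoning is exactly the best-reply behavior the paper's Lyapunov analysis is applied to.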

Conclusion

This paper is a theoretical and practical contribution to feed-forward repeated Markov games. The proposed optimization framework and formalism provide a significant difference in the conceptualization of the problem domain. By introducing a Lyapunov-like function as a solution concept for a Markov game, we propose a model that is natural, guarantees the existence of an equilibrium point, and is computationally tractable. The proposed method solves a game via the elimination of

References (53)

  • Chen, X. et al.

    Settling the complexity of 2-player Nash equilibrium

    Proceedings of IEEE FOCS

    (2006)
  • Clempner, J.B.

    Modeling shortest path games with Petri nets: a Lyapunov based theory

    International Journal of Applied Mathematics and Computer Science

    (2006)
  • Clempner, J.B.

    Setting Cournot vs. Lyapunov games stability conditions and equilibrium point properties

    International Game Theory Review

    (2015)
  • Clempner, J.B. et al.

    Convergence method, properties and computational complexity for Lyapunov games

    International Journal of Applied Mathematics and Computer Science

    (2011)
  • Clempner, J.B. et al.

    Analysis of best-reply strategies in repeated finite Markov chains games

    IEEE Conference on Decision and Control

    (2013)
  • Clempner, J.B. et al.

    Simple computing of the customer lifetime value: a fixed local-optimal policy approach

    Journal of Systems Science and Systems Engineering

    (2014)
  • Clempner, J.B. et al.

    Stackelberg security games: computing the shortest-path equilibrium

    Expert Systems With Applications

    (2015)
  • Daskalakis, C. et al.

    The complexity of computing a Nash equilibrium

    Proceedings of ACM STOC

    (2006)
  • Dockner, E.J. et al.

    Differential games in economics and management science

    (2000)
  • Dubey, P. et al.

    Strategic complements and substitutes, and potential games

    Games and Economic Behavior

    (2006)
  • Engelberg, R. et al.

    Weakly-acyclic (internet) routing games

    (2011)
  • Fabrikant, A. et al.

    The complexity of game dynamics: BGP oscillations, sink equilibria, and beyond

    ACM-SIAM Symposium on Discrete Algorithms (SODA)

    (2008)
  • Fudenberg, D. et al.

    The theory of learning in games

    (1999)
  • Gimbert, H. et al.

    Pure and stationary optimal strategies in perfect-information stochastic games

    Technical Report

    (2009)
  • Gimbert, H. et al.

    Blackwell-optimal strategies in priority mean-payoff games

    International Journal of Foundations of Computer Science

    (2012)
  • Goemans, M. et al.

    Sink equilibria and convergence

    Proceedings of the 46th IEEE Symposium on Foundations of Computer Science

    (2005)