Using temporal-difference learning for multi-agent bargaining

https://doi.org/10.1016/j.elerap.2007.04.001

Abstract

This research treats a bargaining process as a Markov decision process, in which a bargaining agent’s goal is to learn the optimal policy that maximizes the total rewards it receives over the process. Reinforcement learning is an effective method for agents to learn how to choose actions at each time step of a Markov decision process. Temporal-difference (TD) learning is a fundamental method for solving the reinforcement learning problem, and it can tackle the temporal credit assignment problem. This research designs agents that apply TD-based reinforcement learning to online bilateral bargaining with incomplete information, and evaluates their bargaining performance in terms of average payoff and settlement rate. The results show that agents using TD-based reinforcement learning achieve good bargaining performance. The approach is also robust and convenient, which makes it suitable for online automated bargaining in electronic commerce.

Introduction

Bargaining and negotiation are generally used interchangeably to describe the interaction between two or more parties attempting to agree on a mutually acceptable outcome despite negatively correlated preferences [1]. The application of agent technologies to automated bargaining emerges from the growing interest in using software agents to remove obstacles hindering the success of electronic commerce [2]. Considerable effort has gone into designing agents that adopt dynamic strategies for automated negotiation, using rule-based [3], [4], [5], [6], machine learning [7], [8], [9], [10], [11], [12], [13], [14], or hybrid approaches [15], [16]. However, these approaches may not be sufficiently robust and convenient for most sellers and buyers in e-commerce transactions. For example, users must define bargaining rules when using rule-based approaches, or must encode each proposal as a bit string (chromosome) and model a fitness function for proposal evaluation when using a genetic algorithm [7], [8], [9], [10]. With Bayesian probability approaches [11], [12] or case-based reasoning approaches [13], [14], bargaining performance degrades when few prior bargaining experiences or similar cases exist.

This research aims to design agents that act on behalf of a seller or a buyer in online bilateral bargaining over price. An agent can observe the prices its opponent offers but cannot know the opponent’s reservation price. In game-theoretic terms, this bargaining game is a dynamic game of incomplete information. The seller agent wants the price to be high, whereas the buyer agent wants it to be low. The seller has a reservation price srp and the buyer has a reservation price brp. If the final contract price p is greater than srp, then p − srp is the seller’s surplus; if p is less than brp, then brp − p is the buyer’s surplus. An agreement can be reached only if brp > srp. Each agent wants to maximize its master’s surplus over the bargaining process.
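As a concrete illustration of these surplus definitions, consider the minimal Python sketch below; all prices are hypothetical.

```python
# Surplus definitions from the text, with illustrative (hypothetical) prices.
def seller_surplus(p: float, srp: float) -> float:
    """Seller's surplus when the contract price p exceeds its reservation price srp."""
    return p - srp

def buyer_surplus(p: float, brp: float) -> float:
    """Buyer's surplus when the contract price p is below its reservation price brp."""
    return brp - p

srp, brp, p = 100.0, 150.0, 130.0  # hypothetical values; agreement requires brp > srp
print(seller_surplus(p, srp))      # 30.0
print(buyer_surplus(p, brp))       # 20.0
```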

This research treats the bargaining process as a Markov decision process in which an agent perceives distinct states of the bargaining process and decides on actions in response. At each discrete time step, the agent senses the current state and chooses an action to perform. The environment then responds with a transition to the next state and a reward, which indicates the desirability of the succeeding state. The agent’s goal is to learn the optimal policy that maximizes its total rewards. Reinforcement learning is a machine learning technique that fits this process well when the numbers of states and actions are finite [17], [18]. Its advantage is that an agent can learn from its own experience rather than from examples provided by a knowledgeable supervisor; moreover, an agent with no real-world experience can learn from simulation games. For example, Gerald Tesauro’s TD-Gammon program learned by playing backgammon against itself, and from this experience came to play as well as the best human players [19]. Temporal-difference (TD) learning is a core method of reinforcement learning. Agents using TD methods can learn directly from raw experience without a model of the environment’s dynamics, and these methods update estimates partly on the basis of other learned estimates, without waiting for a final outcome [20]. This research adopts TD-based reinforcement learning to design bargaining agents that learn on their own how to offer and counteroffer, and conducts several simulation games to test their bargaining performance. We expect TD-based reinforcement learning to be not only a robust and convenient approach to online bargaining but also one that achieves high bargaining performance in terms of average payoff and settlement rate.
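To make the update concrete, the following is a minimal tabular TD(0) sketch in textbook form. It illustrates how one estimate is updated from another without waiting for a final outcome; it is not the paper’s exact TD-Bargain mechanism.

```python
from collections import defaultdict

# Tabular TD(0): V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)].
# The bootstrapped target r + gamma * V(s') uses another learned estimate,
# so no final outcome is needed before updating.
def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0):
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error

V = defaultdict(float)                               # state-value estimates, initially 0
td0_update(V, s="opening", r=0.0, s_next="counteroffer")
```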

Section snippets

Markov decision process

A Markov decision process (MDP) is a stochastic decision process over a discrete-time Markov chain, in which the decision and return at each epoch are associated with the state the decision maker has observed [21]. In an MDP, a decision maker perceives a set S of distinct environment states and has a set A of actions it can perform. At each discrete time step t, the decision maker senses the current state st, chooses an action at from its set of actions, and performs it. The …
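In standard notation (consistent with the definition above, though not quoted from the truncated snippet), the decision maker’s objective is to find a policy maximizing the expected discounted return:

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma \le 1,
\qquad
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\, G_t \mid s_t = s \,\right]
\quad \text{for all } s \in S .
```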

TD-based bargaining agent

An agent should calculate its own utility and perceive its opponent’s bargaining power throughout the bargaining process in order to determine an effective bargaining strategy. An agent’s utility can be calculated from its reservation price, its current offered price (the agent’s position), and the opponent’s current offered price (the opponent’s position) [25]. The opponent’s perceived bargaining power can be estimated by analyzing the opponent’s concession behavior [26], e.g. analyzing an opponent’s average …
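A hedged sketch of the second ingredient, estimating the opponent’s bargaining power from its concession behavior: the averaging below is a plausible stand-in in the spirit of the snippet, not the paper’s exact formula.

```python
# Perceived opponent bargaining power via average concession per round.
# This estimator is an assumption, not the paper's exact formula.
def average_concession(opponent_offers: list[float]) -> float:
    steps = [abs(b - a) for a, b in zip(opponent_offers, opponent_offers[1:])]
    return sum(steps) / len(steps) if steps else 0.0

# A buyer raising its offers 100 -> 105 -> 112 concedes 6.0 per round on average.
average_concession([100.0, 105.0, 112.0])  # 6.0
```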

Experimental design

In the electronic commerce environment, a seller may delegate a software agent on a Web site to bargain with numerous human buyers. We would like to understand how a TD-based bargaining agent acts for a seller bargaining with buyers who have different risk attitudes. A seller agent formulates its price position as (spt − srp)/(sp1 − srp), where srp is the seller agent’s reservation price, spt is the seller agent’s offer at time step t, and sp1 is the seller agent’s initial offer. A seller agent …
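The position formula itself is straightforward to state in code (illustrative prices):

```python
# Seller's price position (sp_t - srp) / (sp_1 - srp): 1.0 at the initial offer,
# approaching 0.0 as the offer moves down toward the reservation price.
def seller_position(sp_t: float, srp: float, sp_1: float) -> float:
    return (sp_t - srp) / (sp_1 - srp)

seller_position(sp_t=120.0, srp=100.0, sp_1=150.0)  # 0.4 (hypothetical prices)
```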

Experimentation

The seller and buyer agents are built on the Java Agent Development Framework (JADE, http://jade.cselt.it) platform, which complies with the FIPA (Foundation for Intelligent Physical Agents, http://www.fipa.org) specifications. The TD-Bargain mechanism fixes the discount factor γ = 1 because the interaction intervals in a bargaining session are very short. The neural network’s parameters are set to a learning rate of 0.25 and a momentum of 0.05, with randomly generated initial weights. The …
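Below is a minimal sketch of a back-propagation Q-value approximator using the stated hyperparameters (learning rate 0.25, momentum 0.05, γ = 1, random initial weights). The network size and input features are assumptions, since the paper’s exact architecture is not shown in this snippet.

```python
import numpy as np

LR, MOMENTUM, GAMMA = 0.25, 0.05, 1.0            # values stated in the text
rng = np.random.default_rng()
W1 = rng.normal(scale=0.1, size=(8, 2))           # assumed: 2 inputs, 8 hidden units
W2 = rng.normal(scale=0.1, size=(1, 8))
vW1, vW2 = np.zeros_like(W1), np.zeros_like(W2)   # momentum buffers

def q_value(x):
    """Q estimate for feature vector x (e.g. [own utility, perceived opponent power])."""
    h = np.tanh(W1 @ x)
    return (W2 @ h).item(), h

def train_step(x, target):
    """One back-propagation step toward a TD target, with momentum."""
    global W1, W2, vW1, vW2
    q, h = q_value(x)
    err = target - q                              # squared-error gradient signal
    gW2 = err * h[None, :]
    gW1 = np.outer(err * W2.ravel() * (1.0 - h**2), x)
    vW2 = MOMENTUM * vW2 + LR * gW2
    vW1 = MOMENTUM * vW1 + LR * gW1
    W2 += vW2
    W1 += vW1
```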

Conclusions and future research

This study proposes bargaining agents equipped with a TD-based reinforcement learning mechanism to perform bilateral bargaining over price under incomplete information. We use back-propagation neural networks to implement the Q-functions for learning dynamic strategies. The λ parameter controls temporal credit assignment by determining how far an error detected at a given time step feeds back to correct previous predictions. Four sets of bargaining games are designed to measure the agents’ …
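For readers unfamiliar with λ’s role, the standard accumulating-trace TD(λ) update sketched below shows how a single TD error feeds back to earlier predictions in proportion to their recency. This is the textbook form, not the paper’s exact update.

```python
from collections import defaultdict

# TD(lambda) with accumulating eligibility traces: a TD error computed now
# also corrects estimates of states visited earlier, with weight decaying
# by gamma * lambda per step since each state was last visited.
def td_lambda_step(V, e, s, r, s_next, alpha=0.1, gamma=1.0, lam=0.8):
    delta = r + gamma * V[s_next] - V[s]   # current TD error
    e[s] += 1.0                            # mark the current state eligible
    for state in list(e):
        V[state] += alpha * delta * e[state]
        e[state] *= gamma * lam            # decay credit toward older states

V, e = defaultdict(float), defaultdict(float)
td_lambda_step(V, e, s="opening", r=0.0, s_next="counteroffer")
```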

References (33)

  • S. Guan et al., A factory-based approach to support e-commerce agent fabrication, Electronic Commerce Research and Applications (2004)
  • S. Kraus et al., Reaching agreements through argumentation: A logical model and implementation, Artificial Intelligence Journal (1998)
  • M. Dumas et al., A formal approach to negotiating agents development, Electronic Commerce Research and Applications (2002)
  • W.C. Hammer et al., The Economics of Bargaining (1969)
  • F. Sadri, F. Toni, P. Torroni, Dialogues for Negotiation: Agent Varieties and Dialogue Sequences, in: Pre-proceedings...
  • S.-l. Huang, Y. Yuan, F.-r. Lin, Adding Persuasion into On-line Bargaining Process, in: Proceedings of the 6th Pacific...
  • S. Matwin et al., Genetic algorithm approach to a negotiation support system, IEEE Transactions on Systems, Man, and Cybernetics (1991)
  • J.R. Oliver, A machine learning approach to automated negotiation and prospects for electronic commerce, Journal of Management Information Systems (1997)
  • G. Dworman, S.O. Kimbrough, J.D. Laing, On Automated Discovery of Models Using Genetic Programming in Game Theoretic...
  • G. Dworman, S.O. Kimbrough, J.D. Laing, Bargaining by Artificial Agents in Two Coalition Games: A Study in Genetic...
  • D. Zeng, K. Sycara, Bayesian Learning in Negotiation, in: Working Notes of the AAAI ’96 Stanford Spring Symposium...
  • S. Buffett, B. Spencer, Learning Opponents’ Preferences in Multi-object Automated Negotiation, in: Proceedings of the...
  • F.-r. Lin et al., A multi-agent framework for automated online bargaining, IEEE Intelligent Systems (2001)
  • L.-K. Soh, C. Tsatsoulis, Agent-Based Argumentative Negotiations with Case-Based Reasoning, in: Working Notes of the...
  • H. Ouchiyama, R. Huang, J. Ma, K.M. Sim, An Experience-based Evolutionary Negotiation Model, in: Proceedings of the 5th...
  • S. Zhang, S. Ye, F. Makedon, J. Ford, A Hybrid Negotiation Strategy Mechanism in an Automated Negotiation System, in:...