
Pricing in Agent Economies Using Multi-Agent Q-Learning


Abstract

This paper investigates how adaptive software agents may use reinforcement learning algorithms such as Q-learning to make economic decisions, in particular setting prices in a competitive marketplace. For a single adaptive agent facing fixed-strategy opponents, ordinary Q-learning is guaranteed to find the optimal policy. For a population of agents, however, each adapting in the presence of other adaptive agents, the problem becomes non-stationary and history-dependent, and it is not known whether any global convergence will be obtained, and if so, whether such solutions will be optimal. In this paper, we study simultaneous Q-learning by two competing seller agents in three moderately realistic economic models. This is the simplest case in which interesting multi-agent phenomena can occur, and the state space is small enough that lookup tables can be used to represent the Q-functions. We find that, despite the lack of theoretical guarantees, simultaneous convergence to self-consistent optimal solutions is obtained in each model, at least for small values of the discount parameter. In some cases, exact or approximate convergence is also found even at large discount parameters. We show how the Q-derived policies increase profitability and damp out or eliminate cyclic price “wars” compared to simpler policies based on zero lookahead or short-term lookahead. In one of the models (the “Shopbot” model), where the sellers' profit functions are symmetric, we find that Q-learning can produce either symmetric or broken-symmetry policies, depending on the discount parameter and on initial conditions.
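Although the article presents no code here, the setup the abstract describes is easy to illustrate. The following is a minimal sketch of simultaneous tabular Q-learning by two alternating seller agents, assuming a hypothetical discretized price grid, an illustrative Shopbot-style profit function, and the common convention that each seller's state is its opponent's current price; all of these details are stand-ins, not the paper's actual models or parameters.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discretized price grid; the paper's actual price
# ranges and model parameters are not reproduced here.
PRICES = np.linspace(0.1, 1.0, 10)
N = len(PRICES)

ALPHA = 0.1   # learning rate
GAMMA = 0.5   # discount parameter (the abstract reports the most
              # reliable convergence at small values)
EPS = 0.1     # epsilon-greedy exploration rate

def profit(my_price, their_price):
    # Illustrative Shopbot-style payoff: the lower-priced seller
    # captures most buyers. Zero production cost is assumed.
    if my_price < their_price:
        share = 0.8
    elif my_price > their_price:
        share = 0.2
    else:
        share = 0.5
    return share * my_price

# One lookup-table Q-function per seller, Q[s, a]: state s is the
# index of the opponent's current price, action a the seller's own
# next price index.
Q = [np.zeros((N, N)), np.zeros((N, N))]
price_idx = [0, 0]       # current price indices of sellers 0 and 1
pending = [None, None]   # (state, action, reward) awaiting each
                         # seller's next-state observation

for t in range(200_000):
    i = t % 2            # sellers alternate pricing moves
    j = 1 - i
    s = price_idx[j]     # state: the opponent's current price

    # Finish seller i's previous Q-update now that the state that
    # followed its last action (the opponent's reply) is observable.
    if pending[i] is not None:
        ps, pa, pr = pending[i]
        Q[i][ps, pa] += ALPHA * (pr + GAMMA * Q[i][s].max() - Q[i][ps, pa])

    # Epsilon-greedy choice of a new price.
    if rng.random() < EPS:
        a = int(rng.integers(N))
    else:
        a = int(np.argmax(Q[i][s]))
    price_idx[i] = a

    r = profit(PRICES[a], PRICES[s])
    pending[i] = (s, a, r)

# Greedy pricing policy learned by seller 0: chosen price index as a
# function of the opponent's price index.
print(np.argmax(Q[0], axis=1))

Because each table bootstraps against an opponent that is itself still learning, the updates are not a contraction as in single-agent Q-learning; in the spirit of the abstract's findings, mutually consistent greedy policies are most plausible at small GAMMA, while raising the discount parameter strengthens the coupling between the two learners and the non-stationarity each one faces.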




Cite this article

Tesauro, G., Kephart, J.O. Pricing in Agent Economies Using Multi-Agent Q-Learning. Autonomous Agents and Multi-Agent Systems 5, 289–304 (2002). https://doi.org/10.1023/A:1015504423309
