
Approximate Nash Equilibrium Learning for n-Player Markov Games in Dynamic Pricing

  • Conference paper
  • In: Progress in Artificial Intelligence (EPIA 2024)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14967)


Abstract

We investigate Nash equilibrium learning in a competitive Markov Game (MG) environment, where multiple agents compete and multiple Nash equilibria can exist. In particular, for an oligopolistic dynamic-pricing environment, exact Nash equilibria are difficult to obtain due to the curse of dimensionality. We develop a new model-free method to find approximate Nash equilibria. Gradient-free black-box optimization is applied to estimate \(\epsilon \), the maximum reward advantage an agent can gain by unilaterally deviating from any given joint policy, and to estimate the \(\epsilon \)-minimizing policy for any given state. The policy-to-\(\epsilon \) correspondence and the mapping from state to \(\epsilon \)-minimizing policy are each represented by a neural network, the latter being the Nash Policy Net. During batch updates, we perform Nash Q-learning on the system, adjusting the action probabilities using the Nash Policy Net. We demonstrate that an approximate Nash equilibrium can be learned, particularly in the dynamic pricing domain, where exact solutions are often intractable.
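To make the central quantity concrete: \(\epsilon \) of a joint policy is the largest expected-reward gain any single agent can obtain by deviating unilaterally while the other agents keep their policies fixed, and \(\epsilon = 0\) holds exactly at a Nash equilibrium. The following is a minimal, self-contained Python sketch of this estimate, not the authors' implementation: a toy n-player one-shot game with a known payoff tensor stands in for the Markov Game, and plain random search over the probability simplex stands in for the paper's gradient-free black-box optimizer. All names and the reward model are illustrative assumptions.

# Hypothetical sketch of the abstract's epsilon:
#   epsilon(pi) = max_i [ max_{pi_i'} R_i(pi_i', pi_{-i}) - R_i(pi) ],
# the maximum reward advantage of a unilateral deviation from joint policy pi.

import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, N_ACTIONS = 3, 5  # e.g. 3 competing sellers, 5 discrete price levels
# PAYOFF[i, a_0, ..., a_{n-1}] = reward of agent i under joint action (a_0, ..., a_{n-1})
PAYOFF = rng.uniform(size=(N_AGENTS,) + (N_ACTIONS,) * N_AGENTS)

def expected_reward(i, policies):
    """Expected reward of agent i when each agent j plays mixed policy policies[j]."""
    r = PAYOFF[i]
    # Contract the payoff tensor with each agent's action distribution,
    # working from the last axis backwards so axis indices stay valid.
    for j in reversed(range(N_AGENTS)):
        r = np.tensordot(r, policies[j], axes=([j], [0]))
    return float(r)

def best_deviation_gain(i, policies, n_samples=2000):
    """Best reward gain found for agent i deviating unilaterally (random search)."""
    base = expected_reward(i, policies)
    best = base
    for _ in range(n_samples):
        candidate = list(policies)
        candidate[i] = rng.dirichlet(np.ones(N_ACTIONS))  # random mixed deviation
        best = max(best, expected_reward(i, candidate))
    return best - base

def epsilon(policies):
    """epsilon(pi): maximum unilateral deviation advantage over all agents.
    epsilon == 0 iff pi is an exact Nash equilibrium; small epsilon means
    pi is an approximate (epsilon-)Nash equilibrium."""
    return max(best_deviation_gain(i, policies) for i in range(N_AGENTS))

joint = [rng.dirichlet(np.ones(N_ACTIONS)) for _ in range(N_AGENTS)]
print(f"estimated epsilon of a random joint policy: {epsilon(joint):.4f}")

A learner in the spirit of the abstract would then train a network (the Nash Policy Net) to map each state to a policy minimizing this \(\epsilon \), rather than re-running the search at every step.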



Author information


Corresponding author

Correspondence to Larkin Liu.



Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Liu, L. (2025). Approximate Nash Equilibrium Learning for n-Player Markov Games in Dynamic Pricing. In: Santos, M.F., Machado, J., Novais, P., Cortez, P., Moreira, P.M. (eds) Progress in Artificial Intelligence. EPIA 2024. Lecture Notes in Computer Science (LNAI), vol 14967. Springer, Cham. https://doi.org/10.1007/978-3-031-73497-7_29


  • DOI: https://doi.org/10.1007/978-3-031-73497-7_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73496-0

  • Online ISBN: 978-3-031-73497-7

  • eBook Packages: Computer Science, Computer Science (R0)
