Abstract
Bayesian learning is an inference method designed to tackle the exploration-exploitation trade-off as a function of the uncertainty of a given probability model built from observations, within the Reinforcement Learning (RL) paradigm. It allows prior knowledge to be incorporated into the algorithms in the form of probability distributions. Finding the resulting Bayes-optimal policies is a notoriously hard problem. We focus our attention on RL for a special class of ergodic and controllable Markov games. We propose a new framework for computing near-optimal policies for each agent, under the assumptions that the Markov chains are regular and the inverse of the behavior strategy is well defined. A fundamental result of this paper is the development of a theoretical method that, based on the formulation of a non-linear problem, computes the near-optimal adaptive behavior strategies and policies of the game, under some restrictions, that maximize the expected reward. We prove that such behavior strategies and policies satisfy the Bayesian-Nash equilibrium. Another important result is that the RL process learns a model through the interaction of the agents with the environment, and we show how the proposed method can finitely approximate and estimate the elements of the transition matrices and utilities while maintaining an efficient long-term learning performance measure. We develop an algorithm for implementing this model. A numerical example shows how to deploy the estimation process as a function of the agents' experience.
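To make the experience-driven estimation step concrete, the following is a minimal sketch, not taken from the paper, of how an agent could maintain a Bayesian posterior over the entries of its transition matrix from observed transitions, with the estimate sharpening as experience accumulates. The Dirichlet-count scheme, the uniform prior, and all variable names are illustrative assumptions rather than the paper's actual construction.

```python
import numpy as np

# Illustrative sketch (assumed, not the paper's algorithm): Bayesian
# estimation of a transition matrix P(s' | s, a) from observed experience,
# using Dirichlet pseudo-counts over next states for each (state, action).

n_states, n_actions = 4, 2

# Uniform Dirichlet prior: pseudo-count of 1 for every (s, a, s') triple.
alpha = np.ones((n_states, n_actions, n_states))

def update(s, a, s_next):
    """Record one observed transition."""
    alpha[s, a, s_next] += 1.0

def estimated_transition_matrix():
    """Posterior-mean estimate of P(s' | s, a) given the experience so far."""
    return alpha / alpha.sum(axis=2, keepdims=True)

# Example: feed in simulated experience and watch the estimate improve.
rng = np.random.default_rng(0)
true_P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
s = 0
for _ in range(5000):
    a = rng.integers(n_actions)                    # exploratory action choice
    s_next = rng.choice(n_states, p=true_P[s, a])  # environment response
    update(s, a, s_next)
    s = s_next

# Maximum entrywise estimation error; it shrinks as experience grows.
print(np.max(np.abs(estimated_transition_matrix() - true_P)))
```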
Data Availability Statement
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Clempner, J.B. A Bayesian reinforcement learning approach in markov games for computing near-optimal policies. Ann Math Artif Intell 91, 675–690 (2023). https://doi.org/10.1007/s10472-023-09860-3