Abstract
We propose OP-Q, a reinforcement learning algorithm for multi-agent systems based on Hurwicz's optimistic-pessimistic criterion, which makes it possible to embed prior knowledge about the degree of the environment's friendliness. We prove its convergence to a stationary policy. Thorough testing of the developed algorithm against well-known reinforcement learning algorithms has shown that OP-Q performs on a par with its opponents.
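The core idea can be illustrated with a short sketch. This is not the paper's exact OP-Q algorithm, only an assumed reading of it: a joint-action Q-learning update in which the successor-state value blends the best-case and worst-case opponent responses via Hurwicz's criterion. The class name `OPQAgent`, the coefficient `alpha_h`, and all method signatures are hypothetical illustrations, not the authors' notation.

```python
from collections import defaultdict

def hurwicz_value(q_row, alpha_h):
    """Hurwicz combination of the best and worst entries of a value row:
    alpha_h * max + (1 - alpha_h) * min."""
    return alpha_h * max(q_row) + (1.0 - alpha_h) * min(q_row)

class OPQAgent:
    """Illustrative two-player sketch. The coefficient `alpha_h` encodes
    prior belief about how friendly the other agent is:
      alpha_h = 1 -> fully optimistic (Friend-Q-like),
      alpha_h = 0 -> fully pessimistic (minimax-like)."""

    def __init__(self, actions, opp_actions, alpha_h=0.5, lr=0.1, gamma=0.9):
        self.actions = actions          # own action set
        self.opp_actions = opp_actions  # opponent action set
        self.alpha_h = alpha_h          # optimism coefficient (prior knowledge)
        self.lr = lr                    # learning rate
        self.gamma = gamma              # discount factor
        # Q[state][(a, o)] -> value of joint action (own a, opponent o)
        self.Q = defaultdict(lambda: defaultdict(float))

    def state_value(self, s):
        # For each own action, aggregate opponent responses with the Hurwicz
        # criterion, then act greedily over own actions.
        return max(
            hurwicz_value([self.Q[s][(a, o)] for o in self.opp_actions],
                          self.alpha_h)
            for a in self.actions
        )

    def update(self, s, a, o, r, s_next):
        # Standard temporal-difference update toward the Hurwicz-valued target.
        target = r + self.gamma * self.state_value(s_next)
        self.Q[s][(a, o)] += self.lr * (target - self.Q[s][(a, o)])
```

Setting `alpha_h` between the two extremes interpolates between the friend and foe assumptions, which is one plausible way to "embed prior knowledge on the degree of environment friendliness" mentioned in the abstract.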
© 2008 Springer-Verlag Berlin Heidelberg
Akchurina, N. (2008). Optimistic-Pessimistic Q-Learning Algorithm for Multi-Agent Systems. In: Bergmann, R., Lindemann, G., Kirn, S., Pěchouček, M. (eds) Multiagent System Technologies. MATES 2008. Lecture Notes in Computer Science(), vol 5244. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87805-6_3
Print ISBN: 978-3-540-87804-9
Online ISBN: 978-3-540-87805-6