Abstract
Learning Automata (LA) were recently shown to be valuable tools for designing Multi-Agent Reinforcement Learning algorithms. One of the principal contributions of LA theory is that a set of decentralized, independent learning automata is able to control a finite Markov Chain with unknown transition probabilities and rewards. This result was recently extended to Markov Games and analyzed with the use of limiting games. In this paper we continue this analysis but we assume here that our agents are fully ignorant about the other agents in the environment, i.e. they can only observe themselves; they do not know how many other agents are present in the environment, the actions these other agents took, the rewards they received for this, or the location they occupy in the state space. We prove that in Markov Games, where agents have this limited type of observability, a network of independent LA is still able to converge to an equilibrium point of the underlying limiting game, provided a common ergodic assumption and provided the agents do not interfere each other’s transition probabilities.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Nowé, A., Verbeeck, K., Peeters, M.: Learning automata as a basis for multi-agent reinforcement learning. In: Tuyls, K., ’t Hoen, P.J., Verbeeck, K., Sen, S. (eds.) LAMAS 2005. LNCS (LNAI), vol. 3898, pp. 71–85. Springer, Heidelberg (2006)
Littman, M.: Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the 11th International Conference on Machine Learning, pp. 322–328 (1994)
Osborne, J., Rubinstein, A.: A Course in Game Theory. MIT Press, Cambridge (1994)
Vrancx, P., Verbeeck, K., Nowé, A.: Decentralized learning of markov games. Technical Report COMO/12/2006, Computational Modeling Lab, Vrije Universiteit Brussel, Brussels, Belgium (2006)
Wheeler, R., Narendra, K.: Decentralized learning in finite markov chains. IEEE Transactions on Automatic Control AC-31, 519–526 (1986)
Boutilier, C.: Planning, learning and coordination in multiagent decision processes. In: Proceedings of the 6th Conference on Theoretical Aspects of Rationality and Knowledge, Renesse, Holland, pp. 195–210 (1996)
Thathachar, M., Phansalkar, V.: Learning the global maximum with parameterized learning automata. IEEE Transactions on Neural Networks 6(2), 398–406 (1995)
Narendra, K., Thathachar, M.: Learning Automata: An Introduction. Prentice-Hall International, Inc, Englewood Cliffs (1989)
Thathachar, M., Sastry, P.: Networks of Learning Automata: Techniques for Online Stochastic Optimization. Kluwer Academic Publishers, Dordrecht (2004)
Sastry, P., Phansalkar, V., Thathachar, M.: Decentralized learning of nash equilibria in multi-person stochastic games with incomplete information. IEEE Transactions on Systems, Man, and Cybernetics 24(5), 769–777 (1994)
Hu, J., Wellman, M.: Nash q-learning for general-sum stochastic games. Journal of Machine Learning Research 4, 1039–1069 (2003)
Wang, X., Sandholm, T.: Reinforcement learning to play an optimal nash equilibrium in team markov games. In: Proceedings of the Neural Information Processing Systems: Natural and Synthetic (NIPS) conference (2002)
Chang, Y.H., Ho, T., Kaelbling, L.P.: All learning is local: Multi-agent learning in global reward games. In: Thrun, S., Saul, L., Schölkopf, B. (eds.) Advances in Neural Information Processing Systems 16, MIT Press, Cambridge (2004)
Verbeeck, K., Nowé, A., Parent, J., Tuyls, K.: Exploring selfish reinforcement learning in repeated games with stochastic rewards. Technical report, Accepted at Journal of Autonomous Agents and Multi-Agent Systems (to appear 2007)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vrancx, P., Verbeeck, K., Nowé, A. (2008). Networks of Learning Automata and Limiting Games. In: Tuyls, K., Nowe, A., Guessoum, Z., Kudenko, D. (eds) Adaptive Agents and Multi-Agent Systems III. Adaptation and Multi-Agent Learning. AAMAS ALAMAS ALAMAS 2005 2007 2006. Lecture Notes in Computer Science(), vol 4865. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77949-0_16
Download citation
DOI: https://doi.org/10.1007/978-3-540-77949-0_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77947-6
Online ISBN: 978-3-540-77949-0
eBook Packages: Computer ScienceComputer Science (R0)