Abstract
Episodic Q-learning is successfully applied to a multi-agent cooperative task that is strongly non-Markovian and for which Q-learning is believed to perform poorly. The 3-hunter game, a modified version of the pursuit problem, is employed, and the time needed for the hunters to capture the escapee is measured. Restricting the amount of history used for learning yields a significant increase in learning speed. This success is not accidental but follows from the mind-reading algorithm we have proposed.
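The core idea of restricting the history used for learning can be sketched as tabular Q-learning over states augmented with a bounded window of recent observations. This is a minimal illustrative sketch, not the paper's actual algorithm or the 3-hunter game: the `make_chain` toy environment, `history_len` parameter, and all other names are assumptions introduced here for illustration.

```python
import random
from collections import defaultdict, deque

def make_chain(n=3, max_steps=20):
    """Toy 1-D chain: start at position 0, reward 1 on reaching n-1.
    Purely illustrative; this is not the 3-hunter pursuit game."""
    state = {"pos": 0, "t": 0}
    def reset():
        state["pos"], state["t"] = 0, 0
        return state["pos"]
    def step(action):  # action 1 = move right, 0 = move left
        state["pos"] = max(0, min(n - 1, state["pos"] + (1 if action == 1 else -1)))
        state["t"] += 1
        done = state["pos"] == n - 1 or state["t"] >= max_steps
        return state["pos"], (1.0 if state["pos"] == n - 1 else 0.0), done
    return reset, step

def episodic_q_learning(reset, step, actions, history_len=2, episodes=500,
                        alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning where the learner's 'state' is a bounded
    window of the most recent observations; history_len caps how
    much history is used, mimicking the restricted-history idea."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        hist = deque([reset()], maxlen=history_len)  # bounded history window
        state, done = tuple(hist), False
        while not done:
            # epsilon-greedy action selection over the history-window state
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            obs, reward, done = step(action)
            hist.append(obs)  # oldest observation falls out automatically
            next_state = tuple(hist)
            target = reward if done else \
                reward + gamma * max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```

With `history_len=1` this reduces to ordinary tabular Q-learning on raw observations; larger windows trade a bigger state space for the ability to disambiguate non-Markovian situations, which is the trade-off the restriction targets.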
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Ito, A. (2002). Application of Episodic Q-Learning to a Multi-agent Cooperative Task. In: Ishizuka, M., Sattar, A. (eds) PRICAI 2002: Trends in Artificial Intelligence. Lecture Notes in Computer Science, vol 2417. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45683-X_22
DOI: https://doi.org/10.1007/3-540-45683-X_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44038-3
Online ISBN: 978-3-540-45683-4
eBook Packages: Springer Book Archive