
Multi-agent Adaptive Dynamic Programming

Conference paper
MICAI 2000: Advances in Artificial Intelligence (MICAI 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1793)

Abstract

Dynamic programming offers an exact, general solution method for completely known sequential decision problems, formulated as Markov Decision Processes (MDPs) with a finite number of states. Recently, there has been considerable interest in the adaptive version of the problem, where the task to be solved is not completely known a priori. In such a case, an agent has to acquire the necessary knowledge through learning while simultaneously solving the optimal control or decision problem. A large variety of algorithms, variously known as Adaptive Dynamic Programming (ADP) or Reinforcement Learning (RL), has been proposed in the literature. However, almost invariably such algorithms suffer from slow convergence in terms of the number of experiments needed. In this paper we investigate how learning speed can be considerably improved by exploiting and combining the knowledge accumulated by multiple agents. These agents operate in the same task environment but follow possibly different trajectories. We discuss methods of combining the knowledge structures associated with the multiple agents and different strategies (with varying overheads) for knowledge communication between agents. Simulation results are also presented, indicating that combining multiple learning agents is a promising direction for improving learning speed. The method also performs significantly better than some of the fastest MDP learning algorithms, such as prioritized sweeping.
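
The core idea, several agents exploring the same task environment along different trajectories and periodically pooling what they have learned, can be illustrated with a small sketch. The code below is not the authors' algorithm: it runs two independent Q-learning agents on a hypothetical chain-shaped MDP and merges their value tables by element-wise max every few episodes; the environment, the merge rule, and all names (step, merge, MERGE_EVERY, ...) are assumptions made only for illustration.

```python
# Illustrative sketch only (not the paper's method): two Q-learning agents
# learn the same toy chain MDP independently and periodically combine
# their Q-tables, here by taking the element-wise maximum.
import random
from collections import defaultdict

N_STATES, N_ACTIONS, GOAL = 10, 2, 9          # toy chain environment
ALPHA, GAMMA, EPS, MERGE_EVERY = 0.1, 0.95, 0.1, 10

def step(s, a):
    # Deterministic chain: action 1 moves right, action 0 moves left;
    # reaching the right end (GOAL) gives reward 1 and ends the episode.
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def egreedy(q, s):
    # Epsilon-greedy action selection over the agent's own Q-table.
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: q[(s, a)])

def q_update(q, s, a, r, s2):
    # Standard one-step Q-learning update.
    best_next = max(q[(s2, b)] for b in range(N_ACTIONS))
    q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])

def merge(q_tables):
    # One hypothetical knowledge-combination rule: keep the largest
    # estimate for each (state, action) pair and broadcast it back.
    merged = defaultdict(float)
    for q in q_tables:
        for key, value in q.items():
            merged[key] = max(merged[key], value)
    for q in q_tables:
        q.update(merged)

agents = [defaultdict(float) for _ in range(2)]    # one Q-table per agent
for episode in range(200):
    for q in agents:                               # independent trajectories
        s, done = 0, False
        while not done:
            a = egreedy(q, s)
            s2, r, done = step(s, a)
            q_update(q, s, a, r, s2)
            s = s2
    if episode % MERGE_EVERY == 0:
        merge(agents)                              # knowledge communication
```

Merging by element-wise max is only one plausible combination rule; the paper itself compares different ways of combining the agents' knowledge structures and different communication strategies with varying overheads.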

References

  1. Barto, A.G., Bradtke, S.J., Singh, S.P.: Learning to Act using Real-Time Dynamic Programming. Artificial Intelligence, Special Volume on Computational Research on Interaction and Agency, 72(1), 81–138 (1995)

  2. Lashkari, Y., Metral, M., Maes, P.: Collaborative Interface Agents. In: Proceedings of the National Conference on Artificial Intelligence (AAAI 1994)

  3. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)

  4. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research 4, 237–285 (1996)

  5. Narendra, K.S., Balakrishnan, J.: Adaptive Control Using Multiple Models. IEEE Transactions on Automatic Control 42(2) (February 1997)

  6. Tesauro, G.: TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play. Neural Computation 6(2), 215–219 (1994)

  7. Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)

  8. Moore, A.W., Atkeson, C.G.: Prioritized Sweeping: Reinforcement Learning with Less Data and Less Real Time. Machine Learning 13 (1993)

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mukhopadhyay, S., Varghese, J. (2000). Multi-agent Adaptive Dynamic Programming. In: Cairó, O., Sucar, L.E., Cantu, F.J. (eds) MICAI 2000: Advances in Artificial Intelligence. MICAI 2000. Lecture Notes in Computer Science (LNAI), vol. 1793. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10720076_52

  • DOI: https://doi.org/10.1007/10720076_52

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67354-5

  • Online ISBN: 978-3-540-45562-2

  • eBook Packages: Springer Book Archive
