
Hierarchical multi-agent reinforcement learning

Autonomous Agents and Multi-Agent Systems

Abstract

In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multi-agent tasks. We introduce a hierarchical multi-agent reinforcement learning (RL) framework and propose a hierarchical multi-agent RL algorithm called Cooperative HRL. In this framework, agents are cooperative and homogeneous (i.e., they use the same task decomposition). Learning is decentralized, with each agent learning three interrelated skills: how to perform each individual subtask, the order in which to carry them out, and how to coordinate with other agents. We define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. Levels of the hierarchy that include cooperative subtasks are called cooperation levels. A fundamental property of the proposed approach is that it allows agents to learn coordination faster by sharing information at the level of cooperative subtasks, rather than attempting to learn coordination at the level of primitive actions. We study the empirical performance of the Cooperative HRL algorithm using two testbeds: a simulated two-robot trash collection task and a larger four-agent automated guided vehicle (AGV) scheduling problem. We compare the performance and speed of Cooperative HRL with other learning algorithms, as well as with several well-known industrial AGV heuristics. We also address the issue of rational communication behavior among autonomous agents. The goal is for agents to learn both action and communication policies that together optimize the task given a communication cost. We extend the multi-agent HRL framework to include communication decisions and propose a cooperative multi-agent HRL algorithm called COM-Cooperative HRL. In this algorithm, we add a communication level to the hierarchical decomposition of the problem, below each cooperation level. Before an agent makes a decision at a cooperative subtask, it first decides whether it is worthwhile to perform a communication action. A communication action has a certain cost and provides the agent with the actions selected by the other agents at that cooperation level. We demonstrate the efficiency of the COM-Cooperative HRL algorithm, as well as the relation between the communication cost and the learned communication policy, using a multi-agent taxi problem.
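The two ideas in the abstract, conditioning an agent's high-level choices on the other agents' subtasks at cooperation levels, and adding a costed communication decision below each such level, can be made concrete with a short sketch. The following Python fragment is a minimal illustration under stated assumptions, not the authors' implementation: the class name, method names, tabular representation, and the SMDP-style Q-learning updates are all assumptions made for illustration.

```python
# A minimal, illustrative sketch (not the authors' implementation): the class and
# method names, the tabular representation, and the SMDP-style Q-learning updates
# are all assumptions made for illustration.
import random
from collections import defaultdict


class CooperativeAgent:
    """One agent in a decentralized task hierarchy.

    At a cooperation level, the value of choosing a subtask is conditioned on the
    subtasks the other agents are currently executing, not on their primitive
    actions, which is what lets coordination be learned at a coarse temporal scale.
    """

    def __init__(self, subtasks, alpha=0.1, gamma=0.95, epsilon=0.1, comm_cost=0.0):
        self.subtasks = subtasks              # children of a cooperative subtask (e.g. Root)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.comm_cost = comm_cost            # cost charged for a communication action
        # Q[(state, others_subtasks)][my_subtask] -> value of my choice given theirs
        self.Q = defaultdict(lambda: defaultdict(float))
        # Q_comm[state][communicate?] -> value of paying to observe the others' choices
        self.Q_comm = defaultdict(lambda: defaultdict(float))

    def decide_to_communicate(self, state):
        """COM-Cooperative HRL adds a communication level below each cooperation
        level: before choosing a subtask, compare the learned value of communicating
        with the value of acting on default (possibly stale) information."""
        if random.random() < self.epsilon:
            return random.choice([True, False])
        return self.Q_comm[state][True] >= self.Q_comm[state][False]

    def choose_subtask(self, state, others_subtasks):
        """Epsilon-greedy choice at a cooperation level, conditioned on what the
        other agents are doing (a tuple of their subtasks, or a default belief
        if the agent chose not to communicate)."""
        key = (state, others_subtasks)
        if random.random() < self.epsilon:
            return random.choice(self.subtasks)
        return max(self.subtasks, key=lambda u: self.Q[key][u])

    def update(self, state, others, my_subtask, reward, steps, next_state, next_others):
        """SMDP-style update applied when the chosen subtask terminates after
        `steps` primitive actions; `reward` is the return accumulated during it."""
        key, next_key = (state, others), (next_state, next_others)
        best_next = max(self.Q[next_key][u] for u in self.subtasks)
        target = reward + (self.gamma ** steps) * best_next
        self.Q[key][my_subtask] += self.alpha * (target - self.Q[key][my_subtask])

    def update_comm(self, state, communicated, reward, steps, next_state):
        """Update the communication-level values; communicating is only worthwhile
        when the coordination it buys outweighs the explicit communication cost."""
        cost = self.comm_cost if communicated else 0.0
        best_next = max(self.Q_comm[next_state][c] for c in (True, False))
        target = (reward - cost) + (self.gamma ** steps) * best_next
        self.Q_comm[state][communicated] += self.alpha * (target - self.Q_comm[state][communicated])
```

In a sketch like this, a communication cost near zero pushes the learned communication policy toward always communicating before a high-level choice, while a larger cost makes acting on default information preferable; this is the cost-versus-coordination trade-off that the multi-agent taxi experiments in the paper are used to examine.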



Author information

Correspondence to Mohammad Ghavamzadeh.

Cite this article

Ghavamzadeh, M., Mahadevan, S. & Makar, R. Hierarchical multi-agent reinforcement learning. Auton Agent Multi-Agent Syst 13, 197–229 (2006). https://doi.org/10.1007/s10458-006-7035-4
