Abstract
Multiagent learning involves the acquisition of cooperative behavior among intelligent agents in order to satisfy joint goals. Reinforcement Learning (RL) is a promising unsupervised machine learning technique inspired by earlier studies of animal learning. In this paper, we propose a new RL technique, called Two-Level Reinforcement Learning with Communication (2LRL), that provides cooperative action selection in a multiagent environment. In 2LRL, learning takes place at two hierarchical levels: in the first level, agents learn to select their target, and in the second level they select the action directed at that target. The agents communicate their perceptions to their neighbors and use the communicated information in their decision making. We applied the 2LRL method in a hunter-prey environment and observed satisfactory cooperative behavior.
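The two-level scheme described above can be sketched as ordinary Q-learning applied at each level: one value table for target selection and one for target-directed action selection. The class below is a hypothetical illustration only; the names (`TwoLevelQLearner`, `q_target`, `q_action`) and the epsilon-greedy details are assumptions for the sketch, not the paper's actual implementation.

```python
import random
from collections import defaultdict

class TwoLevelQLearner:
    """Hypothetical sketch of a two-level learner: level 1 selects a
    target (e.g. a prey), level 2 selects an action directed at it."""

    def __init__(self, targets, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.targets = targets              # candidate targets, e.g. prey ids
        self.actions = actions              # primitive actions, e.g. moves
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q_target = defaultdict(float)  # level 1: Q1[(state, target)]
        self.q_action = defaultdict(float)  # level 2: Q2[(state, target, action)]

    def choose(self, state):
        """Epsilon-greedy selection at both levels. In 2LRL the state
        would also fold in perceptions communicated by neighbors."""
        if random.random() < self.epsilon:
            target = random.choice(self.targets)
        else:
            target = max(self.targets, key=lambda t: self.q_target[(state, t)])
        if random.random() < self.epsilon:
            action = random.choice(self.actions)
        else:
            action = max(self.actions,
                         key=lambda a: self.q_action[(state, target, a)])
        return target, action

    def update(self, state, target, action, reward, next_state):
        """Standard Q-learning backup applied at each level."""
        best_t = max(self.q_target[(next_state, t)] for t in self.targets)
        self.q_target[(state, target)] += self.alpha * (
            reward + self.gamma * best_t - self.q_target[(state, target)])
        best_a = max(self.q_action[(next_state, target, a)]
                     for a in self.actions)
        self.q_action[(state, target, action)] += self.alpha * (
            reward + self.gamma * best_a
            - self.q_action[(state, target, action)])
```

Splitting the value function this way shrinks each table relative to a flat learner over all (target, action) pairs, which is the usual motivation for hierarchical decompositions of this kind.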
Author information
Guray Erus received the B.S. degree in computer engineering in 1999 and the M.S. degree in cognitive sciences in 2002 from Middle East Technical University (METU), Ankara, Turkey. He is currently a teaching and research assistant at René Descartes University, Paris, France, where he is preparing a doctoral dissertation on object detection in satellite images as a member of the intelligent perception systems group (SIP-CRIP5). His research interests include multi-agent systems and image understanding.
Faruk Polat is a professor in the Department of Computer Engineering of Middle East Technical University, Ankara, Turkey. He received his B.Sc. in computer engineering from Middle East Technical University, Ankara, in 1987, and his M.S. and Ph.D. degrees in computer engineering from Bilkent University, Ankara, in 1989 and 1993, respectively. He conducted research as a visiting NATO science scholar at the Computer Science Department of the University of Minnesota, Minneapolis, in 1992-93. His research interests include artificial intelligence, multi-agent systems, and object-oriented data models.
Cite this article
Erus, G., Polat, F. A layered approach to learning coordination knowledge in multiagent environments. Appl Intell 27, 249–267 (2007). https://doi.org/10.1007/s10489-006-0034-y