
A layered approach to learning coordination knowledge in multiagent environments


Abstract

Multiagent learning involves the acquisition of cooperative behavior among intelligent agents in order to satisfy joint goals. Reinforcement Learning (RL) is a promising machine learning technique, inspired by early studies of animal learning, in which agents learn from reward feedback rather than explicit supervision. In this paper, we propose a new RL technique, the Two-Level Reinforcement Learning with Communication (2LRL) method, to provide cooperative action selection in a multiagent environment. In 2LRL, learning takes place at two hierarchical levels: at the first level, agents learn to select their target, and at the second level, they learn to select the action directed toward that target. The agents communicate their perceptions to their neighbors and use this information in their decision making. We applied the 2LRL method in a hunter-prey environment and observed satisfactory cooperative behavior.
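
The two-level scheme described above maps naturally onto two tabular Q-functions, one over targets and one over target-directed actions. The Python sketch below is only illustrative: the class name TwoLevelAgent, the epsilon-greedy policies, the single shared reward, and the idea of folding communicated neighbor perceptions into the state are our assumptions, not the paper's exact formulation.

    import random
    from collections import defaultdict

    # A minimal sketch of the two-level idea (2LRL) using plain tabular
    # Q-learning. All names and hyperparameters are illustrative
    # assumptions, not the authors' implementation.

    class TwoLevelAgent:
        def __init__(self, targets, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
            self.q_target = defaultdict(float)  # level 1: value of (state, target)
            self.q_action = defaultdict(float)  # level 2: value of (state, target, action)
            self.targets = targets
            self.actions = actions
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

        def select_target(self, state):
            # Level 1: choose which prey to pursue (epsilon-greedy).
            if random.random() < self.epsilon:
                return random.choice(self.targets)
            return max(self.targets, key=lambda t: self.q_target[(state, t)])

        def select_action(self, state, target):
            # Level 2: choose the primitive move directed at the chosen target.
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q_action[(state, target, a)])

        def update(self, state, target, action, reward, next_state):
            # One-step Q-learning backups at both levels; using a single
            # scalar reward for both levels is a simplifying assumption.
            best_t = max(self.q_target[(next_state, t)] for t in self.targets)
            self.q_target[(state, target)] += self.alpha * (
                reward + self.gamma * best_t - self.q_target[(state, target)])
            best_a = max(self.q_action[(next_state, target, a)] for a in self.actions)
            self.q_action[(state, target, action)] += self.alpha * (
                reward + self.gamma * best_a - self.q_action[(state, target, action)])

    # Illustrative usage: two prey identifiers and four grid moves. The state
    # tuple stands in for the agent's own percept plus communicated ones.
    agent = TwoLevelAgent(targets=[0, 1], actions=["N", "S", "E", "W"])
    s, s2 = ("own-percept", "neighbor-msg"), ("own-percept-2", "neighbor-msg-2")
    t = agent.select_target(s)
    a = agent.select_action(s, t)
    agent.update(s, t, a, reward=-1.0, next_state=s2)

In a grid-world hunter-prey setting, the state passed to such an agent could combine the hunter's own percept with whatever its neighbors communicated; the paper's precise state encoding and reward shaping are not reproduced here.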



Author information


Corresponding author

Correspondence to Faruk Polat.

Additional information

Guray Erus received the B.S. degree in computer engineering in 1999 and the M.S. degree in cognitive sciences in 2002, both from Middle East Technical University (METU), Ankara, Turkey. He is currently a teaching and research assistant at René Descartes University, Paris, France, where he is preparing a doctoral dissertation on object detection in satellite images as a member of the Intelligent Perception Systems group (SIP-CRIP5). His research interests include multi-agent systems and image understanding.

Faruk Polat is a professor in the Department of Computer Engineering at Middle East Technical University, Ankara, Turkey. He received his B.Sc. in computer engineering from Middle East Technical University, Ankara, in 1987, and his M.S. and Ph.D. degrees in computer engineering from Bilkent University, Ankara, in 1989 and 1993, respectively. He conducted research as a visiting NATO science scholar at the Computer Science Department of the University of Minnesota, Minneapolis, in 1992–93. His research interests include artificial intelligence, multi-agent systems, and object-oriented data models.


Cite this article

Erus, G., Polat, F. A layered approach to learning coordination knowledge in multiagent environments. Appl Intell 27, 249–267 (2007). https://doi.org/10.1007/s10489-006-0034-y

