Multi-policy Optimization in Self-organizing Systems

Conference paper, in: Self-Organizing Architectures (SOAR 2009)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 6090))

Abstract

Self-organizing systems are often implemented as collections of collaborating agents. Such agents may need to optimize their own performance according to multiple policies as well as contribute to the optimization of overall system performance towards a potentially different set of policies. These policies can be heterogeneous, i.e., they can be implemented on different sets of agents, be active at different times and have different levels of priority, which makes the agents that compose the system heterogeneous as well. Numerous biologically-inspired techniques as well as techniques from artificial intelligence have been used to implement such self-organizing systems. In this paper we review the most commonly used techniques for multi-policy optimization in such systems, specifically, those based on ant colony optimization, evolutionary algorithms, particle swarm optimization and reinforcement learning (RL). We analyze the characteristics and existing applications of the reviewed algorithms, assessing their suitability for particular types of optimization problems, based on the environment and policy characteristics. We focus on RL, as it is considered particularly suitable for large-scale self-organizing systems due to its ability to take into account the long-term consequences of the actions executed. RL thus enables the system to learn not only the immediate payoffs of its actions, but also the best actions for the long-term performance of the system. Existing RL implementations mostly focus on optimization towards a single system policy, while most multi-policy RL-based optimization techniques have so far been implemented only on a single agent. We argue that, in order to be more widely utilized as a technique for self-optimization, RL needs to address both multiple policies and multiple agents simultaneously, and we analyze the challenges associated with extending existing RL optimization techniques or developing new ones.
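The abstract's point about long-term consequences can be illustrated with a minimal sketch of tabular Q-learning for a single agent and a single policy. It is not the method proposed in the paper. The toy environment, the state and action names, and the parameter values are all hypothetical and chosen only for illustration: from the start state, a "greedy" action pays a small reward immediately, while a "patient" action pays nothing now but leads to a larger reward one step later.

```python
from collections import defaultdict

# Hypothetical toy environment (an assumption, not taken from the paper):
# (state, action) -> (reward, next_state); next_state None ends the episode.
TRANSITIONS = {
    ("start", "greedy"): (1.0, None),
    ("start", "patient"): (0.0, "later"),
    ("later", "collect"): (10.0, None),
}


def actions(state):
    """Actions available in a state (empty for the terminal marker None)."""
    return [a for (s, a) in TRANSITIONS if s == state]


def q_learning(gamma, alpha=0.5, sweeps=200):
    """Tabular Q-learning, sweeping every transition deterministically."""
    q = defaultdict(float)
    for _ in range(sweeps):
        for (state, action), (reward, nxt) in TRANSITIONS.items():
            # Best estimated value obtainable from the next state (0 if terminal).
            future = max((q[(nxt, a)] for a in actions(nxt)), default=0.0)
            # Standard Q-learning update towards reward + discounted future value.
            target = reward + gamma * future
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q


def best_action(q, state):
    """Action with the highest learned Q-value in the given state."""
    return max(actions(state), key=lambda a: q[(state, a)])


far_sighted = q_learning(gamma=0.9)  # values future rewards
myopic = q_learning(gamma=0.0)       # values immediate payoff only
print(best_action(far_sighted, "start"), best_action(myopic, "start"))
```

With a discount factor of 0.9 the learned values favour the "patient" action, whereas a discount factor of 0 reduces the agent to comparing immediate payoffs and it picks "greedy". Extending such a learner to several policies and several agents at once is the open problem the abstract describes, and this sketch does not attempt it.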

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dusparic, I., Cahill, V. (2010). Multi-policy Optimization in Self-organizing Systems. In: Weyns, D., Malek, S., de Lemos, R., Andersson, J. (eds) Self-Organizing Architectures. SOAR 2009. Lecture Notes in Computer Science, vol 6090. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14412-7_6

  • DOI: https://doi.org/10.1007/978-3-642-14412-7_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14411-0

  • Online ISBN: 978-3-642-14412-7

  • eBook Packages: Computer Science, Computer Science (R0)
