Abstract
This article describes Distributed W-Learning (DWL), a reinforcement learning-based algorithm for collaborative agent-based optimization of pervasive systems. DWL supports optimization towards multiple heterogeneous policies and addresses the challenges arising from the heterogeneity of the agents that are charged with implementing them. DWL learns and exploits the dependencies between agents and between policies to improve overall system performance. Instead of always executing the locally best action, agents learn how their actions affect their immediate neighbors and execute an action suggested by a neighboring agent if its importance, scaled by a predefined or learned collaboration coefficient, exceeds the importance of the local action. We have evaluated DWL in a simulation of an Urban Traffic Control (UTC) system, a canonical example of the large-scale pervasive systems that we are addressing. We show that DWL outperforms widely deployed fixed-time and simple adaptive UTC controllers under a variety of traffic loads and patterns. Our results also confirm that enabling collaboration between agents is beneficial, as is the ability of agents to learn the degree to which it is appropriate for them to collaborate. These results suggest that DWL is a suitable basis for optimization in other large-scale systems with similar characteristics.
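The action-selection rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Suggestion`, `select_action`, and `collaboration_coefficient` are hypothetical. The sketch assumes that the coefficient multiplies the neighbor's importance before the comparison, and that ties go to the local action. In DWL itself the importance values would be learned W-values; here they are plain numbers supplied by the caller.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Suggestion:
    """An action nominated by a policy, with its importance (hypothetical type)."""
    source: str        # e.g. "local" or a neighbor identifier
    action: str
    importance: float  # stands in for a learned W-value


def select_action(
    local: Sequence[Suggestion],
    remote: Sequence[Suggestion],
    collaboration_coefficient: float,
) -> Suggestion:
    """Pick the suggestion to execute.

    Assumption: a neighbor's suggestion wins only if its importance,
    multiplied by the collaboration coefficient, strictly exceeds the
    importance of the best local suggestion.
    """
    best_local = max(local, key=lambda s: s.importance)
    if not remote:
        return best_local
    best_remote = max(remote, key=lambda s: s.importance)
    scaled = collaboration_coefficient * best_remote.importance
    return best_remote if scaled > best_local.importance else best_local


# Usage: one local policy and one neighbor at a traffic junction.
local = [Suggestion("local", "keep_phase", 0.6)]
remote = [Suggestion("neighbor_1", "switch_phase", 0.9)]

fully_cooperative = select_action(local, remote, 1.0)
half_cooperative = select_action(local, remote, 0.5)
selfish = select_action(local, remote, 0.0)
```

With a coefficient of 1.0 the neighbor's suggestion is executed, because 0.9 exceeds 0.6. With 0.5 the scaled importance is 0.45 and the local action is kept, and with 0.0 the agent ignores its neighbors entirely.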
Index Terms
- Autonomic multi-policy optimization in pervasive systems: Overview and evaluation