Autonomic multi-policy optimization in pervasive systems: Overview and evaluation

Authors:
Ivana Dusparic, Trinity College Dublin
Vinny Cahill, Trinity College Dublin

ACM Transactions on Autonomous and Adaptive Systems, Volume 7, Issue 1, Article No. 11, pp. 1–25
https://doi.org/10.1145/2168260.2168271

Published: 04 May 2012

Abstract

This article describes Distributed W-Learning (DWL), a reinforcement learning-based algorithm for collaborative agent-based optimization of pervasive systems. DWL supports optimization towards multiple heterogeneous policies and addresses the challenges arising from the heterogeneity of the agents that are charged with implementing them. DWL learns and exploits the dependencies between agents and between policies to improve overall system performance. Instead of always executing the locally-best action, agents learn how their actions affect their immediate neighbors and execute actions suggested by neighboring agents if their importance exceeds the local action's importance when scaled using a predefined or learned collaboration coefficient. We have evaluated DWL in a simulation of an Urban Traffic Control (UTC) system, a canonical example of the large-scale pervasive systems that we are addressing. We show that DWL outperforms widely deployed fixed-time and simple adaptive UTC controllers under a variety of traffic loads and patterns. Our results also confirm that enabling collaboration between agents is beneficial as is the ability for agents to learn the degree to which it is appropriate for them to collaborate. These results suggest that DWL is a suitable basis for optimization in other large-scale systems with similar characteristics.
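
The collaboration rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Suggestion`, `select_action`, and `collaboration`, and the example action labels, are hypothetical. It also assumes that the neighbor's importance is the value being scaled, that the coefficient lies in [0, 1], and that ties favor the local action. How importance values are learned is outside the scope of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Suggestion:
    """An action nominated by one policy, with its importance (hypothetical type)."""
    action: str
    importance: float


def select_action(local: List[Suggestion],
                  remote: List[Suggestion],
                  collaboration: float) -> str:
    """Return the action to execute.

    The locally best action wins unless a neighbor's suggestion has a
    strictly higher importance after scaling by `collaboration`
    (assumed here to lie in [0, 1]).
    """
    if not local:
        raise ValueError("at least one local suggestion is required")
    if not 0.0 <= collaboration <= 1.0:
        raise ValueError("collaboration coefficient must be in [0, 1]")

    # Start from the locally best action.
    winner = max(local, key=lambda s: s.importance)
    winner_score = winner.importance

    # A remote suggestion replaces it only if its scaled importance is higher.
    for s in remote:
        scaled = collaboration * s.importance
        if scaled > winner_score:
            winner, winner_score = s, scaled
    return winner.action


local = [Suggestion("extend_green", 0.6)]
remote = [Suggestion("switch_phase", 0.9)]
print(select_action(local, remote, collaboration=0.5))  # neighbor scaled to 0.45
print(select_action(local, remote, collaboration=1.0))  # neighbor counted in full
```

With a coefficient of 0 the agent ignores its neighbors entirely, and with a coefficient of 1 it weighs their suggestions as heavily as its own. Intermediate values correspond to the partial collaboration that the abstract says can be predefined or learned.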

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Distributed artificial intelligence
      1. Multi-agent systems
2. Information systems
  1. Information retrieval
    1. Search engine architectures and scalability
      1. Distributed retrieval
      2. Peer-to-peer retrieval
  2. Information storage systems
    1. Storage architectures
      1. Distributed storage

Published in

ACM Transactions on Autonomous and Adaptive Systems Volume 7, Issue 1
Special section on formal methods in pervasive computing, pervasive adaptation, and self-adaptive systems: Models and algorithms
April 2012
365 pages
ISSN:1556-4665
EISSN:1556-4703
DOI:10.1145/2168260

Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 4 May 2012
- Accepted: 1 May 2011
- Revised: 1 November 2010
- Received: 1 January 2010
Author Tags
Autonomic computing
decentralized systems
reinforcement learning
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 19
- Total Downloads: 529
- Downloads (Last 12 months): 28
- Downloads (Last 6 weeks): 6