A Q-decomposition and bounded RTDP approach to resource allocation

Published: 14 May 2007

Abstract

This paper contributes effective solution techniques for stochastic resource allocation problems, which are known to be NP-complete. When the resources are already distributed among the agents, but an action taken by one agent may influence the reward obtained by at least one other agent, a Q-decomposition approach is proposed to address this complex resource management problem. Q-decomposition coordinates these reward-separated agents and thus reduces the sets of states and actions to consider. When, on the other hand, the resources are available to all agents, no Q-decomposition is possible and heuristic search is used instead; in particular, bounded real-time dynamic programming (bounded RTDP). Bounded RTDP concentrates planning on significant states only and prunes the action space, the pruning being accomplished through tight upper and lower bounds on the value function.
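The bounded RTDP idea summarized above (trials concentrated on states whose value is still uncertain, with action pruning driven by upper and lower bounds on the value function) can be sketched as follows. This is a minimal illustration on a hypothetical toy MDP, not the paper's implementation or its resource-allocation domain; the states, actions, reward structure, and all variable names here are assumptions made for the example.

```python
import random

random.seed(0)  # deterministic runs for this illustration

# Hypothetical toy MDP (not from the paper): states 0..3, state 3 is the goal.
# transitions[s][a] = list of (probability, next_state, reward)
transitions = {
    0: {"a": [(0.8, 1, 0.0), (0.2, 0, 0.0)],
        "b": [(1.0, 2, 0.0)]},
    1: {"a": [(1.0, 3, 1.0)]},
    2: {"a": [(0.5, 3, 0.2), (0.5, 0, 0.0)]},
}
GOAL, GAMMA = 3, 0.95

# Admissible bounds on V*: optimistic upper bound, pessimistic lower bound.
v_hi = {s: 1.0 / (1.0 - GAMMA) for s in transitions}  # all rewards <= 1
v_lo = {s: 0.0 for s in transitions}                  # all rewards >= 0
v_hi[GOAL] = v_lo[GOAL] = 0.0

def q(v, s, a):
    """One-step lookahead value of action a under value bound v."""
    return sum(p * (r + GAMMA * v[s2]) for p, s2, r in transitions[s][a])

def brtdp_trial(s0, eps=1e-3, max_depth=50):
    """One bounded-RTDP trial: back up both bounds along a sampled path."""
    s = s0
    for _ in range(max_depth):
        if s == GOAL:
            return
        acts = transitions[s]
        # Back up both bounds. Pruning is implicit: an action whose upper
        # bound falls below another action's lower bound can never be greedy.
        v_hi[s] = max(q(v_hi, s, a) for a in acts)
        v_lo[s] = max(q(v_lo, s, a) for a in acts)
        # Act greedily on the upper bound (optimism steers exploration).
        a = max(acts, key=lambda act: q(v_hi, s, act))
        # Sample the successor with probability weighted by its bound gap,
        # concentrating the trial on states that are still uncertain.
        outs = transitions[s][a]
        gaps = [p * (v_hi[s2] - v_lo[s2]) if s2 != GOAL else 0.0
                for p, s2, _ in outs]
        if sum(gaps) <= eps:
            return  # everything downstream has (nearly) converged
        s = random.choices([s2 for _, s2, _ in outs], weights=gaps)[0]

for _ in range(200):
    brtdp_trial(0)

print(v_lo[0], v_hi[0])  # the two bounds have closed in on V*(0)
```

In this sketch the bounds also serve as a trial-termination test: once the gap below a state is negligible, the trial stops, which is what concentrates planning on "significant" states. The paper's contribution includes tight domain-specific bounds (e.g. based on marginal revenue) for resource allocation, which this generic toy example does not model.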




Published In

AAMAS '07: Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems
May 2007
1585 pages
ISBN:9788190426275
DOI:10.1145/1329125

Sponsors

  • IFAAMAS

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Q-decomposition
  2. heuristic search
  3. marginal revenue

Qualifiers

  • Research-article

Conference

AAMAS07

Acceptance Rates

Overall Acceptance Rate 1,155 of 5,036 submissions, 23%


Cited By

  • (2018) Model-Free Energy Optimization for Energy Internet. In: Energy Internet and We-Energy, pp. 299-325. DOI: 10.1007/978-981-13-0523-8_10. Online publication date: 13 Jul 2018.
  • (2017) Distributed reinforcement learning to coordinate current sharing and voltage restoration for islanded DC microgrid. Journal of Modern Power Systems and Clean Energy, 6(2):364-374. DOI: 10.1007/s40565-017-0323-y. Online publication date: 23 Sep 2017.
  • (2017) Multi-Agent Q(λ) Learning for Optimal Operation Management of Energy Internet. In: Neural Information Processing, pp. 298-306. DOI: 10.1007/978-3-319-70136-3_32. Online publication date: 26 Oct 2017.
  • (2016) A multi-agent system reinforcement learning based optimal power flow for islanded microgrids. In: 2016 IEEE 16th International Conference on Environment and Electrical Engineering (EEEIC), pp. 1-6. DOI: 10.1109/EEEIC.2016.7555840. Online publication date: Jun 2016.
  • (2012) Multiagent-Based Reinforcement Learning for Optimal Reactive Power Dispatch. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 42(6):1742-1751. DOI: 10.1109/TSMCC.2012.2218596. Online publication date: 1 Nov 2012.
  • (2012) A novel collaboration and communication decision based on multi-agent in wireless sensor network. In: 2012 IEEE 14th International Conference on Communication Technology, pp. 459-463. DOI: 10.1109/ICCT.2012.6511262. Online publication date: Nov 2012.
  • (2011) SSPQL: Stochastic shortest path-based Q-learning. International Journal of Control, Automation and Systems, 9(2):328-338. DOI: 10.1007/s12555-011-0215-2. Online publication date: 2 Apr 2011.
