A Q-decomposition and bounded RTDP approach to resource allocation

Published: 14 May 2007

Abstract

This paper contributes effective solution techniques for stochastic resource allocation problems, which are known to be NP-complete. When the resources are already distributed among the agents, but an action taken by one agent may influence the reward obtained by at least one other agent, a Q-decomposition approach is proposed to address this complex resource management problem. Q-decomposition coordinates these reward-separated agents and thus reduces the sets of states and actions to consider. When, on the other hand, the resources are available to all agents, no Q-decomposition is possible and heuristic search is used instead; in particular, bounded real-time dynamic programming (bounded RTDP). Bounded RTDP concentrates planning on significant states only and prunes the action space, the pruning being accomplished through tight upper and lower bounds on the value function.
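The bounded RTDP idea summarized above (trials concentrated on states whose value is still uncertain, with action pruning driven by upper and lower bounds on the value function) can be sketched as follows. This is a minimal illustration on a hypothetical toy MDP, not the paper's implementation or its resource-allocation domain; the states, actions, reward structure, and all variable names here are assumptions made for the example.

```python
import random

random.seed(0)  # deterministic runs for this illustration

# Hypothetical toy MDP (not from the paper): states 0..3, state 3 is the goal.
# transitions[s][a] = list of (probability, next_state, reward)
transitions = {
    0: {"a": [(0.8, 1, 0.0), (0.2, 0, 0.0)],
        "b": [(1.0, 2, 0.0)]},
    1: {"a": [(1.0, 3, 1.0)]},
    2: {"a": [(0.5, 3, 0.2), (0.5, 0, 0.0)]},
}
GOAL, GAMMA = 3, 0.95

# Admissible bounds on V*: optimistic upper bound, pessimistic lower bound.
v_hi = {s: 1.0 / (1.0 - GAMMA) for s in transitions}  # all rewards <= 1
v_lo = {s: 0.0 for s in transitions}                  # all rewards >= 0
v_hi[GOAL] = v_lo[GOAL] = 0.0

def q(v, s, a):
    """One-step lookahead value of action a under value bound v."""
    return sum(p * (r + GAMMA * v[s2]) for p, s2, r in transitions[s][a])

def brtdp_trial(s0, eps=1e-3, max_depth=50):
    """One bounded-RTDP trial: back up both bounds along a sampled path."""
    s = s0
    for _ in range(max_depth):
        if s == GOAL:
            return
        acts = transitions[s]
        # Back up both bounds. Pruning is implicit: an action whose upper
        # bound falls below another action's lower bound can never be greedy.
        v_hi[s] = max(q(v_hi, s, a) for a in acts)
        v_lo[s] = max(q(v_lo, s, a) for a in acts)
        # Act greedily on the upper bound (optimism steers exploration).
        a = max(acts, key=lambda act: q(v_hi, s, act))
        # Sample the successor with probability weighted by its bound gap,
        # concentrating the trial on states that are still uncertain.
        outs = transitions[s][a]
        gaps = [p * (v_hi[s2] - v_lo[s2]) if s2 != GOAL else 0.0
                for p, s2, _ in outs]
        if sum(gaps) <= eps:
            return  # everything downstream has (nearly) converged
        s = random.choices([s2 for _, s2, _ in outs], weights=gaps)[0]

for _ in range(200):
    brtdp_trial(0)

print(v_lo[0], v_hi[0])  # the two bounds have closed in on V*(0)
```

In this sketch the bounds also serve as a trial-termination test: once the gap below a state is negligible, the trial stops, which is what concentrates planning on "significant" states. The paper's contribution includes tight domain-specific bounds (e.g. based on marginal revenue) for resource allocation, which this generic toy example does not model.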




Published In

AAMAS '07: Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems
May 2007
1585 pages
ISBN:9788190426275
DOI:10.1145/1329125

Sponsors

  • IFAAMAS

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Q-decomposition
  2. heuristic search
  3. marginal revenue

Qualifiers

  • Research-article

Conference

AAMAS07

Acceptance Rates

Overall Acceptance Rate 1,155 of 5,036 submissions, 23%


Cited By

  • (2018) Model-Free Energy Optimization for Energy Internet. In: Energy Internet and We-Energy, pp. 299-325. DOI: 10.1007/978-981-13-0523-8_10. Online publication date: 13 Jul 2018.
  • (2017) Distributed reinforcement learning to coordinate current sharing and voltage restoration for islanded DC microgrid. Journal of Modern Power Systems and Clean Energy, 6(2):364-374. DOI: 10.1007/s40565-017-0323-y. Online publication date: 23 Sep 2017.
  • (2017) Multi-Agent Q(λ) Learning for Optimal Operation Management of Energy Internet. In: Neural Information Processing, pp. 298-306. DOI: 10.1007/978-3-319-70136-3_32. Online publication date: 26 Oct 2017.
  • (2016) A multi-agent system reinforcement learning based optimal power flow for islanded microgrids. In: 2016 IEEE 16th International Conference on Environment and Electrical Engineering (EEEIC), pp. 1-6. DOI: 10.1109/EEEIC.2016.7555840. Online publication date: Jun 2016.
  • (2012) Multiagent-Based Reinforcement Learning for Optimal Reactive Power Dispatch. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 42(6):1742-1751. DOI: 10.1109/TSMCC.2012.2218596. Online publication date: 1 Nov 2012.
  • (2012) A novel collaboration and communication decision based on multi-agent in wireless sensor network. In: 2012 IEEE 14th International Conference on Communication Technology, pp. 459-463. DOI: 10.1109/ICCT.2012.6511262. Online publication date: Nov 2012.
  • (2011) SSPQL: Stochastic shortest path-based Q-learning. International Journal of Control, Automation and Systems, 9(2):328-338. DOI: 10.1007/s12555-011-0215-2. Online publication date: 2 Apr 2011.
