DOI: 10.1145/1015330.1015440

P3VI: a partitioned, prioritized, parallel value iterator

Published: 04 July 2004

Abstract

We examine the state of the art in using value iteration to solve large-scale discrete Markov Decision Processes. We introduce an architecture that combines three independent performance enhancements (intelligent prioritization of computation, state partitioning, and massively parallel processing) into a single algorithm. We show that each idea improves performance in a different way, so algorithm designers do not have to trade one improvement for another. We give special attention to parallelization issues, discussing how to efficiently partition states, distribute partitions to processors, minimize message passing, and ensure high scalability. We present experimental results demonstrating that this approach solves large problems in reasonable time.
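
As a concrete illustration of how two of these ideas fit together, the following Python sketch combines state partitioning with partition-level prioritization of Bellman backups on a small random MDP. It is a hypothetical reconstruction, not the authors' P3VI code: the round-robin partitioner, the residual-based priority, and the blanket priority refresh are simplifying assumptions, and the massively parallel, message-passing layer described in the paper is omitted.

# Minimal single-process sketch (illustrative only, not the authors' P3VI
# implementation) of partitioned, prioritized value iteration. Assumptions:
# a round-robin partitioner stands in for the graph-partitioning step the
# paper discusses, partition priority is the last observed Bellman residual,
# and the parallel layer is omitted.
import numpy as np

def partitioned_prioritized_vi(P, R, gamma=0.95, n_partitions=4, tol=1e-6):
    """P: (A, S, S) transition tensor, R: (S, A) reward matrix."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    # Hypothetical partitioner: states assigned round-robin to partitions.
    partitions = [np.arange(k, n_states, n_partitions) for k in range(n_partitions)]

    def sweep(states):
        # One greedy Bellman backup restricted to `states`, using current V.
        Q = R[states].T + gamma * np.einsum('asj,j->as', P[:, states, :], V)
        return Q.max(axis=0)

    # Priority of a partition = its last Bellman residual; start at infinity
    # so every partition is swept at least once.
    priority = np.full(n_partitions, np.inf)
    while priority.max() > tol:
        k = int(priority.argmax())          # sweep the most "urgent" partition
        new_vals = sweep(partitions[k])
        priority[k] = np.abs(new_vals - V[partitions[k]]).max()
        V[partitions[k]] = new_vals
        if priority[k] > tol:
            # A significant change in partition k can invalidate backups in
            # partitions that transition into it; conservatively raise their
            # priorities. (The paper instead tracks cross-partition
            # dependencies to avoid this blanket refresh.)
            others = np.arange(n_partitions) != k
            priority[others] = np.maximum(priority[others], priority[k])
    return V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_actions, n_states = 3, 60
    P = rng.random((n_actions, n_states, n_states))
    P /= P.sum(axis=2, keepdims=True)       # normalize rows to probabilities
    R = rng.random((n_states, n_actions))
    print("Converged values (first 5):", partitioned_prioritized_vi(P, R)[:5])

In a parallel deployment along the lines the abstract describes, each processor would own a subset of partitions and exchange only boundary-state values, which is where careful partitioning pays off in reduced message passing.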

Published In

ICML '04: Proceedings of the twenty-first international conference on Machine learning
July 2004
934 pages
ISBN: 1581138385
DOI: 10.1145/1015330
Conference Chair: Carla Brodley

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. asynchronous dynamic programming
  2. reinforcement learning
  3. value iteration

Qualifiers

  • Article

Acceptance Rates

Overall acceptance rate: 140 of 548 submissions, 26%

Cited By

  • (2023) PcTVI: Parallel MDP Solver Using a Decomposition into Independent Chains. Classification and Data Science in the Digital Age, 101-109. DOI: 10.1007/978-3-031-09034-9_12. Online publication date: 8-Dec-2023.
  • (2020) Value Iteration on Multicore Processors. 2020 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), 1-7. DOI: 10.1109/ISSPIT51521.2020.9408773. Online publication date: 9-Dec-2020.
  • (2018) Focused Crawling Through Reinforcement Learning. Web Engineering, 261-278. DOI: 10.1007/978-3-319-91662-0_20. Online publication date: 20-May-2018.
  • (2012) Planning with Markov Decision Processes: An AI Perspective. Synthesis Lectures on Artificial Intelligence and Machine Learning, 6(1), 1-210. DOI: 10.2200/S00426ED1V01Y201206AIM017. Online publication date: 30-Jun-2012.
  • (2012) A parallel scheduling algorithm for reinforcement learning in large state space. Frontiers of Computer Science. DOI: 10.1007/s11704-012-1098-y. Online publication date: 10-Nov-2012.
  • (2011) Robot-Assisted Needle Steering Using a Control Theoretic Approach. Journal of Intelligent and Robotic Systems, 62(3-4), 397-418. DOI: 10.1007/s10846-010-9455-2. Online publication date: 1-Jun-2011.
  • (2008) Parallel Reinforcement Learning with Linear Function Approximation. Adaptive Agents and Multi-Agent Systems III. Adaptation and Multi-Agent Learning, 60-74. DOI: 10.1007/978-3-540-77949-0_5. Online publication date: 2008.
  • (2005) Parallel reinforcement learning with linear function approximation. Proceedings of the 5th, 6th and 7th European conference on Adaptive and learning agents and multi-agent systems: adaptation and multi-agent learning, 60-74. DOI: 10.5555/1898681.1898686. Online publication date: 1-Jan-2005.
  • (2005) Prioritized Multiplicative Schwarz Procedures for Solving Linear Systems. Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05). DOI: 10.1109/IPDPS.2005.359. Online publication date: 4-Apr-2005.
