DOI: 10.1145/1015330.1015440

P3VI: a partitioned, prioritized, parallel value iterator

Published: 04 July 2004

Abstract

We examine the state of the art in using value iteration to solve large-scale discrete Markov Decision Processes. We introduce an architecture that combines three independent performance enhancements (intelligent prioritization of computation, state partitioning, and massively parallel processing) into a single algorithm. We show that each idea improves performance in a different way, so algorithm designers do not have to trade one improvement for another. We give special attention to parallelization issues, discussing how to efficiently partition states, distribute partitions to processors, minimize message passing, and ensure high scalability. We present experimental results demonstrating that this approach solves large problems in reasonable time.
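
As a concrete illustration of how two of these ideas fit together, the following Python sketch combines state partitioning with partition-level prioritization of Bellman backups on a small random MDP. It is a hypothetical reconstruction, not the authors' P3VI code: the round-robin partitioner, the residual-based priority, and the blanket priority refresh are simplifying assumptions, and the massively parallel, message-passing layer described in the paper is omitted.

# Minimal single-process sketch (illustrative only, not the authors' P3VI
# implementation) of partitioned, prioritized value iteration. Assumptions:
# a round-robin partitioner stands in for the graph-partitioning step the
# paper discusses, partition priority is the last observed Bellman residual,
# and the parallel layer is omitted.
import numpy as np

def partitioned_prioritized_vi(P, R, gamma=0.95, n_partitions=4, tol=1e-6):
    """P: (A, S, S) transition tensor, R: (S, A) reward matrix."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    # Hypothetical partitioner: states assigned round-robin to partitions.
    partitions = [np.arange(k, n_states, n_partitions) for k in range(n_partitions)]

    def sweep(states):
        # One greedy Bellman backup restricted to `states`, using current V.
        Q = R[states].T + gamma * np.einsum('asj,j->as', P[:, states, :], V)
        return Q.max(axis=0)

    # Priority of a partition = its last Bellman residual; start at infinity
    # so every partition is swept at least once.
    priority = np.full(n_partitions, np.inf)
    while priority.max() > tol:
        k = int(priority.argmax())          # sweep the most "urgent" partition
        new_vals = sweep(partitions[k])
        priority[k] = np.abs(new_vals - V[partitions[k]]).max()
        V[partitions[k]] = new_vals
        if priority[k] > tol:
            # A significant change in partition k can invalidate backups in
            # partitions that transition into it; conservatively raise their
            # priorities. (The paper instead tracks cross-partition
            # dependencies to avoid this blanket refresh.)
            others = np.arange(n_partitions) != k
            priority[others] = np.maximum(priority[others], priority[k])
    return V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_actions, n_states = 3, 60
    P = rng.random((n_actions, n_states, n_states))
    P /= P.sum(axis=2, keepdims=True)       # normalize rows to probabilities
    R = rng.random((n_states, n_actions))
    print("Converged values (first 5):", partitioned_prioritized_vi(P, R)[:5])

In a parallel deployment along the lines the abstract describes, each processor would own a subset of partitions and exchange only boundary-state values, which is where careful partitioning pays off in reduced message passing.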

Published In

ICML '04: Proceedings of the twenty-first international conference on Machine learning
July 2004
934 pages
ISBN: 1581138385
DOI: 10.1145/1015330
Conference Chair: Carla Brodley

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. asynchronous dynamic programming
  2. reinforcement learning
  3. value iteration

Qualifiers

  • Article

Acceptance Rates

Overall acceptance rate: 140 of 548 submissions, 26%

Cited By

  • (2023) PcTVI: Parallel MDP Solver Using a Decomposition into Independent Chains. Classification and Data Science in the Digital Age, 101-109. DOI: 10.1007/978-3-031-09034-9_12. Online publication date: 8-Dec-2023.
  • (2020) Value Iteration on Multicore Processors. 2020 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), 1-7. DOI: 10.1109/ISSPIT51521.2020.9408773. Online publication date: 9-Dec-2020.
  • (2018) Focused Crawling Through Reinforcement Learning. Web Engineering, 261-278. DOI: 10.1007/978-3-319-91662-0_20. Online publication date: 20-May-2018.
  • (2012) Planning with Markov Decision Processes: An AI Perspective. Synthesis Lectures on Artificial Intelligence and Machine Learning, 6(1), 1-210. DOI: 10.2200/S00426ED1V01Y201206AIM017. Online publication date: 30-Jun-2012.
  • (2012) A parallel scheduling algorithm for reinforcement learning in large state space. Frontiers of Computer Science. DOI: 10.1007/s11704-012-1098-y. Online publication date: 10-Nov-2012.
  • (2011) Robot-Assisted Needle Steering Using a Control Theoretic Approach. Journal of Intelligent and Robotic Systems, 62(3-4), 397-418. DOI: 10.1007/s10846-010-9455-2. Online publication date: 1-Jun-2011.
  • (2008) Parallel Reinforcement Learning with Linear Function Approximation. Adaptive Agents and Multi-Agent Systems III. Adaptation and Multi-Agent Learning, 60-74. DOI: 10.1007/978-3-540-77949-0_5. Online publication date: 2008.
  • (2005) Parallel reinforcement learning with linear function approximation. Proceedings of the 5th, 6th and 7th European conference on Adaptive and learning agents and multi-agent systems: adaptation and multi-agent learning, 60-74. DOI: 10.5555/1898681.1898686. Online publication date: 1-Jan-2005.
  • (2005) Prioritized Multiplicative Schwarz Procedures for Solving Linear Systems. Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05). DOI: 10.1109/IPDPS.2005.359. Online publication date: 4-Apr-2005.
