(Approximate) iterated successive approximations algorithm for sequential decision processes

Canbolat, Pelin G.; Rothblum, Uriel G.

doi:10.1007/s10479-012-1073-x

Published: 08 February 2012

Volume 208, pages 309–320 (2013)

Annals of Operations Research

Abstract

The paper proves the convergence of the (Approximate) Iterated Successive Approximations Algorithm for solving infinite-horizon sequential decision processes that satisfy the monotone contraction assumption. At every stage of this algorithm, the value function at hand is used as a terminal reward to determine an (approximately) optimal policy for the one-period problem. This policy is then iterated a (finite or infinite) number of times, and the resulting return function is used as the starting value function for the next stage of the scheme. This method generalizes standard successive approximations, policy iteration, and Denardo’s generalization of the latter.
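As a rough illustration of the stage structure described in the abstract, the following Python sketch specializes the scheme to a finite-state, finite-action discounted Markov decision process. This is an assumption: the paper itself treats general sequential decision processes under the monotone contraction assumption. The function name `iterated_successive_approximations`, its parameters, the single fixed slack `eps` and the stopping rule are hypothetical choices for illustration, not the authors' formulation. Each stage picks a policy that is optimal (or within `eps` of optimal) for the one-period problem with terminal reward `v`. It then applies that policy's return operator `m` times, or solves for the policy's exact return when `m` is `None`, and passes the result to the next stage.

```python
import numpy as np


def iterated_successive_approximations(P, r, beta, m, v0, eps=0.0,
                                       tol=1e-10, max_stages=10_000):
    """Hypothetical sketch of (approximate) iterated successive approximations
    for a finite discounted MDP (a simplifying assumption; the paper covers
    general sequential decision processes under monotone contraction).

    P    : array (A, S, S), P[a, s, t] = Pr(next state t | state s, action a)
    r    : array (A, S), one-period reward r[a, s]
    beta : discount factor, 0 <= beta < 1
    m    : times the stage policy is iterated (int >= 1), or None for infinitely often
    v0   : starting value function, array (S,)
    eps  : slack allowed in the one-period optimization (0.0 = exact)
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    S = r.shape[1]
    states = np.arange(S)
    v = np.asarray(v0, dtype=float)
    for _ in range(max_stages):
        # One-period problem with terminal reward v:
        # q[a, s] = r[a, s] + beta * sum_t P[a, s, t] * v[t]
        q = r + beta * (P @ v)
        best = q.max(axis=0)
        # (Approximately) optimal policy: in each state, the first action whose
        # one-period value is within eps of the best one
        policy = np.argmax(q >= best - eps, axis=0)
        P_pi = P[policy, states]  # (S, S) transition matrix of the policy
        r_pi = r[policy, states]  # (S,) reward vector of the policy
        if m is None:
            # Iterating the policy infinitely often yields its return,
            # the solution of (I - beta * P_pi) w = r_pi
            w = np.linalg.solve(np.eye(S) - beta * P_pi, r_pi)
        else:
            # Apply the policy's return operator m times, starting from v
            w = v
            for _ in range(m):
                w = r_pi + beta * (P_pi @ w)
        if np.max(np.abs(w - v)) < tol:
            return w, policy
        v = w  # the resulting return starts the next stage
    return v, policy


if __name__ == "__main__":
    # Hypothetical two-state, two-action example:
    # action 0 = stay put, action 1 = switch to the other state
    P = np.array([[[1.0, 0.0], [0.0, 1.0]],
                  [[0.0, 1.0], [1.0, 0.0]]])
    r = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
    for m in (1, 5, None):
        v, policy = iterated_successive_approximations(P, r, 0.9, m, np.zeros(2))
        print(m, v, policy)
```

Setting `m=1` makes each stage a single one-period optimization, i.e. standard successive approximations. Setting `m=None` gives policy iteration in the sense of Howard (1960). A finite `m > 1` gives what is commonly called modified policy iteration (Puterman and Shin 1978). In the toy model, the optimal policy stays in the first state and switches out of the second, with optimal values 10 and 11 for a discount factor of 0.9, so all three runs should report values close to those.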

Similar content being viewed by others

MIDAS: A mixed integer dynamic approximation scheme

Article 08 February 2019

A. B. Philpott, F. Wahid & J. F. Bonnans

Discrete Approximation and Convergence Analysis for a Class of Decision-Dependent Two-Stage Stochastic Linear Programs

Article 28 October 2022

Jie Jiang & Zhi-Ping Chen

On the number of stages in multistage stochastic programs

Article 26 February 2019

Giovanni Pantuso & Trine K. Boomsma

References

Bertsekas, D. P., & Yu, H. (2010). Distributed asynchronous policy iteration in dynamic programming. In 2010 Annual Allerton Conference on Communication, Control, and Computing (pp. 1368–1375).
Bertsekas, D. P., & Yu, H. (2012). Q-learning and enhanced policy iteration in discounted dynamic programming. Mathematics of Operations Research, 37(1), 66–94.
Dembo, R. S., & Haviv, M. (1984). Truncated policy iteration methods. Operations Research Letters, 3(5), 243–246.
Denardo, E. V. (1967). Contraction mappings in the theory underlying dynamic programming. SIAM Review, 9(2), 165–177.
Denardo, E. V., & Rothblum, U. G. (1983). Affine structure and invariant policies for dynamic programming. Mathematics of Operations Research, 8(3), 342–365.
Haviv, M. (1985). Block-successive approximation for a discounted Markov decision model. Stochastic Processes and Their Applications, 19(1), 151–160.
Heyman, D. P., & Sobel, M. J. (1984). Stochastic optimization: Vol. II. Stochastic models in operations research. New York: McGraw-Hill.
Howard, R. A. (1960). Dynamic programming and Markov processes. Cambridge: MIT Press.
Kallenberg, L. (2002). Finite state and action MDPs. In E. A. Feinberg & A. Shwartz (Eds.), Handbook of Markov decision processes: Methods and applications. Norwell: Kluwer Academic.
Porteus, E. L. (1971). Some bounds for discounted sequential decision processes. Management Science, 18(1), 7–11.
Porteus, E. L. (1980). Improved iterative computation of the expected discounted return in Markov and semi-Markov chains. Zeitschrift für Operations-Research, 24, 155–170.
Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley.
Puterman, M. L., & Brumelle, S. L. (1979). On the convergence of policy iteration in stationary dynamic programming. Mathematics of Operations Research, 4(1), 60–69.
Puterman, M. L., & Shin, M. C. (1978). Modified policy iteration algorithms for discounted Markov decision problems. Management Science, 24(11), 1127–1137.
Puterman, M. L., & Shin, M. C. (1982). Action elimination procedures for modified policy iteration algorithms. Operations Research, 30(2), 301–318.
Rothblum, U. G. (1979). Iterated successive approximation for sequential decision processes. In J. W. B. van Overhagen & H. C. Tijms (Eds.), Stochastic control and optimization (pp. 30–32). Amsterdam.
Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1), 9–44.
Van der Wal, J. (1978). Discounted Markov games: generalized policy iteration method. Journal of Optimization Theory and Applications, 25(1), 125–138.
Van Nunen, J. A. E. E. (1976a). A set of successive approximation methods for discounted Markovian decision problems. Zeitschrift für Operations-Research, 20, 203–208.
Van Nunen, J. A. E. E. (1976b). Contracting Markov decision processes. Mathematical Centre Tract No. 71, Amsterdam, Holland.
Watkins, C. J. C. H. (1989). Learning from delayed rewards. Ph.D. Thesis, University of Cambridge, England.
Whitt, W. (1978). Approximations of dynamic programs, I. Mathematics of Operations Research, 3(3), 231–243.

Acknowledgements

The conjecture of convergence of the Iterated Successive Approximations Algorithm in the context of arbitrary sequential decision processes was raised in discussions with Jo van Nunen in 1979. The paper benefited greatly from detailed comments on Rothblum (1979) by Jo van Nunen and Jan van der Wal, who suggested the strengthening of an earlier form of Theorem 1 to its present form. The authors also thank Martin Puterman for a thorough discussion of this work. This work was partially supported by the Daniel Rose Yale University-Technion Initiative for Research on Homeland Security and Counter-Terrorism.

Author information

Authors and Affiliations

Faculty of Industrial Engineering and Management, The Technion—Israel Institute of Technology, Haifa, 32000, Israel
Pelin G. Canbolat & Uriel G. Rothblum

Corresponding author

Correspondence to Uriel G. Rothblum.

Additional information

An early version of this paper appeared as an unpublished manuscript in 1979; an abstract of that manuscript appeared as Rothblum (1979) and is referenced in Dembo and Haviv (1984), Haviv (1985), Heyman and Sobel (1984), Kallenberg (2002), Porteus (1980) and Puterman (1994).

About this article

Cite this article

Canbolat, P.G., Rothblum, U.G. (Approximate) iterated successive approximations algorithm for sequential decision processes. Ann Oper Res 208, 309–320 (2013). https://doi.org/10.1007/s10479-012-1073-x

Issue Date: September 2013
DOI: https://doi.org/10.1007/s10479-012-1073-x
