Abstract
Approximate dynamic programming has evolved, initially independently, within operations research, computer science and the engineering controls community, all searching for practical tools for solving sequential stochastic optimization problems. More so than other communities, operations research continued to develop the theory behind the basic model introduced by Bellman with discrete states and actions, even while authors as early as Bellman himself recognized its limits due to the “curse of dimensionality” inherent in discrete state spaces. In response to these limitations, subcommunities in computer science, control theory and operations research have developed a variety of methods for solving different classes of stochastic, dynamic optimization problems, creating the appearance of a jungle of competing approaches. In this article, we show that there is actually a common theme to these strategies, and that underpinning the entire field remain the fundamental algorithmic strategies of value iteration and policy iteration that were first introduced in the 1950s and 1960s.
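To make the abstract's central claim concrete, the following is a minimal sketch (not taken from the article) of classical value iteration on a tiny discrete Markov decision process, the Bellman-era algorithm the article argues still underpins modern approximate methods. The two-state example, its rewards, and its transition probabilities are invented purely for illustration.

```python
def value_iteration(n_states, actions, P, R, gamma=0.9, tol=1e-8):
    """Classical value iteration for a discrete MDP.

    P[s][a] is a list of (probability, next_state) pairs,
    R[s][a] is the expected one-step reward, gamma the discount factor.
    """
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman update: best one-step reward plus discounted future value.
            best = max(
                R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

# A made-up two-state example: action 0 stays put, action 1 moves
# to the other state with probability 0.8.
P = {0: {0: [(1.0, 0)], 1: [(0.8, 1), (0.2, 0)]},
     1: {0: [(1.0, 1)], 1: [(0.8, 0), (0.2, 1)]}}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 2.0, 1: 0.0}}
V = value_iteration(2, [0, 1], P, R)
```

The loop enumerates every state explicitly, which is exactly the "curse of dimensionality" the abstract describes: the sweep becomes intractable once the state space is large or multidimensional, and approximate methods replace the table `V` with a statistical approximation.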
References
Barto, A. G., Sutton, R. S., & Brouwer, P. (1981). Associative search network: A reinforcement learning associative memory. Biological Cybernetics, 40(3), 201–211.
Barto, A., Sutton, R. S., & Anderson, C. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 13(5), 834–846.
Bellman, R. E. (1957). Dynamic programming. Princeton: Princeton University Press.
Bellman, R. E. (1971). Introduction to the mathematical theory of control processes (Vol. II). New York: Academic Press.
Bellman, R. E., & Dreyfus, S. (1959). Functional approximations and dynamic programming. Mathematical Tables and Other Aids To Computation, 13, 247–251.
Bertsekas, D. P. (2011a). Approximate dynamic programming. In Dynamic programming and optimal control (Vol. II, 3rd ed.). Belmont: Athena Scientific, Chap. 6.
Bertsekas, D. P. (2011b). Approximate policy iteration: A survey and some new methods. Journal of Control Theory and Applications, 9(3), 310–335.
Bertsekas, D. P., & Castanon, D. A. (1999). Rollout algorithms for stochastic scheduling problems. Journal of Heuristics, 5, 89–108.
Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-dynamic programming. Belmont: Athena Scientific.
Birge, J. R., & Louveaux, F. (1997). Introduction to stochastic programming. New York: Springer.
Boesel, J., Nelson, B., & Kim, S. (2003). Using ranking and selection to “clean up” after simulation optimization. Operations Research, 51(5), 814–825.
Bradtke, S. J., & Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1), 33–57.
Burnetas, A., & Katehakis, M. N. (1997). Optimal adaptive policies for Markov decision processes. Mathematics of Operations Research, 22(1), 222–225.
Cheung, R. K.-M., & Powell, W. B. (1996). An algorithm for multistage dynamic networks with random arc capacities, with an application to dynamic fleet management. Operations Research, 44, 951–963.
Chick, S. E., & Gans, N. (2009). Economic analysis of simulation selection problems. Management Science, 55(3), 421–437.
Dantzig, G. (1955). Linear programming under uncertainty. Management Science, 1, 197–206.
Dantzig, G., & Ferguson, A. (1956). The allocation of aircraft to routes: An example of linear programming under uncertain demand. Management Science, 3, 45–73.
Denardo, E. V. (1982). Dynamic programming. Englewood Cliffs: Prentice-Hall.
Derman, C. (1962). On sequential decisions and Markov chains. Management Science, 9(1), 16–24.
Derman, C. (1966). Denumerable state Markovian decision processes-average cost criterion. Annals of Mathematical Statistics, 37(6), 1545–1553.
Derman, C. (1970). Finite state Markovian decision processes. New York: Academic Press.
Dreyfus, S., & Law, A. M. (1977). The art and theory of dynamic programming. New York: Academic Press.
Dupačová, J., Consigli, G., & Wallace, S. W. (2000). Scenarios for multistage stochastic programs. Annals of Operations Research, 100, 25–53.
Dupačová, J. (1995). Multistage stochastic programs—the state of the art and selected bibliography. Kybernetika, 31, 151–174.
Dynkin, E. B., & Yushkevich, A. A. (1979). Controlled Markov processes. Grundlehren der mathematischen Wissenschaften: Vol. 235. New York: Springer.
Enders, J., Powell, W. B., & Egan, D. M. (2010). Robust policies for the transformer acquisition and allocation problem. Energy Systems, 1(3), 245–272.
Frazier, P. I., Powell, W. B., & Dayanik, S. (2008). A knowledge gradient policy for sequential information collection. SIAM Journal on Control and Optimization, 47(5), 2410–2439.
Frazier, P. I., Powell, W. B., & Dayanik, S. (2009). The knowledge-gradient policy for correlated normal beliefs. INFORMS Journal on Computing, 21(4), 599–613.
George, A., Powell, W. B., & Kulkarni, S. (2008). Value function approximation using multiple aggregation for multiattribute resource management. Journal of Machine Learning Research, 9, 2079–2111.
Gittins, J., Glazebrook, K., & Weber, R. R. (2011). Multi-armed bandit allocation indices. New York: Wiley.
Gröwe-Kuska, N., Heitsch, H., & Römisch, W. (2003). Scenario reduction and scenario tree construction for power management problems. In A. Borghetti, C. A. Nucci, & M. Paolone (Eds.), IEEE Bologna power tech proceedings.
Gupta, S., & Miescke, K. (1996). Bayesian look ahead one-stage sampling allocations for selection of the best population. Journal of Statistical Planning and Inference, 54(2), 229–244.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference and prediction. New York: Springer.
Haykin, S. (1999). Neural networks: A comprehensive foundation. New York: Prentice Hall.
Heyman, D. P., & Sobel, M. (1984). Stochastic models in operations research: Vol. II. Stochastic optimization. New York: McGraw-Hill.
Higle, J., & Sen, S. (1996). Stochastic decomposition: A statistical method for large scale stochastic linear programming. Dordrecht: Kluwer Academic.
Howard, R. A. (1960). Dynamic programming and Markov processes. Cambridge: MIT Press.
Jaakkola, T., Jordan, M. I., & Singh, S. P. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6), 1185–1201.
Judd, K. L. (1998). Numerical methods in economics. Cambridge: MIT Press.
Kall, P., & Wallace, S. (1994). Stochastic programming. New York: Wiley.
Katehakis, M. N., & Derman, C. (1986). Computing optimal sequential allocation rules in clinical trials. In Lecture notes monograph series (Vol. 8, pp. 29–39). Hayward: Institute of Mathematical Statistics.
Katehakis, M. N., & Robbins, H. (1995). Sequential choice from several populations. Proceedings of the National Academy of Sciences of the United States of America, 92, 8584–8585.
Katehakis, M. N., & Veinott, A. F. (1987). The multi-armed bandit problem: decomposition and computation. Mathematics of Operations Research, 12(2), 262–268.
Kaut, M., & Wallace, S. W. (2003). Evaluation of scenario-generation methods for stochastic programming. Stochastic programming e-print series.
Kushner, H. J., & Yin, G. G. (2003). Stochastic approximation and recursive algorithms and applications. Berlin: Springer.
Law, A., & Kelton, W. (1991). Simulation modeling and analysis (Vol. 2). New York: McGraw-Hill.
Lewis, F., Jagannathan, S., & Yesildirek, A. (1999). Neural network control of robot manipulators and nonlinear systems. New York: CRC Press.
Lewis, F. L., & Syrmos, V. L. (1995). Optimal control. Hoboken: Wiley-Interscience.
Lewis, F. L., & Vrabie, D. (2009). Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits and Systems Magazine, 9(3), 32–50.
Maei, H. R., Szepesvari, C., Bhatnagar, S., & Sutton, R. S. (2010). Toward off-policy learning control with function approximation. In ICML-2010.
Negoescu, D. M., Frazier, P. I., & Powell, W. B. (2011). The knowledge-gradient algorithm for sequencing experiments in drug discovery. INFORMS Journal on Computing, 23(3), 346–363.
Nemhauser, G. L. (1966). Introduction to dynamic programming. New York: Wiley.
Powell, W., & Ryzhov, I. (2012). Optimal learning. Hoboken: Wiley.
Powell, W. B. (1987). An operational planning model for the dynamic vehicle allocation problem with uncertain demands. Transportation Research, 21B, 217–232.
Powell, W. B. (2007). Approximate dynamic programming: Solving the curses of dimensionality. Hoboken: Wiley.
Powell, W. B. (2010). Merging AI and OR to solve high-dimensional stochastic optimization problems using approximate dynamic programming. INFORMS Journal on Computing, 22(1), 2–17.
Powell, W. B. (2011). Approximate dynamic programming: Solving the curses of dimensionality (2nd ed.). Hoboken: Wiley.
Powell, W. B., & Frantzeskakis, L. F. (1990). A successive linear approximation procedure for stochastic dynamic vehicle allocation problems. Transportation Science, 24, 40–57.
Powell, W. B., & Godfrey, G. (2002). An adaptive dynamic programming algorithm for dynamic fleet management, I: Single period travel times. Transportation Science, 36(1), 21–39.
Powell, W. B., & Ma, J. (2011). A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications. Journal of Control Theory and Applications, 9(3), 336–352.
Powell, W. B., & Simão, H. (2009). Approximate dynamic programming for management of high-value spare parts. Journal of Manufacturing Technology and Management, 20(2), 147–160.
Powell, W. B., & Topaloglu, H. (2005). Fleet management. In S. Wallace & W. Ziemba (Eds.), SIAM series in optimization. Applications of stochastic programming (pp. 185–216). Philadelphia: Math Programming Society.
Powell, W. B., & Van Roy, B. (2004). Approximate dynamic programming for high dimensional resource allocation problems. In J. Si, A. G. Barto, W. B. Powell, & D. Wunsch II (Eds.), Handbook of learning and approximate dynamic programming. New York: IEEE Press.
Powell, W. B., George, A., Lamont, A., & Stewart, J. (2011). SMART: A stochastic multiscale model for the analysis of energy resources, technology and policy. INFORMS Journal on Computing. http://dx.doi.org/10.1287/ijoc.1110.0470.
Puterman, M. L. (1994). Markov decision processes (1st ed.). Hoboken: Wiley.
Puterman, M. L. (2005). Markov decision processes (2nd ed.). Hoboken: Wiley.
Robbins, H., & Monro, S. (1951). A stochastic approximation method. Annals of Mathematical Statistics, 22(3), 400–407.
Römisch, W., & Heitsch, H. (2009). Scenario tree modeling for multistage stochastic programs. Mathematical Programming, 118, 371–406.
Ross, S. (1983). Introduction to stochastic dynamic programming. New York: Academic Press.
Ryzhov, I., & Powell, W. B. (2011). Bayesian active learning with basis functions. In 2011 IEEE symposium series on computational intelligence, No 3. Paris: IEEE Press.
Ryzhov, I., Frazier, P. I., & Powell, W. B. (2012). Stepsize selection for approximate value iteration and a new optimal stepsize rule (Technical report). Department of Operations Research and Financial Engineering, Princeton University.
Ryzhov, I. O., Powell, W. B., & Frazier, P. I. (n.d.). The knowledge gradient algorithm for a general class of online learning problems.
Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, 211–229.
Sen, S., & Higle, J. (1999). An introductory tutorial on stochastic linear programming models. Interfaces, 29(2), 33–61.
Si, J., & Wang, Y. T. (2001). Online learning control by association and reinforcement. IEEE Transactions on Neural Networks, 12(2), 264–276.
Si, J., Barto, A. G., Powell, W. B., & Wunsch, D. (2004). Handbook of learning and approximate dynamic programming. New York: Wiley-IEEE Press.
Silver, D. (2009). Reinforcement learning and simulation-based search in computer go. PhD thesis, University of Alberta.
Simão, H. P., Day, J., George, A. P., Gifford, T., Powell, W. B., & Nienow, J. (2009). An approximate dynamic programming algorithm for large-scale fleet management: A case application. Transportation Science, 43(2), 178–197.
Simão, H. P., George, A., Powell, W. B., Gifford, T., Nienow, J., & Day, J. (2010). Approximate dynamic programming captures fleet operations for Schneider national. Interfaces, 40(5), 1–11.
Spall, J. C. (2003). Introduction to stochastic search and optimization: Estimation, simulation and control. Hoboken: Wiley.
Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1), 9–44.
Sutton, R. S., & Barto, A. G. (1981). Toward a modern theory of adaptive networks. Psychological Review, 88(2), 135–170.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.
Sutton, R. S., Maei, H. R., Precup, D., Bhatnagar, S., Silver, D., Szepesvari, C., & Wiewiora, E. (2009a). Fast gradient-descent methods for temporal-difference learning with linear function approximation. In Proceedings of the 26th annual international conference on machine learning—ICML’09 (pp. 1–8). New York: ACM Press.
Sutton, R. S., Szepesvari, C., & Maei, H. (2009b). A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation. In Advances in neural information processing systems (Vol. 21, pp. 1609–1616).
Topaloglu, H., & Powell, W. B. (2006). Dynamic programming approximations for stochastic, time-staged integer multicommodity flow problems. INFORMS Journal on Computing, 18, 31–42.
Tsitsiklis, J. N. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 16, 185–202.
Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42, 674–690.
Van Roy, B., Bertsekas, D. P., Lee, Y., & Tsitsiklis, J. N. (1997). A neuro-dynamic programming approach to retailer inventory management. In Proceedings of the IEEE conference on decision and control (Vol. 4, pp. 4052–4057).
Venayagamoorthy, G., & Harley, R. (2002). Comparison of heuristic dynamic programming and dual heuristic programming adaptive critics for neurocontrol of a turbogenerator. IEEE Transactions on Neural Networks, 13(3), 764–773.
Wang, F.-Y., Zhang, H., & Liu, D. (2009). Adaptive dynamic programming: An introduction. IEEE Computational Intelligence Magazine, May, 39–47.
Watkins, C. (1989). Learning from delayed rewards. PhD thesis, King's College, Cambridge, England.
Watkins, C., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292.
Werbos, P. J. (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences. PhD thesis, Harvard University.
Werbos, P. J. (1989). Backpropagation and neurocontrol: A review and prospectus. Neural Networks, 209–216.
Werbos, P. J. (1990). Consistency of HDP applied to a simple reinforcement learning problem. Neural Networks, 3, 179–189.
Werbos, P. J. (1992a). Approximate dynamic programming for real-time control and neural modelling. In D. J. White & D. A. Sofge (Eds.), Handbook of intelligent control: Neural, fuzzy, and adaptive approaches.
Werbos, P. J. (1992b). Neurocontrol and supervised learning: An overview and valuation. In D. A. White & D. A. Sofge (Eds.), Handbook of intelligent control: Neural, fuzzy, and adaptive approaches.
Werbos, P. J., Miller, W. T., & Sutton, R. S. (Eds.) (1990). Neural networks for control. Cambridge: MIT Press.
White, D. J. (1969). Dynamic programming. San Francisco: Holden-Day.
Wu, T., Powell, W. B., & Whisman, A. (2009). The optimizing-simulator: An illustration using the military airlift problem. ACM Transactions on Modeling and Simulation, 19(3), 1–31.
Powell, W.B. Perspectives of approximate dynamic programming. Ann Oper Res 241, 319–356 (2016). https://doi.org/10.1007/s10479-012-1077-6