Efficient sampling in approximate dynamic programming algorithms

Computational Optimization and Applications

Abstract

Dynamic Programming (DP) is a standard optimization tool for solving Stochastic Optimal Control (SOC) problems over either a finite or an infinite horizon of stages. Under very general assumptions, commonly employed numerical algorithms approximate the cost-to-go functions by means of suitable parametric models built from a set of sampling points in the d-dimensional state space. This paper discusses the problem of sample complexity, i.e., how fast the number of points must grow with the input dimension in order to obtain an accurate estimate of the cost-to-go functions in typical DP approaches such as value iteration and policy iteration. It is shown that sampling based on low-discrepancy sequences, commonly used for efficient numerical integration, achieves, under suitable hypotheses, an almost linear sample complexity, thus helping to mitigate the curse of dimensionality of the approximate DP procedure.
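To make the idea of low-discrepancy sampling concrete, the following is a minimal Python sketch, not the authors' procedure. It assumes the Halton sequence as a stand-in low-discrepancy sequence (the abstract does not name a specific one), and the names `radical_inverse`, `halton_points`, and `sample_mean` are hypothetical helpers introduced only for illustration. It generates deterministic sample points in the unit cube and uses them to estimate an integral, which is the numerical-integration setting the abstract refers to.

```python
# Illustrative sketch only: Halton points are an assumed example of a
# low-discrepancy sequence; the paper's own choice may differ.

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]  # one prime base per dimension


def radical_inverse(index, base):
    """Van der Corput radical inverse: mirror the base-`base` digits of
    `index` around the radix point, giving a value in [0, 1)."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_points(n, d):
    """Return the first n Halton points (indices 1..n) in [0, 1)^d."""
    if d > len(PRIMES):
        raise ValueError("this sketch supports at most %d dimensions" % len(PRIMES))
    return [[radical_inverse(i, PRIMES[j]) for j in range(d)]
            for i in range(1, n + 1)]


def sample_mean(f, points):
    """Approximate the integral of f over the unit cube by averaging f
    over the sample points."""
    return sum(f(x) for x in points) / len(points)


def product_of_coordinates(x):
    """Test integrand whose exact integral over [0, 1]^d is 2 ** (-d)."""
    value = 1.0
    for coordinate in x:
        value *= coordinate
    return value


if __name__ == "__main__":
    d = 2
    points = halton_points(1024, d)
    estimate = sample_mean(product_of_coordinates, points)
    print("estimate:", estimate, "exact:", 2.0 ** (-d))
```

In an approximate DP setting, points generated this way would play the role of the sampling points in the state space at which the cost-to-go function is evaluated before fitting the parametric model; a pseudo-random generator could be swapped in for `halton_points` to compare the two designs.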


Author information

Corresponding author

Correspondence to Cristiano Cervellera.

About this article

Cite this article

Cervellera, C., Muselli, M. Efficient sampling in approximate dynamic programming algorithms. Comput Optim Appl 38, 417–443 (2007). https://doi.org/10.1007/s10589-007-9054-8

  • DOI: https://doi.org/10.1007/s10589-007-9054-8
