Dynamic Programming and Value-Function Approximation in Sequential Decision Problems: Error Analysis and Numerical Results

Journal of Optimization Theory and Applications

Abstract

Value-function approximation is investigated for the solution via Dynamic Programming (DP) of continuous-state sequential N-stage decision problems, in which the reward to be maximized has an additive structure over a finite number of stages. Conditions that guarantee smoothness properties of the value function at each stage are derived. These properties are exploited to approximate such functions by means of certain nonlinear approximation schemes, which include splines of suitable order and Gaussian radial-basis networks with variable centers and widths. The accuracies of suboptimal solutions obtained by combining DP with these approximation tools are estimated. The results provide insight into the successful performance of value-function approximators in DP reported in the literature. The theoretical analysis is applied to a problem of optimal consumption, with simulation results illustrating the use of the proposed solution methodology. Numerical comparisons with classical linear approximators are presented.

Notes

  1. When the decision horizon goes to infinity.

  2. Functions that are constant along hyperplanes are known as ridge functions. Each ridge function is the composition of a multivariable function of a particularly simple form, namely the inner product with a fixed vector, and an arbitrary function of a single variable.

  3. Note that [55, Corollary 3.2] uses “\(\operatorname{ess\,sup}\)” instead of “sup” in (41). However, by the Rellich–Kondrachov theorem [56, Theorem 6.3, p. 168], one can replace “\(\operatorname{ess\,sup}\)” with “sup”.

  4. Unfortunately, [55, Corollary 3.2] provides neither a closed-form expression for the constant C_1 nor an upper bound on it. For results similar to [55, Corollary 3.2] and for specific choices of ψ, [55] gives upper bounds on similar constants (see, e.g., [55, Theorem 2.3 and Corollary 3.3]).

References

  1. Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)

  2. Bertsekas, D.P., Tsitsiklis, J.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)

  3. Powell, W.B.: Approximate Dynamic Programming—Solving the Curses of Dimensionality. Wiley, Hoboken (2007)

  4. Si, J., Barto, A.G., Powell, W.B., Wunsch, D. (eds.): Handbook of Learning and Approximate Dynamic Programming. IEEE Press, New York (2004)

  5. Zoppoli, R., Parisini, T., Sanguineti, M., Gnecco, G.: Neural Approximations for Optimal Control and Decision. Springer, London (2012, in preparation)

  6. Haykin, S.: Neural Networks: a Comprehensive Foundation. Prentice Hall, New York (1998)

  7. Bertsekas, D.P.: Dynamic Programming and Optimal Control vol. 1. Athena Scientific, Belmont (2005)

  8. Bellman, R., Dreyfus, S.: Functional approximations and dynamic programming. Math. Tables Other Aids Comput. 13, 247–251 (1959)

  9. Bellman, R., Kalaba, R., Kotkin, B.: Polynomial approximation—a new computational technique in dynamic programming. Math. Comput. 17, 155–161 (1963)

  10. Foufoula-Georgiou, E., Kitanidis, P.K.: Gradient dynamic programming for stochastic optimal control of multidimensional water resources systems. Water Resour. Res. 24, 1345–1359 (1988)

  11. Johnson, S., Stedinger, J., Shoemaker, C., Li, Y., Tejada-Guibert, J.: Numerical solution of continuous-state dynamic programs using linear and spline interpolation. Oper. Res. 41, 484–500 (1993)

  12. Chen, V.C.P., Ruppert, D., Shoemaker, C.A.: Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming. Oper. Res. 47, 38–53 (1999)

  13. Cervellera, C., Muselli, M.: Efficient sampling in approximate dynamic programming algorithms. Comput. Optim. Appl. 38, 417–443 (2007)

  14. Philbrick, C.R. Jr., Kitanidis, P.K.: Improved dynamic programming methods for optimal control of lumped-parameter stochastic systems. Oper. Res. 49, 398–412 (2001)

  15. Judd, K.: Numerical Methods in Economics. MIT Press, Cambridge (1998)

  16. Kůrková, V., Sanguineti, M.: Comparison of worst-case errors in linear and neural network approximation. IEEE Trans. Inf. Theory 48, 264–275 (2002)

  17. Tesauro, G.: Practical issues in temporal difference learning. Mach. Learn. 8, 257–277 (1992)

  18. Gnecco, G., Sanguineti, M., Gaggero, M.: Suboptimal solutions to team optimization problems with stochastic information structure. SIAM J. Optim. 22, 212–243 (2012)

  19. Tsitsiklis, J.N., Roy, B.V.: Feature-based methods for large scale dynamic programming. Mach. Learn. 22, 59–94 (1996)

  20. Zoppoli, R., Sanguineti, M., Parisini, T.: Approximating networks and extended Ritz method for the solution of functional optimization problems. J. Optim. Theory Appl. 112, 403–439 (2002)

  21. Alessandri, A., Gaggero, M., Zoppoli, R.: Feedback optimal control of distributed parameter systems by using finite-dimensional approximation schemes. IEEE Trans. Neural Netw. Learn. Syst. 23(6), 984–996 (2012)

  22. Stokey, N.L., Lucas, R.E., Prescott, E.: Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge (1989)

  23. Bertsekas, D.P.: Dynamic Programming and Optimal Control vol. 2. Athena Scientific, Belmont (2007)

  24. White, D.J.: Markov Decision Processes. Wiley, New York (1993)

  25. Puterman, M.L., Shin, M.C.: Modified policy iteration algorithms for discounted Markov decision processes. Manag. Sci. 41, 1127–1137 (1978)

  26. Altman, E., Nain, P.: Optimal control of the M/G/1 queue with repeated vacations of the server. IEEE Trans. Autom. Control 38, 1766–1775 (1993)

  27. Lendaris, G.G., Neidhoefer, J.C.: Guidance in the choice of adaptive critics for control. In: Si, J., Barto, A.G., Powell, W.B., Wunsch, D. (eds.) Handbook of Learning and Approximate Dynamic Programming, pp. 97–124. IEEE Press, New York (2004)

  28. Karp, L., Lee, I.H.: Learning-by-doing and the choice of technology: the role of patience. J. Econ. Theory 100, 73–92 (2001)

  29. Rapaport, A., Sraidi, S., Terreaux, J.: Optimality of greedy and sustainable policies in the management of renewable resources. Optim. Control Appl. Methods 24, 23–44 (2003)

  30. Semmler, W., Sieveking, M.: Critical debt and debt dynamics. J. Econ. Dyn. Control 24, 1121–1144 (2000)

  31. Nawijn, W.M.: Look-ahead policies for admission to a single-server loss system. Oper. Res. 38, 854–862 (1990)

  32. Gnecco, G., Sanguineti, M.: Suboptimal solutions to dynamic optimization problems via approximations of the policy functions. J. Optim. Theory Appl. 146, 764–794 (2010)

  33. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I. Springer, Berlin (1993)

  34. Stein, E.M.: Singular Integrals and Differentiability Properties of Functions. Princeton University Press, Princeton (1970)

  35. Singer, I.: Best Approximation in Normed Linear Spaces by Elements of Linear Subspaces. Springer, Berlin (1970)

  36. Kůrková, V., Sanguineti, M.: Geometric upper bounds on rates of variable-basis approximation. IEEE Trans. Inf. Theory 54, 5681–5688 (2008)

  37. Barron, A.R.: Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inf. Theory 39, 930–945 (1993)

  38. Gnecco, G., Kůrková, V., Sanguineti, M.: Some comparisons of complexity in dictionary-based and linear computational models. Neural Netw. 24, 171–182 (2011)

  39. Wahba, G.: Spline Models for Observational Data. CBMS-NSF Regional Conf. Series in Applied Mathematics, vol. 59. SIAM, Philadelphia (1990)

  40. Mhaskar, H.N.: Neural networks for optimal approximation of smooth and analytic functions. Neural Comput. 8, 164–177 (1996)

  41. Kainen, P.C., Kůrková, V., Sanguineti, M.: Complexity of Gaussian radial-basis networks approximating smooth functions. J. Complex. 25, 63–74 (2009)

  42. Alessandri, A., Gnecco, G., Sanguineti, M.: Minimizing sequences for a family of functional optimal estimation problems. J. Optim. Theory Appl. 147, 243–262 (2010)

  43. Adda, J., Cooper, R.: Dynamic Economics: Quantitative Methods and Applications. MIT Press, Cambridge (2003)

  44. Fang, K.T., Wang, Y.: Number-Theoretic Methods in Statistics. Chapman & Hall, London (1994)

  45. Hammersley, J.M., Handscomb, D.C.: Monte Carlo Methods. Methuen, London (1964)

  46. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia (1992)

  47. Sobol’, I.: The distribution of points in a cube and the approximate evaluation of integrals. Zh. Vychisl. Mat. Mat. Fiz. 7, 784–802 (1967)

  48. Loomis, L.H.: An Introduction to Abstract Harmonic Analysis. Van Nostrand, Princeton (1953)

  49. Boldrin, M., Montrucchio, L.: On the indeterminacy of capital accumulation paths. J. Econ. Theory 40, 26–39 (1986)

  50. Dawid, H., Kopel, M., Feichtinger, G.: Complex solutions of nonconcave dynamic optimization models. Econ. Theory 9, 427–439 (1997)

  51. Chambers, J., Cleveland, W.: Graphical Methods for Data Analysis. Wadsworth/Cole Publishing Company, Pacific Grove (1983)

  52. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (2006)

  53. Zhang, F. (ed.): The Schur Complement and Its Applications. Springer, New York (2005)

  54. Wilkinson, J.H.: The Algebraic Eigenvalue Problem. Oxford Science Publications, Oxford (2004)

  55. Hornik, K., Stinchcombe, M., White, H., Auer, P.: Degree of approximation results for feedforward networks approximating unknown mappings and their derivatives. Neural Comput. 6, 1262–1275 (1994)

  56. Adams, R.A., Fournier, J.J.F.: Sobolev Spaces. Academic Press, San Diego (2003)

  57. Rudin, W.: Functional Analysis. McGraw-Hill, New York (1973)

  58. Gnecco, G., Sanguineti, M.: Approximation error bounds via Rademacher’s complexity. Appl. Math. Sci. 2, 153–176 (2008)

Author information

Correspondence to Marcello Sanguineti.

Additional information

Communicated by Francesco Zirilli.

Appendix

Proof of Proposition 2.2

(i) We use a backward induction argument. For t=N−1,…,0, assume that, at stage t+1, \(\tilde{J}_{t+1}^{o} \in\mathcal{F}_{t+1}\) is such that \(\sup_{x_{t+1} \in X_{t+1}} | J_{t+1}^{o}(x_{t+1})-\tilde{J}_{t+1}^{o}(x_{t+1}) |\leq{\eta}_{t+1}\) for some η_{t+1}≥0. In particular, for t=N−1, one has η_N=0, as \(\tilde{J}_{N}^{o} = J_{N}^{o}\). By (3), there exists \(f_{t}\in\mathcal{F}_{t}\) such that \(\sup_{x_{t} \in X_{t}} | (T_{t} \tilde{J}_{t+1}^{o})(x_{t})-f_{t}(x_{t}) | \leq \varepsilon_{t}\). Set \(\tilde{J}_{t}^{o}=f_{t}\). By the triangle inequality and Proposition 2.1,

$$\sup_{x_t \in X_t} \bigl| J_t^o(x_t)-\tilde{J}_t^o(x_t) \bigr| \leq \sup_{x_t \in X_t} \bigl| \bigl(T_t J_{t+1}^o\bigr)(x_t)-\bigl(T_t \tilde{J}_{t+1}^o\bigr)(x_t) \bigr| + \sup_{x_t \in X_t} \bigl| \bigl(T_t \tilde{J}_{t+1}^o\bigr)(x_t)-f_t(x_t) \bigr| \leq \beta\eta_{t+1}+\varepsilon_t =: \eta_t. $$

Then, after N iterations we get \(\sup_{x_{0} \in X_{0}} | J_{0}^{o}(x_{0})-\tilde{J}_{0}^{o}(x_{0}) | \leq\eta_{0} = \varepsilon_{0} + \beta \eta_{1} = \varepsilon_{0} + \beta \varepsilon_{1} + \beta^{2} \eta_{2} = \dots = \sum_{t=0}^{N-1}{\beta^{t}\varepsilon_{t}}\).

(ii) As before, for t=N−1,…,0, assume that, at stage t+1, \(\tilde{J}_{t+1}^{o} \in\mathcal{F}_{t+1}\) is such that \(\sup_{x_{t+1} \in X_{t+1}} | J_{t+1}^{o}(x_{t+1})-\tilde{J}_{t+1}^{o}(x_{t+1}) |\leq{\eta}_{t+1}\) for some η_{t+1}≥0. In particular, for t=N−1, one has η_N=0, as \(\tilde{J}_{N}^{o} = J_{N}^{o}\). Let \(\hat{J}_{t}^{o}=T_{t} \tilde{J}_{t+1}^{o}\). Proposition 2.1 gives

$$\sup_{x_t \in X_t} \bigl| J_t^o(x_t)-\hat{J}_t^o(x_t) \bigr| = \sup_{x_t \in X_t} \bigl| \bigl(T_t J_{t+1}^o\bigr)(x_t)-\bigl(T_t \tilde{J}_{t+1}^o\bigr)(x_t) \bigr| \leq \beta\eta_{t+1}. $$

Before moving to the tth stage, one has to find an approximation \(\tilde{J}_{t}^{o} \in\mathcal{F}_{t}\) of \(J_{t}^{o}=T_{t} J_{t+1}^{o}\). Such an approximation has to be obtained from \(\hat{J}_{t}^{o}=T_{t} \tilde{J}_{t+1}^{o}\) (which, in general, may not belong to \(\mathcal{F}_{t}\)), because \(J_{t}^{o}=T_{t} {J}_{t+1}^{o}\) is unknown. By assumption, there exists \(f_{t} \in \mathcal{F}_{t}\) such that \(\sup_{x_{t} \in X_{t}} | J_{t}^{o}(x_{t})-f_{t}(x_{t}) | \leq \varepsilon_{t}\). However, in general, one cannot set \(\tilde{J}_{t}^{o}=f_{t}\), since in a neighborhood of radius βη_{t+1} of \(\hat{J}_{t}^{o}\) in the sup-norm there may exist, besides \(J^{o}_{t}\), some other function \(I_{t} \not= J_{t}^{o}\) that can also be approximated by some \(\tilde{f}_{t} \in\mathcal{F}_{t}\) with error at most ε_t. As \(J_{t}^{o}\) is unknown, in the worst case one chooses \(\tilde{J}_{t}^{o}=\tilde{f}_{t}\) instead of \(\tilde{J}_{t}^{o}=f_{t}\). In such a case, we get

$$\sup_{x_t \in X_t} \bigl| J_t^o(x_t)-\tilde{J}_t^o(x_t) \bigr| \leq \sup_{x_t \in X_t} \bigl| J_t^o(x_t)-\hat{J}_t^o(x_t) \bigr| + \sup_{x_t \in X_t} \bigl| \hat{J}_t^o(x_t)-I_t(x_t) \bigr| + \sup_{x_t \in X_t} \bigl| I_t(x_t)-\tilde{f}_t(x_t) \bigr| \leq 2\beta\eta_{t+1}+\varepsilon_t. $$

Let η_t := 2βη_{t+1}+ε_t. Then, after N iterations we have \(\sup_{x_{0} \in X_{0}} | J_{0}^{o}(x_{0})-\tilde {J}_{0}^{o}(x_{0}) | \leq\eta_{0} = \varepsilon_{0} + 2\beta \eta_{1} = \varepsilon_{0} + 2\beta \varepsilon_{1} + 4\beta^{2} \eta_{2} = \dots= \sum_{t=0}^{N-1}{(2\beta)^{t}\varepsilon_{t}}\). □

Proof of Proposition 2.3

Set η_{N/M}=0 and, for t=N/M−1,…,0, assume that, at stage t+1 of ADP(M), \(\tilde{J}_{t+1}^{o} \in\mathcal{F}_{t+1}\) is such that \(\sup_{x_{t+1} \in X_{t+1}} | J_{M\cdot (t+1)}^{o}(x_{t+1})-\tilde{J}_{t+1}^{o}(x_{t+1}) |\leq{\eta}_{t+1}\). Proceeding as in the proof of Proposition 2.2, we get the recursion η_t = 2β^M η_{t+1} + ε_t, where β^M replaces β since in each iteration of ADP(M) Proposition 2.1 can be applied M times. □
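
To make the three error-propagation recursions concrete, the following minimal Python sketch evaluates them numerically; the horizon N, discount factor β, lookahead M, and stage-wise errors ε_t are arbitrary illustrative values, not quantities from the paper.

```python
import numpy as np

# Error-propagation recursions from Propositions 2.2 and 2.3:
# eta_t = factor * eta_{t+1} + eps_t, with eta = 0 at the terminal stage.
N, beta, M = 10, 0.9, 2            # demo horizon, discount factor, ADP(M) parameter
eps = 0.01 * np.ones(N)            # demo stage-wise approximation errors eps_t

def propagate(eps_seq, factor):
    """Backward recursion; returns the resulting bound eta_0."""
    eta = 0.0
    for e in reversed(eps_seq):
        eta = factor * eta + e
    return eta

print(propagate(eps, beta))                   # Prop 2.2(i):  sum_t beta^t eps_t
print(propagate(eps, 2 * beta))               # Prop 2.2(ii): sum_t (2 beta)^t eps_t
print(propagate(eps[: N // M], 2 * beta**M))  # Prop 2.3: N/M macro-stages of ADP(M)
```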

In order to prove Proposition 3.1, we shall apply the following technical lemma (which readily follows by [53, Theorem 2.13, p. 69] and the example in [53, p. 70]). Given a square partitioned real matrix \(M= \bigl( \begin{smallmatrix} A & B \\ C & D \end{smallmatrix} \bigr)\) such that D is nonsingular, the Schur complement M/D of D in M is defined [53, p. 18] as the matrix \(M/D=A-BD^{-1}C\). For a symmetric real matrix, we denote by λ_max its maximum eigenvalue.

Lemma 9.1

Let \(M= \bigl( \begin{smallmatrix} A & B \\ C & D \end{smallmatrix} \bigr)\) be a partitioned symmetric negative-semidefinite matrix such that D is nonsingular. Then λ_max(M/D)≤λ_max(M).
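
Lemma 9.1 lends itself to a quick numerical sanity check. A minimal sketch (the matrix below is randomly generated, and the block sizes and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric negative-semidefinite matrix Mat = -G G^T, partitioned
# so that the lower-right block D is (almost surely) nonsingular.
G = rng.standard_normal((5, 5))
Mat = -G @ G.T
k = 3                                          # size of the upper-left block A
A, B = Mat[:k, :k], Mat[:k, k:]
C, D = Mat[k:, :k], Mat[k:, k:]

schur = A - B @ np.linalg.solve(D, C)          # M/D = A - B D^{-1} C
lam_max = lambda S: np.linalg.eigvalsh(S)[-1]  # eigenvalues in ascending order
print(lam_max(schur), "<=", lam_max(Mat))      # Lemma 9.1
assert lam_max(schur) <= lam_max(Mat) + 1e-10
```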

In the proof of Proposition 3.1, we shall use the following notation. The symbol ∇ denotes the gradient operator when applied to a scalar-valued function and the Jacobian operator when applied to a vector-valued function; ∇² denotes the Hessian. In the case of a composite function, e.g., f(g(x,y,z),h(x,y,z)), by ∇_i f(g(x,y,z),h(x,y,z)) we denote the gradient of f with respect to its ith (vector) argument, computed at (g(x,y,z),h(x,y,z)). The full gradient of f with respect to the argument x is denoted by ∇_x f(g(x,y,z),h(x,y,z)). Similarly, by \(\nabla^{2}_{i,j} f(g(x,y,z),h(x,y,z))\) we denote the submatrix of the Hessian of f computed at (g(x,y,z),h(x,y,z)), whose first indices belong to the vector argument i and whose second indices belong to the vector argument j. \(\nabla J_{t}^{o}(x_{t})\) is a column vector, and \(\nabla g_{t}^{o}(x_{t})\) is a matrix whose rows are the transposes of the gradients of the components of \(g_{t}^{o}(x_{t})\). We denote by \(g^{o}_{t,j}\) the jth component of the optimal policy function \(g^{o}_{t}\) (j=1,…,d). The other notation used in the proof is detailed in Sect. 3.

Proof of Proposition 3.1

(i) Let us first show by backward induction on t that \(J^{o}_{t} \in\mathcal{C}^{m}(X_{t})\) and, for every j∈{1,…,d}, \(g^{o}_{t,j} \in\mathcal{C}^{m-1}(X_{t})\) (which we also need in the proof). Since \(J^{o}_{N}=h_{N}\), we have \(J^{o}_{N} \in\mathcal{C}^{m}(X_{N})\) by hypothesis. Now, fix t and suppose that \(J^{o}_{t+1} \in\mathcal{C}^{m}(X_{t+1})\) and is concave. Let \(x_{t} \in\operatorname{int} (X_{t})\). As by hypothesis the optimal policy \(g^{o}_{t}\) is interior on \(\operatorname{int} (X_{t})\), the first-order optimality condition \(\nabla_{2} h_{t}(x_{t},g^{o}_{t}(x_{t}))+\beta\nabla J^{o}_{t+1}(g^{o}_{t}(x_{t}))=0\) holds. By the implicit function theorem we get

$$ \nabla g^o_t(x_t)=- \bigl[ \nabla_{2,2}^2 \bigl(h_t\bigl(x_t,g^o_t(x_t) \bigr) \bigr)+ \beta\nabla^2 J^o_{t+1} \bigl(g^o_t(x_t)\bigr) \bigr]^{-1} \nabla^2_{2,1}h_t\bigl(x_t,g^o_t(x_t) \bigr) , $$
(39)

where \(\nabla^{2}_{2,2} (h_{t}(x_{t},g^{o}_{t}(x_{t})) )+ \beta \nabla^{2} J^{o}_{t+1}(g^{o}_{t}(x_{t}))\) is nonsingular, as \(\nabla^{2}_{2,2} (h_{t}(x_{t},g^{o}_{t}(x_{t})) )\) is negative definite by the α_t-concavity of h_t for α_t>0, and \(\nabla^{2} J^{o}_{t+1}(g^{o}_{t}(x_{t}))\) is negative semidefinite since \(J^{o}_{t+1}\) is concave.

By differentiating both sides of (39) up to derivatives of h_t and \(J^{o}_{t+1}\) of order m, we get, for j=1,…,d, \(g^{o}_{t,j} \in\mathcal {C}^{m-1}(\operatorname{int} (X_{t}))\). As the expressions obtained for its partial derivatives up to the order m−1 are bounded and continuous not only on \(\operatorname{int} (X_{t})\) but on the whole X_t, one has \(g^{o}_{t,j} \in \mathcal{C}^{m-1}(X_{t})\).

By differentiating the equality \(J^{o}_{t}(x_{t})=h_{t}(x_{t},g^{o}_{t}(x_{t}))+ \beta J^{o}_{t+1}(g^{o}_{t}(x_{t}))\) we obtain

$$\nabla J^o_t(x_t)=\nabla_1 h_t\bigl(x_t,g^o_t(x_t)\bigr) + \bigl(\nabla g^o_t(x_t)\bigr)^{\top} \bigl[ \nabla_2 h_t\bigl(x_t,g^o_t(x_t)\bigr)+\beta \nabla J^o_{t+1}\bigl(g^o_t(x_t)\bigr) \bigr]. $$

So, by the first-order optimality condition, we get

$$ \nabla J^o_t(x_t)=\nabla_1 h_t\bigl(x_t,g^o_t(x_t) \bigr). $$
(40)

By differentiating both sides of (40) up to derivatives of h_t of order m, we obtain \(J^{o}_{t} \in\mathcal{C}^{m}(\operatorname{int} (X_{t}))\). As for the optimal policies, this extends to \(J^{o}_{t} \in\mathcal{C}^{m}(X_{t})\).
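
Identities (39) and (40) can also be verified numerically on a toy instance. The sketch below uses concave quadratic choices of h_t and \(J^{o}_{t+1}\) (illustrative assumptions, not the paper's model), for which the interior maximizer \(g^{o}_{t}\) is available in closed form:

```python
import numpy as np

rng = np.random.default_rng(1)
d, beta = 3, 0.9

def spd():
    G = rng.standard_normal((d, d))
    return G @ G.T + d * np.eye(d)     # symmetric positive definite

# Illustrative concave quadratics: h_t(x,u) = -x'Px - u'Qu - x'Ru and
# J_{t+1}(u) = -u'Su, with a small cross term R to keep h_t concave.
P, Q, S = spd(), spd(), spd()
R = 0.1 * rng.standard_normal((d, d))
h = lambda x, u: -x @ P @ x - u @ Q @ u - x @ R @ u
J_next = lambda u: -u @ S @ u

# First-order condition grad_2 h + beta * grad J_{t+1} = 0 gives g(x) = -K x.
K = np.linalg.solve(2 * Q + 2 * beta * S, R.T)
g = lambda x: -K @ x

# (39): grad g(x) = -[Hess_22 h + beta * Hess J_{t+1}]^{-1} Hess_21 h = -K.
grad_g = -np.linalg.solve(-2 * Q + beta * (-2 * S), -R.T)
assert np.allclose(grad_g, -K)

# (40): grad J_t(x) = grad_1 h(x, g(x)); check against central differences.
J_t = lambda x: h(x, g(x)) + beta * J_next(g(x))
x0 = rng.standard_normal(d)
grad1_h = -2 * P @ x0 - R @ g(x0)
fd = np.array([(J_t(x0 + 1e-6 * e) - J_t(x0 - 1e-6 * e)) / 2e-6 for e in np.eye(d)])
print(np.max(np.abs(fd - grad1_h)))    # agreement up to finite-difference error
```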

In order to conclude the backward induction step, it remains to show that \(J^{o}_{t}\) is concave. This can be proved by the following direct argument. By differentiating (40) and using (39), for the Hessian of \(J^{o}_{t}\) we obtain

$$\nabla^2 J^o_t(x_t)=\nabla^2_{1,1} h_t\bigl(x_t,g^o_t(x_t)\bigr) - \nabla^2_{1,2}h_t\bigl(x_t,g^o_t(x_t)\bigr) \bigl[ \nabla^2_{2,2}h_t\bigl(x_t,g^o_t(x_t)\bigr) + \beta\nabla^2 J^o_{t+1}\bigl(g^o_t(x_t)\bigr) \bigr]^{-1} \nabla^2_{2,1}h_t\bigl(x_t,g^o_t(x_t)\bigr), $$

which is the Schur complement of \([\nabla^{2}_{2,2}h_{t}(x_{t},g^{o}_{t}(x_{t})) + \beta\nabla^{2} J^{o}_{t+1}(g^{o}_{t}(x_{t})) ]\) in the matrix

$$\left( \begin{array}{c@{\quad}c} \nabla^2_{1,1} h_t(x_t,g^o_t(x_t)) & \nabla^2_{1,2}h_t(x_t,g^o_t(x_t)) \\[6pt] \nabla^2_{2,1}h_t(x_t,g^o_t(x_t)) & \nabla^2_{2,2}h_t(x_t,g^o_t(x_t)) + \beta\nabla^2 J^o_{t+1}(g^o_t(x_t)) \end{array} \right) . $$

Note that such a matrix is negative semidefinite, as it is the sum of the two matrices

$$\left( \begin{array}{c@{\quad}c} \nabla^2_{1,1} h_t(x_t,g^o_t(x_t)) & \nabla^2_{1,2}h_t(x_t,g^o_t(x_t)) \\ [6pt] \nabla^2_{2,1}h_t(x_t,g^o_t(x_t)) & \nabla^2_{2,2}h_t(x_t,g^o_t(x_t)) \end{array} \right) \quad \mbox{and} \quad \left( \begin{array}{c@{\quad}c} 0 & 0 \\ [4pt] 0 & \beta\nabla^2 J^o_{t+1}(g^o_t(x_t)) \end{array} \right) , $$

which are negative semidefinite, as h_t and \(J^{o}_{t+1}\) are concave and twice continuously differentiable. In particular, since h_t is α_t-concave, it follows by [54, p. 102] (which gives bounds on the eigenvalues of the sum of two symmetric matrices) that the maximum eigenvalue of their sum is smaller than or equal to −α_t. Then it follows by Lemma 9.1 that \(J^{o}_{t}\) is concave (even α_t-concave).

Thus, by backward induction on t and by the compactness of X t we conclude that, for every t=N,…,0, \(J^{o}_{t} \in\mathcal{C}^{m}(X_{t}) \subset\mathcal{W}^{m}_{p}(\operatorname{int}(X_{t}))\) for every 1≤p≤+∞.

(ii) As X_t is bounded and convex, by Sobolev's extension theorem [34, Theorem 5, p. 181, and Example 2, p. 189], for every 1≤p≤+∞, the function \(J^{o}_{t} \in\mathcal{W}^{m}_{p}(\operatorname{int}(X_{t}))\) can be extended to the whole of ℝ^d as a function \(\bar {J}_{t}^{o,p} \in \mathcal{W}^{m}_{p}(\mathbb{R}^{d})\).

(iii) For 1<p<+∞, the statement follows by item (ii) and the equivalence between Sobolev spaces and Bessel potential spaces [34, Theorem 3, p. 135]. For p=1 and even m≥2, it follows by item (ii) and the inclusion \(\mathcal{W}^{m}_{1}(\mathbb{R}^{d}) \subset\mathcal{B}^{m}_{1}(\mathbb{R}^{d})\) from [34, p. 160]. □

Proof of Proposition 3.2

(i) is proved in the same way as Proposition 3.1, replacing \(J_{t+1}^{o}\) with \(\tilde{J}_{t+1}^{o}\) and \(g_{t}^{o}\) with \(\tilde{g}_{t}^{o}\).

(ii) Inspection of the proof of Proposition 3.1(i) shows that \(J_{t}^{o}\) is α t -concave (α t >0) for t=0,…,N−1, whereas the α N -concavity (α N >0) of \(J_{N}^{o}=h_{N}\) is assumed. By (12) and condition (10), \(\tilde{J}_{t+1,j}^{o}\) is concave for j sufficiently large. Hence, one can apply (i) to \(\tilde{J}_{t+1,j}^{o}\), and so there exists \(\hat{J}^{o,p}_{t,j} \in\mathcal{W}^{m}_{p}(\mathbb{R}^{d})\) such that \(T_{t} \tilde{J}_{t+1,j}^{o}=\hat{J}^{o,p}_{t,j}|_{X_{t}}\). Proceeding as in the proof of Proposition 3.1, one obtains equations analogous to (39) and (40) (with obvious replacements). Then, by differentiating \(T_{t} \tilde{J}_{t+1,j}^{o}\) up to the order m, we get

$$\lim_{j \to\infty} \max_{0 \leq|\mathbf{r}| \leq m} \bigl\{ \operatorname{sup}_{x_t \in X_t }\big| D^{\mathbf{r}}\bigl(J_t^o(x_t)- \bigl(T_t \tilde{J}_{t+1,j}^o\bigr) (x_t)\bigr) \big| \bigr\}=0. $$

Finally, the statement follows by the continuity of the embedding of \(\mathcal{C}^{m}(X_{t})\) into \(\mathcal{W}^{m}_{p}(\operatorname{int} (X_{t}))\) (since X_t is compact) and the continuity of the Sobolev extension operator. □

Proof of Proposition 4.1

(i) For ω∈ℝ^d, let M(ω):=max{∥ω∥,1}. For a positive integer ν, define the set of functions

$$\varGamma^\nu\bigl(\mathbb{R}^d\bigr) := \biggl\{ f \in \mathcal{L}_2\bigl(\mathbb {R}^d\bigr) : \int _{\mathbb{R}^d} M(\omega)^\nu \big|{\hat{f}}({\omega})\big| \, d { \omega} < \infty \biggr\} , $$

where \({\hat{f}}\) is the Fourier transform of f. For \(f \in \varGamma^{\nu}(\mathbb{R}^{d})\), let

$$\|f\|_{\varGamma^\nu(\mathbb{R}^d)}:=\int_{\mathbb{R}^d} M(\omega)^\nu \big|{\hat{f}}({\omega})\big| \, d {\omega} $$

and for θ>0, denote by

$$B_\theta\bigl(\|\cdot\|_{\varGamma^\nu(\mathbb{R}^d)}\bigr) := \biggl\{ f \in \mathcal{L}_2\bigl(\mathbb{R}^d\bigr) : \int _{\mathbb{R}^d} M(\omega)^\nu \big|{\hat{f}}({\omega})\big| \,d { \omega} \leq\theta \biggr\}, $$

the closed ball of radius θ in \(\varGamma^{\nu}(\mathbb{R}^{d})\). By [55, Corollary 3.2] (see Note 3), the compactness of the support of ψ, and the regularity of its boundary (which allows one to apply the Rellich–Kondrachov theorem [56, Theorem 6.3, p. 168]), for s=⌊d/2⌋+1 and \(\psi\in\mathcal{S}^{q+s}\) there exists a constant C_1>0 (see Note 4) such that, for every \(f \in B_{\theta}(\|\cdot\|_{\varGamma^{q+s+1}})\) and every positive integer n, there is \(f_{n} \in\mathcal{R}(\psi,n)\) such that

$$ \max_{0\leq|\mathbf{r}|\leq q} \sup_{x \in X} \bigl \vert D^{\mathbf{r}} f(x) - D^{\mathbf{r}} f_n(x) \bigr \vert \leq C_1 \frac{\theta}{\sqrt{n}}. $$
(41)

The next step consists in proving that, for every positive integer ν and s=⌊d/2⌋+1, the space \(\mathcal{W}^{\nu +s}_{2}(\mathbb{R}^{d})\) is continuously embedded in \(\varGamma^{\nu}(\mathbb{R}^{d})\). Let \(f \in \mathcal{W}^{\nu+s}_{2}(\mathbb{R}^{d})\). Then

$$\int_{\mathbb{R}^d}M(\omega)^\nu \big|{\hat{f}}({\omega})\big| \,d\omega= \int_{\|\omega\|\leq1} \big|{\hat{f}}({\omega})\big| \,d\omega+ \int _{\|\omega\|>1}\|\omega\|^\nu \big|{\hat{f}}({\omega})\big| \,d\omega. $$

The first integral is finite by the Cauchy–Schwarz inequality and the finiteness of \(\int_{\|\omega\|\leq1} |{\hat{f}}({\omega})|^{2} \,d\omega \). To study the second integral, following a hint from [37, p. 941], we factorize \(\|\omega\|^{\nu}|{\hat{f}}({\omega})| = a(\omega) b(\omega)\), where \(a(\omega):=(1+ \|\omega\|^{2s})^{-1/2}\) and \(b(\omega) := \|\omega\|^{\nu}|{\hat{f}}({\omega})| (1+ \|\omega\|^{2s})^{1/2}\). By the Cauchy–Schwarz inequality,

$$\int_{\|\omega\|>1}\|\omega\|^\nu \big|{\hat{f}}({\omega})\big| \,d\omega\leq \biggl( \int_{\mathbb{R}^d}a^2(\omega) \,d \omega \biggr)^{1/2} \biggl( \int_{\mathbb{R}^d}b^2( \omega) \,d\omega \biggr)^{1/2}. $$

The integral \(\int_{\mathbb{R}^{d}}a^{2}(\omega) \,d\omega= \int_{\mathbb{R}^{d}}(1+ \|\omega\|^{2s})^{-1} \,d\omega\) is finite for 2s>d, which is satisfied for all d≥1 as s=⌊d/2⌋+1. By Parseval's identity [57, p. 172], since f has square-integrable νth and (ν+s)th partial derivatives, the integral \(\int_{\mathbb{R}^{d}}b^{2}(\omega) \,d\omega= \int_{\mathbb{R}^{d}} \| \omega\|^{2\nu} |{\hat{f}}({\omega})|^{2} (1+ \|\omega\|^{2s}) \,d\omega= \int_{\mathbb{R}^{d}} |{\hat{f}}({\omega})|^{2} (\|\omega\|^{2\nu} + \|\omega\|^{2(\nu+s)}) \,d\omega\) is finite. Hence, \(\int_{\mathbb{R}^{d}}M(\omega)^{\nu}|{\hat{f}}({\omega})| \,d\omega\) is finite, so \(f \in \varGamma^{\nu}(\mathbb{R}^{d})\), and, by the argument above, there exists C_2>0 such that \(B_{\rho}(\|\cdot\|_{\mathcal{W}^{\nu+s}_{2}}) \subset B_{C_{2} \rho}(\|\cdot\|_{\varGamma^{\nu}})\).

Taking ν=q+s+1 as required in (41) and C=C_1 C_2, we conclude that, for every \(f \in B_{\rho}(\|\cdot\|_{\mathcal{W}^{q + 2s+1}_{2}})\) and every positive integer n, there exists \(f_{n} \in\mathcal{R}(\psi,n)\) such that \(\max_{0\leq|\mathbf{r}|\leq q} \sup_{x \in X} \vert D^{\mathbf{r}} f(x) - D^{\mathbf{r}} f_{n}(x) \vert \leq C \frac{\rho}{\sqrt{n}}\).

(ii) Follows by [40, Theorem 2.1] and the Rellich–Kondrachov theorem [56, Theorem 6.3, p. 168], which allows one to use “sup” in (20) instead of “\(\operatorname{ess\,sup}\)”.

(iii) Follows by [58, Corollary 5.2]. □
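
The rate \(O(n^{-1/2})\) appearing in (41) can be illustrated numerically. The following is a minimal one-dimensional sketch: the target f is an arbitrary smooth function, the centers and widths of the Gaussian units are merely sampled at random, and only the outer weights are fitted by least squares, which is a much cruder scheme than a full optimization over \(\mathcal{R}(\psi,n)\); the sup-error on the grid nevertheless decreases as n grows.

```python
import numpy as np

rng = np.random.default_rng(2)

f = lambda x: np.sin(3 * x) * np.exp(-x ** 2)   # arbitrary smooth target
xg = np.linspace(-1.0, 1.0, 400)                # evaluation grid on [-1, 1]

def rbf_sup_error(n, trials=30):
    """Best grid sup-error over random draws of centers t_i and widths w_i,
    with outer weights fitted by linear least squares."""
    best = np.inf
    for _ in range(trials):
        t = rng.uniform(-1.0, 1.0, n)
        w = rng.uniform(0.1, 1.0, n)
        Phi = np.exp(-(((xg[:, None] - t) / w) ** 2))   # Gaussian units
        c, *_ = np.linalg.lstsq(Phi, f(xg), rcond=None)
        best = min(best, np.max(np.abs(Phi @ c - f(xg))))
    return best

for n in (2, 4, 8, 16, 32):
    print(n, rbf_sup_error(n))
```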

Proof of Proposition 4.2

(i) We detail the proof for t=N−1 and t=N−2; the other cases follow by backward induction.

Let us start with t=N−1 and \(\tilde{J}^{o}_{N}=J^{o}_{N}\). By Proposition 3.1(ii), there exists \(\bar{J}^{o,2}_{N-1} \in\mathcal {W}^{2+(2s+1)N}_{2}(\mathbb{R}^{d})\) such that \(T_{N-1} \tilde{J}^{o}_{N}=T_{N-1} J^{o}_{N}=J^{o}_{N-1}=\bar {J}^{o,2}_{N-1}|_{X_{N-1}}\).

By Proposition 4.1(i) with q=2+(2s+1)(N−1) applied to \(\bar{J}^{o,2}_{N-1}\), we obtain (22) for t=N−1. Set \(\tilde{J}^{o}_{N-1}=f_{N-1}\) in (22). By (22) and condition (10), there exists a positive integer \(\bar{n}_{N-1}\) such that \(\tilde{J}^{o}_{N-1}\) is concave for \(n_{N-1}\geq\bar{n}_{N-1}\).

Now consider t=N−2. By Proposition 3.2(i), it follows that there exists \(\hat{J}^{o,2}_{N-2} \in \mathcal{W}^{2+(2s+1)(N-1)}_{2}(\mathbb{R}^{d})\) such that \(T_{N-2} \tilde{J}^{o}_{N-1}=\hat{J}^{o,2}_{N-2}|_{X_{N-2}}\). By applying to \(\hat{J}^{o,2}_{N-2}\) Proposition 4.1(i) with q=2+(2s+1)(N−2), for every positive integer n_{N−2}, we conclude that there exists \(f_{N-2} \in\mathcal{R}(\psi_{t},n_{N-2})\) such that

$$\max_{0\leq|\mathbf{r}|\leq 2+(2s+1)(N-2)} \sup_{x_{N-2} \in X_{N-2}} \bigl\vert D^{\mathbf{r}} \bigl(T_{N-2}\tilde{J}^{o}_{N-1}\bigr)(x_{N-2}) - D^{\mathbf{r}} f_{N-2}(x_{N-2}) \bigr\vert \leq \bar{C}_{N-2}\, \frac{ \| \hat{J}^{o,2}_{N-2} \|_{\mathcal{W}^{2 + (2s+1)(N-1)}_{2}(\mathbb{R}^{d})}}{\sqrt{n_{N-2}}}, $$
(42)

where, by Proposition 3.2(i), \(\hat {J}^{o,2}_{N-2} \in\mathcal{W}^{2 + (2s+1)(N-1)}_{2}(\mathbb{R}^{d})\) is a suitable extension of \(T_{N-2} \tilde{J}^{o}_{N-1}\) to ℝ^d, and \(\bar {C}_{N-2}>0\) does not depend on the approximations generated in the previous iterations. The statement for t=N−2 follows from the fact that the dependence of the bound (42) on \(\| \hat{J}^{o,2}_{N-2} \|_{\mathcal{W}^{2 + (2s+1)(N-1)}_{2}(\mathbb{R}^{d})}\) can be removed by exploiting Proposition 3.2(ii); in particular, we can choose C_{N−2}>0 independently of n_{N−1}. So, we get (22) for t=N−2. Set \(\tilde{J}^{o}_{N-2}=f_{N-2}\) in (22). By (22) and condition (10), there exists a positive integer \(\bar {n}_{N-2}\) such that \(\tilde{J}^{o}_{N-2}\) is concave for \(n_{N-2}\geq \bar{n}_{N-2}\).

The proof proceeds similarly for the other values of t; each constant C_t can be chosen independently of n_{t+1},…,n_{N−1}.

(ii) follows by Proposition 3.1(ii) (with p=+∞) and Proposition 4.1(ii).

(iii) follows by Proposition 3.1(iii) (with p=1) and Proposition 4.1(iii). □

Proof of Proposition 5.1

We first derive some constraints on the form of the sets A t,j and then show that the budget constraints (25) are satisfied if and only if the sets A t,j are chosen as in Assumption 5.1 (or are suitable subsets).

As the labor incomes y t,j and the interest rates r t,j are known, for t=1,…,N, we have

$$a_{t,j} \leq a_{0,j}^{\max} \prod _{k=0}^{t-1}(1+r_{k,j}) + \sum _{i=0}^{t-1} y_{i,j} \prod _{k=i}^{t-1}(1+r_{k,j})=a_{t,j}^{\max} $$

(the upper bound is achieved when all the consumptions c_{t,j} are equal to 0), so the corresponding feasible sets A_{t,j} are bounded from above by \(a_{t,j}^{\max}\). The boundedness from below of each A_{t,j} follows from the budget constraints (25), which, for c_{k,j}=0 (k=t,…,N), are equivalent for t=N to

$$ a_{N,j} \geq-y_{N,j} $$
(43)

and for t=0,…,N−1 to \(a_{t,j} \prod_{k=t}^{N-1} (1+r_{k,j}) + \sum_{i=t}^{N-1} y_{i,j} \prod_{k=i}^{N-1} (1+r_{k,j}) + y_{N,j} \geq0 \), i.e.,

$$ a_{t,j} \geq-\frac{\sum_{i=t}^{N-1} y_{i,j} \prod_{k=i}^{N-1} (1+r_{k,j}) + y_{N,j}}{\prod_{k=t}^{N-1} (1+r_{k,j} )}. $$
(44)

So, in order to satisfy the budget constraints (25), the constraints (43) and (44) have to be satisfied. Hence the maximal sets A_t that satisfy the budget constraints (25) have the form described in Assumption 5.1. □
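
The bounds derived in this proof are straightforward to compute. A minimal sketch for a single asset j (the horizon, incomes \(y_{t,j}\), rates \(r_{t,j}\), and \(a_{0,j}^{\max}\) below are demo values, not the paper's):

```python
import numpy as np

N = 4
y = np.array([1.0, 1.0, 1.2, 1.1, 0.9])   # demo labor incomes y_0, ..., y_N
r = np.array([0.03, 0.05, 0.04, 0.02])    # demo interest rates r_0, ..., r_{N-1}
a0_max = 2.0                               # demo upper bound on initial assets

# Upper bounds a_t^max: wealth accrued with zero consumption at every stage.
a_max = np.empty(N + 1)
a_max[0] = a0_max
for t in range(N):
    a_max[t + 1] = (1 + r[t]) * (a_max[t] + y[t])

# Lower bounds (43)-(44): the largest debt that future incomes can repay.
a_min = np.empty(N + 1)
a_min[N] = -y[N]
for t in range(N - 1, -1, -1):
    a_min[t] = a_min[t + 1] / (1 + r[t]) - y[t]

print("a_max:", np.round(a_max, 3))
print("a_min:", np.round(a_min, 3))
```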

Proof of Proposition 5.2

(a) About Assumption 3.1(i). By construction, the sets \(\bar{A}_{t}\) are compact, convex, and have nonempty interiors, since they are Cartesian products of nonempty closed intervals. The same holds for the \(\bar{D}_{t}\), since by (31) they are the intersections between \(\bar{A}_{t} \times\bar{A}_{t+1}\) and the sets D t , which are compact, convex, and have nonempty interiors too.

(b) About Assumption 3.1(ii). This is Assumption 5.2(i), with the obvious replacements of X t and D t .

(c) About Assumption 3.1(iii). Recall that for Problem \(\mathrm {OC}_{N}^{d}\) and t=0,…,N−1, we have

$$ h_t(a_t,a_{t+1})=u \biggl( \frac{(1+r_t) \circ (a_t+y_t)-a_{t+1}}{1+r_t} \biggr)+\sum_{j=1}^d v_{t,j}(a_{t,j}). $$

Then, \(h_{t} \in\mathcal{C}^{m}(\bar{D}_{t})\) by Assumption 5.2(ii) and (iii). As u(⋅) and v_{t,j}(⋅) are twice continuously differentiable, the second part of Assumption 3.1(iii) means that there exists some α_t>0 such that the function

$$u \biggl(\frac{(1+r_t) \circ (a_t+y_t)-a_{t+1}}{1+r_t} \biggr)+\sum_{j=1}^d v_{t,j}(a_{t,j})+ \frac{1}{2}\alpha_t \|a_t\|^2 $$

has a negative semidefinite Hessian with respect to the variables a_t and a_{t+1}. Assumption 5.2(ii) and easy computations show that the function \(u (\frac{(1+r_{t}) \circ (a_{t}+y_{t})-a_{t+1}}{1+r_{t}} )\) has a negative semidefinite Hessian. By Assumption 5.2(iii), for each j=1,…,d and α_{t,j}∈(0,β_{t,j}], \(v_{t,j}(a_{t,j})+ \frac{1}{2}\alpha_{t,j} a_{t,j}^{2}\) has a negative semidefinite Hessian too. So, Assumption 3.1(iii) is satisfied for every α_t∈(0,min_{j=1,…,d}{β_{t,j}}].

(d) About Assumption 3.1(iv). Recall that for Problem \(\mathrm{OC}_{N}^{d}\) we have \(h_{N}(a_{N})=u(a_{N}+y_{N})\). Then \(h_{N} \in\mathcal{C}^{m}(\bar{A}_{N})\) and is concave by Assumption 5.2(ii). □
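
To illustrate how these assumptions feed into the solution methodology, the sketch below runs backward DP with value-function fitting on a one-dimensional instance of Problem \(\mathrm{OC}_{N}^{d}\), with log utility, constant income and interest rate, no \(v_{t,j}\) terms, and a low-degree polynomial fit standing in for the paper's splines or Gaussian radial-basis networks; all numerical values are illustrative.

```python
import numpy as np

# Backward DP with value-function fitting for a 1-d consumption problem:
# c_t = (a_t + y) - a_{t+1} / (1 + r),  h_t = u(c_t),  h_N(a) = u(a + y).
N, beta, r, y = 5, 0.95, 0.04, 1.0         # demo horizon and parameters
grid = np.linspace(0.1, 5.0, 60)           # asset grid (demo feasible set)
u = np.log

def fit(vals):
    """Approximate J_t from grid samples by a degree-5 polynomial
    (the value-function approximation step of the ADP scheme)."""
    coef = np.polyfit(grid, vals, deg=5)
    return lambda a: np.polyval(coef, a)

J = fit(u(grid + y))                       # terminal stage: consume everything
for t in range(N - 1, -1, -1):
    vals = np.empty_like(grid)
    for i, a in enumerate(grid):
        hi = min(grid[-1], (1 + r) * (a + y) - 0.05)   # keep c_t > 0, a' on grid
        a_next = np.linspace(grid[0], hi, 80)
        c = (a + y) - a_next / (1 + r)
        vals[i] = np.max(u(c) + beta * J(a_next))
    J = fit(vals)

print(J(1.0))   # approximate optimal value from initial assets a_0 = 1
```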

About this article

Cite this article

Gaggero, M., Gnecco, G. & Sanguineti, M. Dynamic Programming and Value-Function Approximation in Sequential Decision Problems: Error Analysis and Numerical Results. J Optim Theory Appl 156, 380–416 (2013). https://doi.org/10.1007/s10957-012-0118-2
