Dynamic Programming and Value-Function Approximation in Sequential Decision Problems: Error Analysis and Numerical Results

Journal of Optimization Theory and Applications

Abstract

Value-function approximation is investigated for the solution via Dynamic Programming (DP) of continuous-state sequential N-stage decision problems, in which the reward to be maximized has an additive structure over a finite number of stages. Conditions that guarantee smoothness properties of the value function at each stage are derived. These properties are exploited to approximate such functions by means of certain nonlinear approximation schemes, which include splines of suitable order and Gaussian radial-basis networks with variable centers and widths. The accuracies of suboptimal solutions obtained by combining DP with these approximation tools are estimated. The results provide insight into the successful performance of value-function approximators in DP reported in the literature. The theoretical analysis is applied to a problem of optimal consumption, with simulation results illustrating the use of the proposed solution methodology. Numerical comparisons with classical linear approximators are presented.

Notes

  1. When the decision horizon goes to infinity.

  2. Functions that are constant along hyperplanes are known as ridge functions. Each ridge function is the composition of a multivariable function of a particularly simple form, namely the inner product with a fixed vector, and an arbitrary function of a single variable.

  3. Note that [55, Corollary 3.2] uses “\(\operatorname{ess\,sup}\)” instead of “sup” in (41). However, by the Rellich–Kondrachov theorem [56, Theorem 6.3, p. 168], one can replace “\(\operatorname{ess\,sup}\)” with “sup”.

  4. Unfortunately, [55, Corollary 3.2] provides neither a closed-form expression for the constant C_1 nor an upper bound on it. For results similar to [55, Corollary 3.2] and for specific choices of ψ, [55] gives upper bounds on similar constants (see, e.g., [55, Theorem 2.3 and Corollary 3.3]).

References

  1. Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)

  2. Bertsekas, D.P., Tsitsiklis, J.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)

  3. Powell, W.B.: Approximate Dynamic Programming—Solving the Curses of Dimensionality. Wiley, Hoboken (2007)

  4. Si, J., Barto, A.G., Powell, W.B., Wunsch, D. (eds.): Handbook of Learning and Approximate Dynamic Programming. IEEE Press, New York (2004)

  5. Zoppoli, R., Parisini, T., Sanguineti, M., Gnecco, G.: Neural Approximations for Optimal Control and Decision. Springer, London (2012, in preparation)

  6. Haykin, S.: Neural Networks: a Comprehensive Foundation. Prentice Hall, New York (1998)

  7. Bertsekas, D.P.: Dynamic Programming and Optimal Control vol. 1. Athena Scientific, Belmont (2005)

  8. Bellman, R., Dreyfus, S.: Functional approximations and dynamic programming. Math. Tables Other Aids Comput. 13, 247–251 (1959)

  9. Bellman, R., Kalaba, R., Kotkin, B.: Polynomial approximation—a new computational technique in dynamic programming. Math. Comput. 17, 155–161 (1963)

  10. Foufoula-Georgiou, E., Kitanidis, P.K.: Gradient dynamic programming for stochastic optimal control of multidimensional water resources systems. Water Resour. Res. 24, 1345–1359 (1988)

  11. Johnson, S., Stedinger, J., Shoemaker, C., Li, Y., Tejada-Guibert, J.: Numerical solution of continuous-state dynamic programs using linear and spline interpolation. Oper. Res. 41, 484–500 (1993)

  12. Chen, V.C.P., Ruppert, D., Shoemaker, C.A.: Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming. Oper. Res. 47, 38–53 (1999)

  13. Cervellera, C., Muselli, M.: Efficient sampling in approximate dynamic programming algorithms. Comput. Optim. Appl. 38, 417–443 (2007)

  14. Philbrick, C.R. Jr., Kitanidis, P.K.: Improved dynamic programming methods for optimal control of lumped-parameter stochastic systems. Oper. Res. 49, 398–412 (2001)

  15. Judd, K.: Numerical Methods in Economics. MIT Press, Cambridge (1998)

  16. Kůrková, V., Sanguineti, M.: Comparison of worst-case errors in linear and neural network approximation. IEEE Trans. Inf. Theory 48, 264–275 (2002)

  17. Tesauro, G.: Practical issues in temporal difference learning. Mach. Learn. 8, 257–277 (1992)

  18. Gnecco, G., Sanguineti, M., Gaggero, M.: Suboptimal solutions to team optimization problems with stochastic information structure. SIAM J. Optim. 22, 212–243 (2012)

  19. Tsitsiklis, J.N., Roy, B.V.: Feature-based methods for large scale dynamic programming. Mach. Learn. 22, 59–94 (1996)

  20. Zoppoli, R., Sanguineti, M., Parisini, T.: Approximating networks and extended Ritz method for the solution of functional optimization problems. J. Optim. Theory Appl. 112, 403–439 (2002)

  21. Alessandri, A., Gaggero, M., Zoppoli, R.: Feedback optimal control of distributed parameter systems by using finite-dimensional approximation schemes. IEEE Trans. Neural Netw. Learn. Syst. 23(6), 984–996 (2012)

  22. Stokey, N.L., Lucas, R.E., Prescott, E.: Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge (1989)

  23. Bertsekas, D.P.: Dynamic Programming and Optimal Control vol. 2. Athena Scientific, Belmont (2007)

  24. White, D.J.: Markov Decision Processes. Wiley, New York (1993)

  25. Puterman, M.L., Shin, M.C.: Modified policy iteration algorithms for discounted Markov decision processes. Manag. Sci. 41, 1127–1137 (1978)

  26. Altman, E., Nain, P.: Optimal control of the M/G/1 queue with repeated vacations of the server. IEEE Trans. Autom. Control 38, 1766–1775 (1993)

  27. Lendaris, G.G., Neidhoefer, J.C.: Guidance in the choice of adaptive critics for control. In: Si, J., Barto, A.G., Powell, W.B., Wunsch, D. (eds.) Handbook of Learning and Approximate Dynamic Programming, pp. 97–124. IEEE Press, New York (2004)

  28. Karp, L., Lee, I.H.: Learning-by-doing and the choice of technology: the role of patience. J. Econ. Theory 100, 73–92 (2001)

  29. Rapaport, A., Sraidi, S., Terreaux, J.: Optimality of greedy and sustainable policies in the management of renewable resources. Optim. Control Appl. Methods 24, 23–44 (2003)

  30. Semmler, W., Sieveking, M.: Critical debt and debt dynamics. J. Econ. Dyn. Control 24, 1121–1144 (2000)

  31. Nawijn, W.M.: Look-ahead policies for admission to a single-server loss system. Oper. Res. 38, 854–862 (1990)

  32. Gnecco, G., Sanguineti, M.: Suboptimal solutions to dynamic optimization problems via approximations of the policy functions. J. Optim. Theory Appl. 146, 764–794 (2010)

  33. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I. Springer, Berlin (1993)

  34. Stein, E.M.: Singular Integrals and Differentiability Properties of Functions. Princeton University Press, Princeton (1970)

  35. Singer, I.: Best Approximation in Normed Linear Spaces by Elements of Linear Subspaces. Springer, Berlin (1970)

  36. Kůrková, V., Sanguineti, M.: Geometric upper bounds on rates of variable-basis approximation. IEEE Trans. Inf. Theory 54, 5681–5688 (2008)

  37. Barron, A.R.: Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inf. Theory 39, 930–945 (1993)

  38. Gnecco, G., Kůrková, V., Sanguineti, M.: Some comparisons of complexity in dictionary-based and linear computational models. Neural Netw. 24, 171–182 (2011)

  39. Wahba, G.: Spline Models for Observational Data. CBMS-NSF Regional Conf. Series in Applied Mathematics, vol. 59. SIAM, Philadelphia (1990)

  40. Mhaskar, H.N.: Neural networks for optimal approximation of smooth and analytic functions. Neural Comput. 8, 164–177 (1996)

  41. Kainen, P.C., Kůrková, V., Sanguineti, M.: Complexity of Gaussian radial-basis networks approximating smooth functions. J. Complex. 25, 63–74 (2009)

  42. Alessandri, A., Gnecco, G., Sanguineti, M.: Minimizing sequences for a family of functional optimal estimation problems. J. Optim. Theory Appl. 147, 243–262 (2010)

  43. Adda, J., Cooper, R.: Dynamic Economics: Quantitative Methods and Applications. MIT Press, Cambridge (2003)

  44. Fang, K.T., Wang, Y.: Number-Theoretic Methods in Statistics. Chapman & Hall, London (1994)

  45. Hammersley, J.M., Handscomb, D.C.: Monte Carlo Methods. Methuen, London (1964)

  46. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia (1992)

  47. Sobol’, I.: The distribution of points in a cube and the approximate evaluation of integrals. Zh. Vychisl. Mat. Mat. Fiz. 7, 784–802 (1967)

  48. Loomis, L.H.: An Introduction to Abstract Harmonic Analysis. Van Nostrand, Princeton (1953)

  49. Boldrin, M., Montrucchio, L.: On the indeterminacy of capital accumulation paths. J. Econ. Theory 40, 26–39 (1986)

  50. Dawid, H., Kopel, M., Feichtinger, G.: Complex solutions of nonconcave dynamic optimization models. Econ. Theory 9, 427–439 (1997)

  51. Chambers, J., Cleveland, W.: Graphical Methods for Data Analysis. Wadsworth/Cole Publishing Company, Pacific Grove (1983)

  52. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (2006)

  53. Zhang, F. (ed.): The Schur Complement and Its Applications. Springer, New York (2005)

  54. Wilkinson, J.H.: The Algebraic Eigenvalue Problem. Oxford Science Publications, Oxford (2004)

  55. Hornik, K., Stinchcombe, M., White, H., Auer, P.: Degree of approximation results for feedforward networks approximating unknown mappings and their derivatives. Neural Comput. 6, 1262–1275 (1994)

  56. Adams, R.A., Fournier, J.J.F.: Sobolev Spaces. Academic Press, San Diego (2003)

  57. Rudin, W.: Functional Analysis. McGraw-Hill, New York (1973)

  58. Gnecco, G., Sanguineti, M.: Approximation error bounds via Rademacher’s complexity. Appl. Math. Sci. 2, 153–176 (2008)

Author information

Correspondence to Marcello Sanguineti.

Additional information

Communicated by Francesco Zirilli.

Appendix

Proof of Proposition 2.2

(i) We use a backward induction argument. For t=N−1,…,0, assume that, at stage t+1, \(\tilde{J}_{t+1}^{o} \in\mathcal{F}_{t+1}\) is such that \(\sup_{x_{t+1} \in X_{t+1}} | J_{t+1}^{o}(x_{t+1})-\tilde{J}_{t+1}^{o}(x_{t+1}) |\leq{\eta}_{t+1}\) for some η_{t+1}≥0. In particular, for t=N−1, one has η_N=0, as \(\tilde{J}_{N}^{o} = J_{N}^{o}\). By (3), there exists \(f_{t}\in\mathcal{F}_{t}\) such that \(\sup_{x_{t} \in X_{t}} | (T_{t} \tilde{J}_{t+1}^{o})(x_{t})-f_{t}(x_{t}) | \leq \varepsilon_{t}\). Set \(\tilde{J}_{t}^{o}=f_{t}\). By the triangle inequality and Proposition 2.1,

$$\sup_{x_t \in X_t} \bigl| J_t^o(x_t)-\tilde{J}_t^o(x_t) \bigr| \leq \sup_{x_t \in X_t} \bigl| \bigl(T_t J_{t+1}^o\bigr)(x_t)-\bigl(T_t \tilde{J}_{t+1}^o\bigr)(x_t) \bigr| + \sup_{x_t \in X_t} \bigl| \bigl(T_t \tilde{J}_{t+1}^o\bigr)(x_t)-f_t(x_t) \bigr| \leq \beta\eta_{t+1}+\varepsilon_t =: \eta_t. $$

Then, after N iterations we get \(\sup_{x_{0} \in X_{0}} | J_{0}^{o}(x_{0})-\tilde{J}_{0}^{o}(x_{0}) | \leq\eta_{0} = \varepsilon_{0} + \beta \eta_{1} = \varepsilon_{0} + \beta \varepsilon_{1} + \beta^{2} \eta_{2} = \dots = \sum_{t=0}^{N-1}{\beta^{t}\varepsilon_{t}}\).

(ii) As before, for t=N−1,…,0, assume that, at stage t+1, \(\tilde{J}_{t+1}^{o} \in\mathcal{F}_{t+1}\) is such that \(\sup_{x_{t+1} \in X_{t+1}} | J_{t+1}^{o}(x_{t+1})-\tilde{J}_{t+1}^{o}(x_{t+1}) |\leq{\eta}_{t+1}\) for some η_{t+1}≥0. In particular, for t=N−1, one has η_N=0, as \(\tilde{J}_{N}^{o} = J_{N}^{o}\). Let \(\hat{J}_{t}^{o}=T_{t} \tilde{J}_{t+1}^{o}\). Proposition 2.1 gives

$$\sup_{x_t \in X_t} \bigl| J_t^o(x_t)-\hat{J}_t^o(x_t) \bigr| = \sup_{x_t \in X_t} \bigl| \bigl(T_t J_{t+1}^o\bigr)(x_t)-\bigl(T_t \tilde{J}_{t+1}^o\bigr)(x_t) \bigr| \leq \beta\eta_{t+1}. $$

Before moving to the tth stage, one has to find an approximation \(\tilde{J}_{t}^{o} \in\mathcal{F}_{t}\) of \(J_{t}^{o}=T_{t} J_{t+1}^{o}\). Such an approximation has to be obtained from \(\hat{J}_{t}^{o}=T_{t} \tilde{J}_{t+1}^{o}\) (which, in general, may not belong to \(\mathcal{F}_{t}\)), because \(J_{t}^{o}=T_{t} {J}_{t+1}^{o}\) is unknown. By assumption, there exists \(f_{t} \in \mathcal{F}_{t}\) such that \(\sup_{x_{t} \in X_{t}} | J_{t}^{o}(x_{t})-f_{t}(x_{t}) | \leq \varepsilon_{t}\). However, in general, one cannot set \(\tilde{J}_{t}^{o}=f_{t}\), since in a neighborhood of radius βη_{t+1} of \(\hat{J}_{t}^{o}\) in the sup-norm there may exist, besides \(J^{o}_{t}\), some other function \(I_{t} \not= J_{t}^{o}\) that can also be approximated by some \(\tilde{f}_{t} \in\mathcal{F}_{t}\) with error at most ε_t. As \(J_{t}^{o}\) is unknown, in the worst case one chooses \(\tilde{J}_{t}^{o}=\tilde{f}_{t}\) instead of \(\tilde{J}_{t}^{o}=f_{t}\). In such a case, we get

$$\sup_{x_t \in X_t} \bigl| J_t^o(x_t)-\tilde{J}_t^o(x_t) \bigr| \leq \sup_{x_t \in X_t} \bigl| J_t^o(x_t)-\hat{J}_t^o(x_t) \bigr| + \sup_{x_t \in X_t} \bigl| \hat{J}_t^o(x_t)-I_t(x_t) \bigr| + \sup_{x_t \in X_t} \bigl| I_t(x_t)-\tilde{f}_t(x_t) \bigr| \leq 2\beta\eta_{t+1}+\varepsilon_t. $$

Let η_t := 2βη_{t+1}+ε_t. Then, after N iterations we have \(\sup_{x_{0} \in X_{0}} | J_{0}^{o}(x_{0})-\tilde {J}_{0}^{o}(x_{0}) | \leq\eta_{0} = \varepsilon_{0} + 2\beta \eta_{1} = \varepsilon_{0} + 2\beta \varepsilon_{1} + 4\beta^{2} \eta_{2} = \dots= \sum_{t=0}^{N-1}{(2\beta)^{t}\varepsilon_{t}}\). □

Proof of Proposition 2.3

Set η_{N/M}=0 and, for t=N/M−1,…,0, assume that, at stage t+1 of ADP(M), \(\tilde{J}_{t+1}^{o} \in\mathcal{F}_{t+1}\) is such that \(\sup_{x_{t+1} \in X_{t+1}} | J_{M\cdot (t+1)}^{o}(x_{t+1})-\tilde{J}_{t+1}^{o}(x_{t+1}) |\leq{\eta}_{t+1}\). Proceeding as in the proof of Proposition 2.2, we get the recursion η_t = 2β^M η_{t+1} + ε_t, where β^M replaces β since in each iteration of ADP(M) Proposition 2.1 can be applied M times. □
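
To make the three error-propagation recursions concrete, the following minimal Python sketch evaluates them numerically; the horizon N, discount factor β, lookahead M, and stage-wise errors ε_t are arbitrary illustrative values, not quantities from the paper.

```python
import numpy as np

# Error-propagation recursions from Propositions 2.2 and 2.3:
# eta_t = factor * eta_{t+1} + eps_t, with eta = 0 at the terminal stage.
N, beta, M = 10, 0.9, 2            # demo horizon, discount factor, ADP(M) parameter
eps = 0.01 * np.ones(N)            # demo stage-wise approximation errors eps_t

def propagate(eps_seq, factor):
    """Backward recursion; returns the resulting bound eta_0."""
    eta = 0.0
    for e in reversed(eps_seq):
        eta = factor * eta + e
    return eta

print(propagate(eps, beta))                   # Prop 2.2(i):  sum_t beta^t eps_t
print(propagate(eps, 2 * beta))               # Prop 2.2(ii): sum_t (2 beta)^t eps_t
print(propagate(eps[: N // M], 2 * beta**M))  # Prop 2.3: N/M macro-stages of ADP(M)
```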

In order to prove Proposition 3.1, we shall apply the following technical lemma (which readily follows by [53, Theorem 2.13, p. 69] and the example in [53, p. 70]). Given a square partitioned real matrix \(M= \bigl( \begin{smallmatrix} A & B \\ C & D \end{smallmatrix} \bigr)\) such that D is nonsingular, the Schur complement M/D of D in M is defined [53, p. 18] as the matrix \(M/D=A-BD^{-1}C\). For a symmetric real matrix, we denote by λ_max its maximum eigenvalue.

Lemma 9.1

Let \(M= \bigl( \begin{smallmatrix} A & B \\ C & D \end{smallmatrix} \bigr)\) be a partitioned symmetric negative-semidefinite matrix such that D is nonsingular. Then λ_max(M/D)≤λ_max(M).
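
Lemma 9.1 lends itself to a quick numerical sanity check. A minimal sketch (the matrix below is randomly generated, and the block sizes and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric negative-semidefinite matrix Mat = -G G^T, partitioned
# so that the lower-right block D is (almost surely) nonsingular.
G = rng.standard_normal((5, 5))
Mat = -G @ G.T
k = 3                                          # size of the upper-left block A
A, B = Mat[:k, :k], Mat[:k, k:]
C, D = Mat[k:, :k], Mat[k:, k:]

schur = A - B @ np.linalg.solve(D, C)          # M/D = A - B D^{-1} C
lam_max = lambda S: np.linalg.eigvalsh(S)[-1]  # eigenvalues in ascending order
print(lam_max(schur), "<=", lam_max(Mat))      # Lemma 9.1
assert lam_max(schur) <= lam_max(Mat) + 1e-10
```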

In the proof of Proposition 3.1, we shall use the following notation. The symbol ∇ denotes the gradient operator when applied to a scalar-valued function and the Jacobian operator when applied to a vector-valued function; ∇² denotes the Hessian. In the case of a composite function, e.g., f(g(x,y,z),h(x,y,z)), by ∇_i f(g(x,y,z),h(x,y,z)) we denote the gradient of f with respect to its ith (vector) argument, computed at (g(x,y,z),h(x,y,z)). The full gradient of f with respect to the argument x is denoted by ∇_x f(g(x,y,z),h(x,y,z)). Similarly, by \(\nabla^{2}_{i,j} f(g(x,y,z),h(x,y,z))\) we denote the submatrix of the Hessian of f computed at (g(x,y,z),h(x,y,z)), whose first indices belong to the vector argument i and whose second indices belong to the vector argument j. \(\nabla J_{t}^{o}(x_{t})\) is a column vector, and \(\nabla g_{t}^{o}(x_{t})\) is a matrix whose rows are the transposes of the gradients of the components of \(g_{t}^{o}(x_{t})\). We denote by \(g^{o}_{t,j}\) the jth component of the optimal policy function \(g^{o}_{t}\) (j=1,…,d). The other notation used in the proof is detailed in Sect. 3.

Proof of Proposition 3.1

(i) Let us first show by backward induction on t that \(J^{o}_{t} \in\mathcal{C}^{m}(X_{t})\) and, for every j∈{1,…,d}, \(g^{o}_{t,j} \in\mathcal{C}^{m-1}(X_{t})\) (which we also need in the proof). Since \(J^{o}_{N}=h_{N}\), we have \(J^{o}_{N} \in\mathcal{C}^{m}(X_{N})\) by hypothesis. Now, fix t and suppose that \(J^{o}_{t+1} \in\mathcal{C}^{m}(X_{t+1})\) and is concave. Let \(x_{t} \in\operatorname{int} (X_{t})\). As by hypothesis the optimal policy \(g^{o}_{t}\) is interior on \(\operatorname{int} (X_{t})\), the first-order optimality condition \(\nabla_{2} h_{t}(x_{t},g^{o}_{t}(x_{t}))+\beta\nabla J^{o}_{t+1}(g^{o}_{t}(x_{t}))=0\) holds. By the implicit function theorem we get

$$ \nabla g^o_t(x_t)=- \bigl[ \nabla_{2,2}^2 \bigl(h_t\bigl(x_t,g^o_t(x_t) \bigr) \bigr)+ \beta\nabla^2 J^o_{t+1} \bigl(g^o_t(x_t)\bigr) \bigr]^{-1} \nabla^2_{2,1}h_t\bigl(x_t,g^o_t(x_t) \bigr) , $$
(39)

where \(\nabla^{2}_{2,2} (h_{t}(x_{t},g^{o}_{t}(x_{t})) )+ \beta \nabla^{2} J^{o}_{t+1}(g^{o}_{t}(x_{t}))\) is nonsingular, as \(\nabla^{2}_{2,2} (h_{t}(x_{t},g^{o}_{t}(x_{t})) )\) is negative definite by the α_t-concavity of h_t for α_t>0, and \(\nabla^{2} J^{o}_{t+1}(g^{o}_{t}(x_{t}))\) is negative semidefinite since \(J^{o}_{t+1}\) is concave.

By differentiating both sides of (39) up to derivatives of h_t and \(J^{o}_{t+1}\) of order m, we get, for j=1,…,d, \(g^{o}_{t,j} \in\mathcal {C}^{m-1}(\operatorname{int} (X_{t}))\). As the expressions obtained for its partial derivatives up to the order m−1 are bounded and continuous not only on \(\operatorname{int} (X_{t})\) but on the whole X_t, one has \(g^{o}_{t,j} \in \mathcal{C}^{m-1}(X_{t})\).

By differentiating the equality \(J^{o}_{t}(x_{t})=h_{t}(x_{t},g^{o}_{t}(x_{t}))+ \beta J^{o}_{t+1}(g^{o}_{t}(x_{t}))\) we obtain

$$\nabla J^o_t(x_t)=\nabla_1 h_t\bigl(x_t,g^o_t(x_t)\bigr) + \bigl(\nabla g^o_t(x_t)\bigr)^{\top} \bigl[ \nabla_2 h_t\bigl(x_t,g^o_t(x_t)\bigr)+\beta \nabla J^o_{t+1}\bigl(g^o_t(x_t)\bigr) \bigr]. $$

So, by the first-order optimality condition, we get

$$ \nabla J^o_t(x_t)=\nabla_1 h_t\bigl(x_t,g^o_t(x_t) \bigr). $$
(40)

By differentiating both sides of (40) up to derivatives of h_t of order m, we obtain \(J^{o}_{t} \in\mathcal{C}^{m}(\operatorname{int} (X_{t}))\). As for the optimal policies, this extends to \(J^{o}_{t} \in\mathcal{C}^{m}(X_{t})\).
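
Identities (39) and (40) can also be verified numerically on a toy instance. The sketch below uses concave quadratic choices of h_t and \(J^{o}_{t+1}\) (illustrative assumptions, not the paper's model), for which the interior maximizer \(g^{o}_{t}\) is available in closed form:

```python
import numpy as np

rng = np.random.default_rng(1)
d, beta = 3, 0.9

def spd():
    G = rng.standard_normal((d, d))
    return G @ G.T + d * np.eye(d)     # symmetric positive definite

# Illustrative concave quadratics: h_t(x,u) = -x'Px - u'Qu - x'Ru and
# J_{t+1}(u) = -u'Su, with a small cross term R to keep h_t concave.
P, Q, S = spd(), spd(), spd()
R = 0.1 * rng.standard_normal((d, d))
h = lambda x, u: -x @ P @ x - u @ Q @ u - x @ R @ u
J_next = lambda u: -u @ S @ u

# First-order condition grad_2 h + beta * grad J_{t+1} = 0 gives g(x) = -K x.
K = np.linalg.solve(2 * Q + 2 * beta * S, R.T)
g = lambda x: -K @ x

# (39): grad g(x) = -[Hess_22 h + beta * Hess J_{t+1}]^{-1} Hess_21 h = -K.
grad_g = -np.linalg.solve(-2 * Q + beta * (-2 * S), -R.T)
assert np.allclose(grad_g, -K)

# (40): grad J_t(x) = grad_1 h(x, g(x)); check against central differences.
J_t = lambda x: h(x, g(x)) + beta * J_next(g(x))
x0 = rng.standard_normal(d)
grad1_h = -2 * P @ x0 - R @ g(x0)
fd = np.array([(J_t(x0 + 1e-6 * e) - J_t(x0 - 1e-6 * e)) / 2e-6 for e in np.eye(d)])
print(np.max(np.abs(fd - grad1_h)))    # agreement up to finite-difference error
```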

In order to conclude the backward induction step, it remains to show that \(J^{o}_{t}\) is concave. This can be proved by the following direct argument. By differentiating (40) and using (39), for the Hessian of \(J^{o}_{t}\) we obtain

$$\nabla^2 J^o_t(x_t)=\nabla^2_{1,1} h_t\bigl(x_t,g^o_t(x_t)\bigr) - \nabla^2_{1,2}h_t\bigl(x_t,g^o_t(x_t)\bigr) \bigl[ \nabla^2_{2,2}h_t\bigl(x_t,g^o_t(x_t)\bigr) + \beta\nabla^2 J^o_{t+1}\bigl(g^o_t(x_t)\bigr) \bigr]^{-1} \nabla^2_{2,1}h_t\bigl(x_t,g^o_t(x_t)\bigr), $$

which is the Schur complement of \([\nabla^{2}_{2,2}h_{t}(x_{t},g^{o}_{t}(x_{t})) + \beta\nabla^{2} J^{o}_{t+1}(g^{o}_{t}(x_{t})) ]\) in the matrix

$$\left( \begin{array}{c@{\quad}c} \nabla^2_{1,1} h_t(x_t,g^o_t(x_t)) & \nabla^2_{1,2}h_t(x_t,g^o_t(x_t)) \\[6pt] \nabla^2_{2,1}h_t(x_t,g^o_t(x_t)) & \nabla^2_{2,2}h_t(x_t,g^o_t(x_t)) + \beta\nabla^2 J^o_{t+1}(g^o_t(x_t)) \end{array} \right) . $$

Note that such a matrix is negative semidefinite, as it is the sum of the two matrices

$$\left( \begin{array}{c@{\quad}c} \nabla^2_{1,1} h_t(x_t,g^o_t(x_t)) & \nabla^2_{1,2}h_t(x_t,g^o_t(x_t)) \\ [6pt] \nabla^2_{2,1}h_t(x_t,g^o_t(x_t)) & \nabla^2_{2,2}h_t(x_t,g^o_t(x_t)) \end{array} \right) \quad \mbox{and} \quad \left( \begin{array}{c@{\quad}c} 0 & 0 \\ [4pt] 0 & \beta\nabla^2 J^o_{t+1}(g^o_t(x_t)) \end{array} \right) , $$

which are negative semidefinite, as h_t and \(J^{o}_{t+1}\) are concave and twice continuously differentiable. In particular, since h_t is α_t-concave, it follows by [54, p. 102] (which gives bounds on the eigenvalues of the sum of two symmetric matrices) that the maximum eigenvalue of their sum is smaller than or equal to −α_t. Then it follows by Lemma 9.1 that \(J^{o}_{t}\) is concave (even α_t-concave).

Thus, by backward induction on t and by the compactness of X t we conclude that, for every t=N,…,0, \(J^{o}_{t} \in\mathcal{C}^{m}(X_{t}) \subset\mathcal{W}^{m}_{p}(\operatorname{int}(X_{t}))\) for every 1≤p≤+∞.

(ii) As X_t is bounded and convex, by Sobolev's extension theorem [34, Theorem 5, p. 181, and Example 2, p. 189], for every 1≤p≤+∞, the function \(J^{o}_{t} \in\mathcal{W}^{m}_{p}(\operatorname{int}(X_{t}))\) can be extended to the whole of ℝ^d as a function \(\bar {J}_{t}^{o,p} \in \mathcal{W}^{m}_{p}(\mathbb{R}^{d})\).

(iii) For 1<p<+∞, the statement follows by item (ii) and the equivalence between Sobolev spaces and Bessel potential spaces [34, Theorem 3, p. 135]. For p=1 and even m≥2, it follows by item (ii) and the inclusion \(\mathcal{W}^{m}_{1}(\mathbb{R}^{d}) \subset\mathcal{B}^{m}_{1}(\mathbb{R}^{d})\) from [34, p. 160]. □

Proof of Proposition 3.2

(i) is proved in the same way as Proposition 3.1, replacing \(J_{t+1}^{o}\) with \(\tilde{J}_{t+1}^{o}\) and \(g_{t}^{o}\) with \(\tilde{g}_{t}^{o}\).

(ii) Inspection of the proof of Proposition 3.1(i) shows that \(J_{t}^{o}\) is α t -concave (α t >0) for t=0,…,N−1, whereas the α N -concavity (α N >0) of \(J_{N}^{o}=h_{N}\) is assumed. By (12) and condition (10), \(\tilde{J}_{t+1,j}^{o}\) is concave for j sufficiently large. Hence, one can apply (i) to \(\tilde{J}_{t+1,j}^{o}\), and so there exists \(\hat{J}^{o,p}_{t,j} \in\mathcal{W}^{m}_{p}(\mathbb{R}^{d})\) such that \(T_{t} \tilde{J}_{t+1,j}^{o}=\hat{J}^{o,p}_{t,j}|_{X_{t}}\). Proceeding as in the proof of Proposition 3.1, one obtains equations analogous to (39) and (40) (with obvious replacements). Then, by differentiating \(T_{t} \tilde{J}_{t+1,j}^{o}\) up to the order m, we get

$$\lim_{j \to\infty} \max_{0 \leq|\mathbf{r}| \leq m} \bigl\{ \operatorname{sup}_{x_t \in X_t }\big| D^{\mathbf{r}}\bigl(J_t^o(x_t)- \bigl(T_t \tilde{J}_{t+1,j}^o\bigr) (x_t)\bigr) \big| \bigr\}=0. $$

Finally, the statement follows by the continuity of the embedding of \(\mathcal{C}^{m}(X_{t})\) into \(\mathcal{W}^{m}_{p}(\operatorname{int} (X_{t}))\) (since X_t is compact) and the continuity of the Sobolev extension operator. □

Proof of Proposition 4.1

(i) For ω∈ℝ^d, let M(ω):=max{∥ω∥,1}. For a positive integer ν, define the set of functions

$$\varGamma^\nu\bigl(\mathbb{R}^d\bigr) := \biggl\{ f \in \mathcal{L}_2\bigl(\mathbb {R}^d\bigr) : \int _{\mathbb{R}^d} M(\omega)^\nu \big|{\hat{f}}({\omega})\big| \, d { \omega} < \infty \biggr\} , $$

where \({\hat{f}}\) is the Fourier transform of f. For \(f \in \varGamma^{\nu}(\mathbb{R}^{d})\), let

$$\|f\|_{\varGamma^\nu(\mathbb{R}^d)}:=\int_{\mathbb{R}^d} M(\omega)^\nu \big|{\hat{f}}({\omega})\big| \, d {\omega} $$

and for θ>0, denote by

$$B_\theta\bigl(\|\cdot\|_{\varGamma^\nu(\mathbb{R}^d)}\bigr) := \biggl\{ f \in \mathcal{L}_2\bigl(\mathbb{R}^d\bigr) : \int _{\mathbb{R}^d} M(\omega)^\nu \big|{\hat{f}}({\omega})\big| \,d { \omega} \leq\theta \biggr\}, $$

the closed ball of radius θ in \(\varGamma^{\nu}(\mathbb{R}^{d})\). By [55, Corollary 3.2] (see Note 3), the compactness of the support of ψ, and the regularity of its boundary (which allows one to apply the Rellich–Kondrachov theorem [56, Theorem 6.3, p. 168]), for s=⌊d/2⌋+1 and \(\psi\in\mathcal{S}^{q+s}\) there exists a constant C_1>0 (see Note 4) such that, for every \(f \in B_{\theta}(\|\cdot\|_{\varGamma^{q+s+1}})\) and every positive integer n, there is \(f_{n} \in\mathcal{R}(\psi,n)\) such that

$$ \max_{0\leq|\mathbf{r}|\leq q} \sup_{x \in X} \bigl \vert D^{\mathbf{r}} f(x) - D^{\mathbf{r}} f_n(x) \bigr \vert \leq C_1 \frac{\theta}{\sqrt{n}}. $$
(41)

The next step consists in proving that, for every positive integer ν and s=⌊d/2⌋+1, the space \(\mathcal{W}^{\nu +s}_{2}(\mathbb{R}^{d})\) is continuously embedded in \(\varGamma^{\nu}(\mathbb{R}^{d})\). Let \(f \in \mathcal{W}^{\nu+s}_{2}(\mathbb{R}^{d})\). Then

$$\int_{\mathbb{R}^d}M(\omega)^\nu \big|{\hat{f}}({\omega})\big| \,d\omega= \int_{\|\omega\|\leq1} \big|{\hat{f}}({\omega})\big| \,d\omega+ \int _{\|\omega\|>1}\|\omega\|^\nu \big|{\hat{f}}({\omega})\big| \,d\omega. $$

The first integral is finite by the Cauchy–Schwarz inequality and the finiteness of \(\int_{\|\omega\|\leq1} |{\hat{f}}({\omega})|^{2} \,d\omega \). To study the second integral, following a hint from [37, p. 941], we factorize \(\|\omega\|^{\nu}|{\hat{f}}({\omega})| = a(\omega) b(\omega)\), where \(a(\omega):=(1+ \|\omega\|^{2s})^{-1/2}\) and \(b(\omega) := \|\omega\|^{\nu}|{\hat{f}}({\omega})| (1+ \|\omega\|^{2s})^{1/2}\). By the Cauchy–Schwarz inequality,

$$\int_{\|\omega\|>1}\|\omega\|^\nu \big|{\hat{f}}({\omega})\big| \,d\omega\leq \biggl( \int_{\mathbb{R}^d}a^2(\omega) \,d \omega \biggr)^{1/2} \biggl( \int_{\mathbb{R}^d}b^2( \omega) \,d\omega \biggr)^{1/2}. $$

The integral \(\int_{\mathbb{R}^{d}}a^{2}(\omega) \,d\omega= \int_{\mathbb{R}^{d}}(1+ \|\omega\|^{2s})^{-1} \,d\omega\) is finite for 2s>d, which is satisfied for all d≥1 as s=⌊d/2⌋+1. By Parseval's identity [57, p. 172], since f has square-integrable νth and (ν+s)th partial derivatives, the integral \(\int_{\mathbb{R}^{d}}b^{2}(\omega) \,d\omega= \int_{\mathbb{R}^{d}} \| \omega\|^{2\nu} |{\hat{f}}({\omega})|^{2} (1+ \|\omega\|^{2s}) \,d\omega= \int_{\mathbb{R}^{d}} |{\hat{f}}({\omega})|^{2} (\|\omega\|^{2\nu} + \|\omega\|^{2(\nu+s)}) \,d\omega\) is finite. Hence, \(\int_{\mathbb{R}^{d}}M(\omega)^{\nu}|{\hat{f}}({\omega})| \,d\omega\) is finite, so \(f \in \varGamma^{\nu}(\mathbb{R}^{d})\), and, by the argument above, there exists C_2>0 such that \(B_{\rho}(\|\cdot\|_{\mathcal{W}^{\nu+s}_{2}}) \subset B_{C_{2} \rho}(\|\cdot\|_{\varGamma^{\nu}})\).

Taking ν=q+s+1 as required in (41) and C=C_1 C_2, we conclude that, for every \(f \in B_{\rho}(\|\cdot\|_{\mathcal{W}^{q + 2s+1}_{2}})\) and every positive integer n, there exists \(f_{n} \in\mathcal{R}(\psi,n)\) such that \(\max_{0\leq|\mathbf{r}|\leq q} \sup_{x \in X} \vert D^{\mathbf{r}} f(x) - D^{\mathbf{r}} f_{n}(x) \vert \leq C \frac{\rho}{\sqrt{n}}\).

(ii) Follows by [40, Theorem 2.1] and the Rellich–Kondrachov theorem [56, Theorem 6.3, p. 168], which allows one to use “sup” in (20) instead of “\(\operatorname{ess\,sup}\)”.

(iii) Follows by [58, Corollary 5.2]. □
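
The rate \(O(n^{-1/2})\) appearing in (41) can be illustrated numerically. The following is a minimal one-dimensional sketch: the target f is an arbitrary smooth function, the centers and widths of the Gaussian units are merely sampled at random, and only the outer weights are fitted by least squares, which is a much cruder scheme than a full optimization over \(\mathcal{R}(\psi,n)\); the sup-error on the grid nevertheless decreases as n grows.

```python
import numpy as np

rng = np.random.default_rng(2)

f = lambda x: np.sin(3 * x) * np.exp(-x ** 2)   # arbitrary smooth target
xg = np.linspace(-1.0, 1.0, 400)                # evaluation grid on [-1, 1]

def rbf_sup_error(n, trials=30):
    """Best grid sup-error over random draws of centers t_i and widths w_i,
    with outer weights fitted by linear least squares."""
    best = np.inf
    for _ in range(trials):
        t = rng.uniform(-1.0, 1.0, n)
        w = rng.uniform(0.1, 1.0, n)
        Phi = np.exp(-(((xg[:, None] - t) / w) ** 2))   # Gaussian units
        c, *_ = np.linalg.lstsq(Phi, f(xg), rcond=None)
        best = min(best, np.max(np.abs(Phi @ c - f(xg))))
    return best

for n in (2, 4, 8, 16, 32):
    print(n, rbf_sup_error(n))
```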

Proof of Proposition 4.2

(i) We detail the proof for t=N−1 and t=N−2; the other cases follow by backward induction.

Let us start with t=N−1 and \(\tilde{J}^{o}_{N}=J^{o}_{N}\). By Proposition 3.1(ii), there exists \(\bar{J}^{o,2}_{N-1} \in\mathcal {W}^{2+(2s+1)N}_{2}(\mathbb{R}^{d})\) such that \(T_{N-1} \tilde{J}^{o}_{N}=T_{N-1} J^{o}_{N}=J^{o}_{N-1}=\bar {J}^{o,2}_{N-1}|_{X_{N-1}}\).

By Proposition 4.1(i) with q=2+(2s+1)(N−1) applied to \(\bar{J}^{o,2}_{N-1}\), we obtain (22) for t=N−1. Set \(\tilde{J}^{o}_{N-1}=f_{N-1}\) in (22). By (22) and condition (10), there exists a positive integer \(\bar{n}_{N-1}\) such that \(\tilde{J}^{o}_{N-1}\) is concave for \(n_{N-1}\geq\bar{n}_{N-1}\).

Now consider t=N−2. By Proposition 3.2(i), it follows that there exists \(\hat{J}^{o,2}_{N-2} \in \mathcal{W}^{2+(2s+1)(N-1)}_{2}(\mathbb{R}^{d})\) such that \(T_{N-2} \tilde{J}^{o}_{N-1}=\hat{J}^{o,2}_{N-2}|_{X_{N-2}}\). By applying to \(\hat{J}^{o,2}_{N-2}\) Proposition 4.1(i) with q=2+(2s+1)(N−2), for every positive integer n_{N−2}, we conclude that there exists \(f_{N-2} \in\mathcal{R}(\psi_{t},n_{N-2})\) such that

$$\max_{0\leq|\mathbf{r}|\leq 2+(2s+1)(N-2)} \sup_{x_{N-2} \in X_{N-2}} \bigl\vert D^{\mathbf{r}} \bigl(T_{N-2}\tilde{J}^{o}_{N-1}\bigr)(x_{N-2}) - D^{\mathbf{r}} f_{N-2}(x_{N-2}) \bigr\vert \leq \bar{C}_{N-2}\, \frac{ \| \hat{J}^{o,2}_{N-2} \|_{\mathcal{W}^{2 + (2s+1)(N-1)}_{2}(\mathbb{R}^{d})}}{\sqrt{n_{N-2}}}, $$
(42)

where, by Proposition 3.2(i), \(\hat {J}^{o,2}_{N-2} \in\mathcal{W}^{2 + (2s+1)(N-1)}_{2}(\mathbb{R}^{d})\) is a suitable extension of \(T_{N-2} \tilde{J}^{o}_{N-1}\) to ℝ^d, and \(\bar {C}_{N-2}>0\) does not depend on the approximations generated in the previous iterations. The statement for t=N−2 follows from the fact that the dependence of the bound (42) on \(\| \hat{J}^{o,2}_{N-2} \|_{\mathcal{W}^{2 + (2s+1)(N-1)}_{2}(\mathbb{R}^{d})}\) can be removed by exploiting Proposition 3.2(ii); in particular, we can choose C_{N−2}>0 independently of n_{N−1}. So, we get (22) for t=N−2. Set \(\tilde{J}^{o}_{N-2}=f_{N-2}\) in (22). By (22) and condition (10), there exists a positive integer \(\bar {n}_{N-2}\) such that \(\tilde{J}^{o}_{N-2}\) is concave for \(n_{N-2}\geq \bar{n}_{N-2}\).

The proof proceeds similarly for the other values of t; each constant C_t can be chosen independently of n_{t+1},…,n_{N−1}.

(ii) follows by Proposition 3.1(ii) (with p=+∞) and Proposition 4.1(ii).

(iii) follows by Proposition 3.1(iii) (with p=1) and Proposition 4.1(iii). □

Proof of Proposition 5.1

We first derive some constraints on the form of the sets A t,j and then show that the budget constraints (25) are satisfied if and only if the sets A t,j are chosen as in Assumption 5.1 (or are suitable subsets).

As the labor incomes y t,j and the interest rates r t,j are known, for t=1,…,N, we have

$$a_{t,j} \leq a_{0,j}^{\max} \prod _{k=0}^{t-1}(1+r_{k,j}) + \sum _{i=0}^{t-1} y_{i,j} \prod _{k=i}^{t-1}(1+r_{k,j})=a_{t,j}^{\max} $$

(the upper bound is achieved when all the consumptions c_{t,j} are equal to 0), so the corresponding feasible sets A_{t,j} are bounded from above by \(a_{t,j}^{\max}\). The boundedness from below of each A_{t,j} follows from the budget constraints (25), which, for c_{k,j}=0 (k=t,…,N), are equivalent for t=N to

$$ a_{N,j} \geq-y_{N,j} $$
(43)

and for t=0,…,N−1 to \(a_{t,j} \prod_{k=t}^{N-1} (1+r_{k,j}) + \sum_{i=t}^{N-1} y_{i,j} \prod_{k=i}^{N-1} (1+r_{k,j}) + y_{N,j} \geq0 \), i.e.,

$$ a_{t,j} \geq-\frac{\sum_{i=t}^{N-1} y_{i,j} \prod_{k=i}^{N-1} (1+r_{k,j}) + y_{N,j}}{\prod_{k=t}^{N-1} (1+r_{k,j} )}. $$
(44)

So, in order to satisfy the budget constraints (25), the constraints (43) and (44) have to be satisfied. Hence the maximal sets A_t that satisfy the budget constraints (25) have the form described in Assumption 5.1. □
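
The bounds derived in this proof are straightforward to compute. A minimal sketch for a single asset j (the horizon, incomes \(y_{t,j}\), rates \(r_{t,j}\), and \(a_{0,j}^{\max}\) below are demo values, not the paper's):

```python
import numpy as np

N = 4
y = np.array([1.0, 1.0, 1.2, 1.1, 0.9])   # demo labor incomes y_0, ..., y_N
r = np.array([0.03, 0.05, 0.04, 0.02])    # demo interest rates r_0, ..., r_{N-1}
a0_max = 2.0                               # demo upper bound on initial assets

# Upper bounds a_t^max: wealth accrued with zero consumption at every stage.
a_max = np.empty(N + 1)
a_max[0] = a0_max
for t in range(N):
    a_max[t + 1] = (1 + r[t]) * (a_max[t] + y[t])

# Lower bounds (43)-(44): the largest debt that future incomes can repay.
a_min = np.empty(N + 1)
a_min[N] = -y[N]
for t in range(N - 1, -1, -1):
    a_min[t] = a_min[t + 1] / (1 + r[t]) - y[t]

print("a_max:", np.round(a_max, 3))
print("a_min:", np.round(a_min, 3))
```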

Proof of Proposition 5.2

(a) About Assumption 3.1(i). By construction, the sets \(\bar{A}_{t}\) are compact, convex, and have nonempty interiors, since they are Cartesian products of nonempty closed intervals. The same holds for the \(\bar{D}_{t}\), since by (31) they are the intersections between \(\bar{A}_{t} \times\bar{A}_{t+1}\) and the sets D t , which are compact, convex, and have nonempty interiors too.

(b) About Assumption 3.1(ii). This is Assumption 5.2(i), with the obvious replacements of X t and D t .

(c) About Assumption 3.1(iii). Recall that for Problem \(\mathrm {OC}_{N}^{d}\) and t=0,…,N−1, we have

$$ h_t(a_t,a_{t+1})=u \biggl( \frac{(1+r_t) \circ (a_t+y_t)-a_{t+1}}{1+r_t} \biggr)+\sum_{j=1}^d v_{t,j}(a_{t,j}). $$

Then, \(h_{t} \in\mathcal{C}^{m}(\bar{D}_{t})\) by Assumption 5.2(ii) and (iii). As u(⋅) and v_{t,j}(⋅) are twice continuously differentiable, the second part of Assumption 3.1(iii) means that there exists some α_t>0 such that the function

$$u \biggl(\frac{(1+r_t) \circ (a_t+y_t)-a_{t+1}}{1+r_t} \biggr)+\sum_{j=1}^d v_{t,j}(a_{t,j})+ \frac{1}{2}\alpha_t \|a_t\|^2 $$

has a negative semidefinite Hessian with respect to the variables a_t and a_{t+1}. Assumption 5.2(ii) and easy computations show that the function \(u (\frac{(1+r_{t}) \circ (a_{t}+y_{t})-a_{t+1}}{1+r_{t}} )\) has a negative semidefinite Hessian. By Assumption 5.2(iii), for each j=1,…,d and α_{t,j}∈(0,β_{t,j}], \(v_{t,j}(a_{t,j})+ \frac{1}{2}\alpha_{t,j} a_{t,j}^{2}\) has a negative semidefinite Hessian too. So, Assumption 3.1(iii) is satisfied for every α_t∈(0,min_{j=1,…,d}{β_{t,j}}].

(d) About Assumption 3.1(iv). Recall that for Problem \(\mathrm{OC}_{N}^{d}\) we have \(h_{N}(a_{N})=u(a_{N}+y_{N})\). Then \(h_{N} \in\mathcal{C}^{m}(\bar{A}_{N})\) and is concave by Assumption 5.2(ii). □
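
To illustrate how these assumptions feed into the solution methodology, the sketch below runs backward DP with value-function fitting on a one-dimensional instance of Problem \(\mathrm{OC}_{N}^{d}\), with log utility, constant income and interest rate, no \(v_{t,j}\) terms, and a low-degree polynomial fit standing in for the paper's splines or Gaussian radial-basis networks; all numerical values are illustrative.

```python
import numpy as np

# Backward DP with value-function fitting for a 1-d consumption problem:
# c_t = (a_t + y) - a_{t+1} / (1 + r),  h_t = u(c_t),  h_N(a) = u(a + y).
N, beta, r, y = 5, 0.95, 0.04, 1.0         # demo horizon and parameters
grid = np.linspace(0.1, 5.0, 60)           # asset grid (demo feasible set)
u = np.log

def fit(vals):
    """Approximate J_t from grid samples by a degree-5 polynomial
    (the value-function approximation step of the ADP scheme)."""
    coef = np.polyfit(grid, vals, deg=5)
    return lambda a: np.polyval(coef, a)

J = fit(u(grid + y))                       # terminal stage: consume everything
for t in range(N - 1, -1, -1):
    vals = np.empty_like(grid)
    for i, a in enumerate(grid):
        hi = min(grid[-1], (1 + r) * (a + y) - 0.05)   # keep c_t > 0, a' on grid
        a_next = np.linspace(grid[0], hi, 80)
        c = (a + y) - a_next / (1 + r)
        vals[i] = np.max(u(c) + beta * J(a_next))
    J = fit(vals)

print(J(1.0))   # approximate optimal value from initial assets a_0 = 1
```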

About this article

Cite this article

Gaggero, M., Gnecco, G. & Sanguineti, M. Dynamic Programming and Value-Function Approximation in Sequential Decision Problems: Error Analysis and Numerical Results. J Optim Theory Appl 156, 380–416 (2013). https://doi.org/10.1007/s10957-012-0118-2
