Abstract
Value-function approximation is investigated for the solution via Dynamic Programming (DP) of continuous-state sequential N-stage decision problems, in which the reward to be maximized has an additive structure over a finite number of stages. Conditions that guarantee smoothness properties of the value function at each stage are derived. These properties are exploited to approximate such functions by means of certain nonlinear approximation schemes, which include splines of suitable order and Gaussian radial-basis networks with variable centers and widths. The accuracies of suboptimal solutions obtained by combining DP with these approximation tools are estimated. The results provide insights into the successful performances reported in the literature on the use of value-function approximators in DP. The theoretical analysis is applied to a problem of optimal consumption, with simulation results illustrating the use of the proposed solution methodology. Numerical comparisons with classical linear approximators are presented.
Notes
When the decision horizon goes to infinity.
Functions that are constant along parallel hyperplanes are known as ridge functions. Each ridge function results from the composition of a multivariable function of a particularly simple form, namely the inner product with a fixed vector, with an arbitrary function of a single variable, as in \(x \mapsto g(a \cdot x)\) with \(a\) fixed.
References
Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)
Bertsekas, D.P., Tsitsiklis, J.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
Powell, W.B.: Approximate Dynamic Programming—Solving the Curses of Dimensionality. Wiley, Hoboken (2007)
Si, J., Barto, A.G., Powell, W.B., Wunsch, D. (eds.): Handbook of Learning and Approximate Dynamic Programming. IEEE Press, New York (2004)
Zoppoli, R., Parisini, T., Sanguineti, M., Gnecco, G.: Neural Approximations for Optimal Control and Decision. Springer, London (2012, in preparation)
Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall, New York (1998)
Bertsekas, D.P.: Dynamic Programming and Optimal Control vol. 1. Athena Scientific, Belmont (2005)
Bellman, R., Dreyfus, S.: Functional approximations and dynamic programming. Math. Tables Other Aids Comput. 13, 247–251 (1959)
Bellman, R., Kalaba, R., Kotkin, B.: Polynomial approximation—a new computational technique in dynamic programming. Math. Comput. 17, 155–161 (1963)
Foufoula-Georgiou, E., Kitanidis, P.K.: Gradient dynamic programming for stochastic optimal control of multidimensional water resources systems. Water Resour. Res. 24, 1345–1359 (1988)
Johnson, S., Stedinger, J., Shoemaker, C., Li, Y., Tejada-Guibert, J.: Numerical solution of continuous-state dynamic programs using linear and spline interpolation. Oper. Res. 41, 484–500 (1993)
Chen, V.C.P., Ruppert, D., Shoemaker, C.A.: Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming. Oper. Res. 47, 38–53 (1999)
Cervellera, C., Muselli, M.: Efficient sampling in approximate dynamic programming algorithms. Comput. Optim. Appl. 38, 417–443 (2007)
Philbrick, C.R. Jr., Kitanidis, P.K.: Improved dynamic programming methods for optimal control of lumped-parameter stochastic systems. Oper. Res. 49, 398–412 (2001)
Judd, K.: Numerical Methods in Economics. MIT Press, Cambridge (1998)
Kůrková, V., Sanguineti, M.: Comparison of worst-case errors in linear and neural network approximation. IEEE Trans. Inf. Theory 48, 264–275 (2002)
Tesauro, G.: Practical issues in temporal difference learning. Mach. Learn. 8, 257–277 (1992)
Gnecco, G., Sanguineti, M., Gaggero, M.: Suboptimal solutions to team optimization problems with stochastic information structure. SIAM J. Optim. 22, 212–243 (2012)
Tsitsiklis, J.N., Van Roy, B.: Feature-based methods for large scale dynamic programming. Mach. Learn. 22, 59–94 (1996)
Zoppoli, R., Sanguineti, M., Parisini, T.: Approximating networks and extended Ritz method for the solution of functional optimization problems. J. Optim. Theory Appl. 112, 403–439 (2002)
Alessandri, A., Gaggero, M., Zoppoli, R.: Feedback optimal control of distributed parameter systems by using finite-dimensional approximation schemes. IEEE Trans. Neural Netw. Learn. Syst. 23(6), 984–996 (2012)
Stokey, N.L., Lucas, R.E., Prescott, E.: Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge (1989)
Bertsekas, D.P.: Dynamic Programming and Optimal Control vol. 2. Athena Scientific, Belmont (2007)
White, D.J.: Markov Decision Processes. Wiley, New York (1993)
Puterman, M.L., Shin, M.C.: Modified policy iteration algorithms for discounted Markov decision processes. Manag. Sci. 24, 1127–1137 (1978)
Altman, E., Nain, P.: Optimal control of the M/G/1 queue with repeated vacations of the server. IEEE Trans. Autom. Control 38, 1766–1775 (1993)
Lendaris, G.G., Neidhoefer, J.C.: Guidance in the choice of adaptive critics for control. In: Si, J., Barto, A.G., Powell, W.B., Wunsch, D. (eds.) Handbook of Learning and Approximate Dynamic Programming, pp. 97–124. IEEE Press, New York (2004)
Karp, L., Lee, I.H.: Learning-by-doing and the choice of technology: the role of patience. J. Econ. Theory 100, 73–92 (2001)
Rapaport, A., Sraidi, S., Terreaux, J.: Optimality of greedy and sustainable policies in the management of renewable resources. Optim. Control Appl. Methods 24, 23–44 (2003)
Semmler, W., Sieveking, M.: Critical debt and debt dynamics. J. Econ. Dyn. Control 24, 1121–1144 (2000)
Nawijn, W.M.: Look-ahead policies for admission to a single-server loss system. Oper. Res. 38, 854–862 (1990)
Gnecco, G., Sanguineti, M.: Suboptimal solutions to dynamic optimization problems via approximations of the policy functions. J. Optim. Theory Appl. 146, 764–794 (2010)
Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I. Springer, Berlin (1993)
Stein, E.M.: Singular Integrals and Differentiability Properties of Functions. Princeton University Press, Princeton (1970)
Singer, I.: Best Approximation in Normed Linear Spaces by Elements of Linear Subspaces. Springer, Berlin (1970)
Kůrková, V., Sanguineti, M.: Geometric upper bounds on rates of variable-basis approximation. IEEE Trans. Inf. Theory 54, 5681–5688 (2008)
Barron, A.R.: Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inf. Theory 39, 930–945 (1993)
Gnecco, G., Kůrková, V., Sanguineti, M.: Some comparisons of complexity in dictionary-based and linear computational models. Neural Netw. 24, 171–182 (2011)
Wahba, G.: Spline Models for Observational Data. CBMS-NSF Regional Conf. Series in Applied Mathematics, vol. 59. SIAM, Philadelphia (1990)
Mhaskar, H.N.: Neural networks for optimal approximation of smooth and analytic functions. Neural Comput. 8, 164–177 (1996)
Kainen, P.C., Kůrková, V., Sanguineti, M.: Complexity of Gaussian radial-basis networks approximating smooth functions. J. Complex. 25, 63–74 (2009)
Alessandri, A., Gnecco, G., Sanguineti, M.: Minimizing sequences for a family of functional optimal estimation problems. J. Optim. Theory Appl. 147, 243–262 (2010)
Adda, J., Cooper, R.: Dynamic Economics: Quantitative Methods and Applications. MIT Press, Cambridge (2003)
Fang, K.T., Wang, Y.: Number-Theoretic Methods in Statistics. Chapman & Hall, London (1994)
Hammersley, J.M., Handscomb, D.C.: Monte Carlo Methods. Methuen, London (1964)
Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia (1992)
Sobol’, I.: The distribution of points in a cube and the approximate evaluation of integrals. Zh. Vychisl. Mat. Mat. Fiz. 7, 784–802 (1967)
Loomis, L.H.: An Introduction to Abstract Harmonic Analysis. Van Nostrand, Princeton (1953)
Boldrin, M., Montrucchio, L.: On the indeterminacy of capital accumulation paths. J. Econ. Theory 40, 26–39 (1986)
Dawid, H., Kopel, M., Feichtinger, G.: Complex solutions of nonconcave dynamic optimization models. Econ. Theory 9, 427–439 (1997)
Chambers, J., Cleveland, W.: Graphical Methods for Data Analysis. Wadsworth/Cole Publishing Company, Pacific Grove (1983)
Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (2006)
Zhang, F. (ed.): The Schur Complement and Its Applications. Springer, New York (2005)
Wilkinson, J.H.: The Algebraic Eigenvalue Problem. Oxford Science Publications, Oxford (2004)
Hornik, K., Stinchcombe, M., White, H., Auer, P.: Degree of approximation results for feedforward networks approximating unknown mappings and their derivatives. Neural Comput. 6, 1262–1275 (1994)
Adams, R.A., Fournier, J.J.F.: Sobolev Spaces. Academic Press, San Diego (2003)
Rudin, W.: Functional Analysis. McGraw-Hill, New York (1973)
Gnecco, G., Sanguineti, M.: Approximation error bounds via Rademacher’s complexity. Appl. Math. Sci. 2, 153–176 (2008)
Additional information
Communicated by Francesco Zirilli.
Appendix
Proof of Proposition 2.2
(i) We use a backward induction argument. For t=N−1,…,0, assume that, at stage t+1, \(\tilde{J}_{t+1}^{o} \in\mathcal{F}_{t+1}\) is such that \(\sup_{x_{t+1} \in X_{t+1}} | J_{t+1}^{o}(x_{t+1})-\tilde{J}_{t+1}^{o}(x_{t+1}) |\leq{\eta}_{t+1}\) for some \(\eta_{t+1}\geq 0\). In particular, for t=N−1, one has \(\eta_{N}=0\), as \(\tilde{J}_{N}^{o} = J_{N}^{o}\). By (3), there exists \(f_{t}\in\mathcal{F}_{t}\) such that \(\sup_{x_{t} \in X_{t}} | (T_{t} \tilde{J}_{t+1}^{o})(x_{t})-f_{t}(x_{t}) | \leq \varepsilon_{t}\). Set \(\tilde{J}_{t}^{o}=f_{t}\). By the triangle inequality and Proposition 2.1,
\[ \sup_{x_{t} \in X_{t}} \bigl| J_{t}^{o}(x_{t})-\tilde{J}_{t}^{o}(x_{t}) \bigr| \leq \sup_{x_{t} \in X_{t}} \bigl| (T_{t} J_{t+1}^{o})(x_{t})-(T_{t} \tilde{J}_{t+1}^{o})(x_{t}) \bigr| + \sup_{x_{t} \in X_{t}} \bigl| (T_{t} \tilde{J}_{t+1}^{o})(x_{t})-f_{t}(x_{t}) \bigr| \leq \beta \eta_{t+1} + \varepsilon_{t} =: \eta_{t}. \]
Then, after N iterations we get \(\sup_{x_{0} \in X_{0}} | J_{0}^{o}(x_{0})-\tilde{J}_{0}^{o}(x_{0}) | \leq\eta_{0} = \varepsilon_{0} + \beta \eta_{1} = \varepsilon_{0} + \beta \varepsilon_{1} + \beta^{2} \eta_{2} = \dots= \sum_{t=0}^{N-1}{\beta^{t}\varepsilon_{t}}\).
(ii) As before, for t=N−1,…,0, assume that, at stage t+1, \(\tilde{J}_{t+1}^{o} \in\mathcal{F}_{t+1}\) is such that \(\sup_{x_{t+1} \in X_{t+1}} | J_{t+1}^{o}(x_{t+1})-\tilde{J}_{t+1}^{o}(x_{t+1}) |\leq{\eta}_{t+1}\) for some \(\eta_{t+1}\geq 0\). In particular, for t=N−1, one has \(\eta_{N}=0\), as \(\tilde{J}_{N}^{o} = J_{N}^{o}\). Let \(\hat{J}_{t}^{o}=T_{t} \tilde{J}_{t+1}^{o}\). Proposition 2.1 gives
\[ \sup_{x_{t} \in X_{t}} \bigl| J_{t}^{o}(x_{t})-\hat{J}_{t}^{o}(x_{t}) \bigr| = \sup_{x_{t} \in X_{t}} \bigl| (T_{t} J_{t+1}^{o})(x_{t})-(T_{t} \tilde{J}_{t+1}^{o})(x_{t}) \bigr| \leq \beta \eta_{t+1}. \]
Before moving to the tth stage, one has to find an approximation \(\tilde{J}_{t}^{o} \in\mathcal{F}_{t}\) of \(J_{t}^{o}=T_{t} J_{t+1}^{o}\). Such an approximation has to be obtained from \(\hat{J}_{t}^{o}=T_{t} \tilde{J}_{t+1}^{o}\) (which, in general, may not belong to \(\mathcal{F}_{t}\)), because \(J_{t}^{o}=T_{t} {J}_{t+1}^{o}\) is unknown. By assumption, there exists \(f_{t} \in \mathcal{F}_{t}\) such that \(\sup_{x_{t} \in X_{t}} | J_{t}^{o}(x_{t})-f_{t}(x_{t}) | \leq \varepsilon_{t}\). However, in general, one cannot set \(\tilde{J}_{t}^{o}=f_{t}\), since on a neighborhood of radius \(\beta\eta_{t+1}\) of \(\hat{J}_{t}^{o}\) in the sup-norm there may exist (besides \(J^{o}_{t}\)) some other function \(I_{t} \not= J_{t}^{o}\) which can also be approximated by some function \(\tilde{f}_{t} \in\mathcal{F}_{t}\) with error less than or equal to \(\varepsilon_{t}\). As \(J_{t}^{o}\) is unknown, in the worst case one chooses \(\tilde{J}_{t}^{o}=\tilde{f}_{t}\) instead of \(\tilde{J}_{t}^{o}=f_{t}\). In such a case, we get
\[ \sup_{x_{t} \in X_{t}} \bigl| J_{t}^{o}(x_{t})-\tilde{J}_{t}^{o}(x_{t}) \bigr| \leq \sup_{x_{t} \in X_{t}} \bigl| J_{t}^{o}(x_{t})-\hat{J}_{t}^{o}(x_{t}) \bigr| + \sup_{x_{t} \in X_{t}} \bigl| \hat{J}_{t}^{o}(x_{t})-I_{t}(x_{t}) \bigr| + \sup_{x_{t} \in X_{t}} \bigl| I_{t}(x_{t})-\tilde{f}_{t}(x_{t}) \bigr| \leq \beta\eta_{t+1} + \beta\eta_{t+1} + \varepsilon_{t} = 2\beta\eta_{t+1} + \varepsilon_{t}. \]
Let \(\eta_{t} :=2\beta\eta_{t+1}+\varepsilon_{t}\). Then, after N iterations we have \(\sup_{x_{0} \in X_{0}} | J_{0}^{o}(x_{0})-\tilde {J}_{0}^{o}(x_{0}) | \leq\eta_{0} = \varepsilon_{0} + 2\beta \eta_{1} = \varepsilon_{0} + 2\beta \varepsilon_{1} + 4\beta^{2} \eta_{2} = \dots= \sum_{t=0}^{N-1}{(2\beta)^{t}\varepsilon_{t}}\). □
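The two error-propagation recursions above are easy to check numerically. The following minimal Python sketch, which uses illustrative (hypothetical) values of the discount factor β and of the per-stage approximation errors ε_t, verifies that unrolling them yields the closed forms \(\sum_{t=0}^{N-1}\beta^{t}\varepsilon_{t}\) and \(\sum_{t=0}^{N-1}(2\beta)^{t}\varepsilon_{t}\).

```python
# Numerical sanity check of the recursions in Proposition 2.2 (illustrative values only).
import numpy as np

beta = 0.9                                   # discount factor (assumed value)
eps = np.array([0.01, 0.02, 0.015, 0.01])    # eps_0, ..., eps_{N-1} (assumed values)
N = len(eps)

# Case (i): eta_t = eps_t + beta * eta_{t+1}, with eta_N = 0.
eta_i = 0.0
for t in reversed(range(N)):
    eta_i = eps[t] + beta * eta_i

# Case (ii): eta_t = eps_t + 2 * beta * eta_{t+1}, with eta_N = 0.
eta_ii = 0.0
for t in reversed(range(N)):
    eta_ii = eps[t] + 2.0 * beta * eta_ii

# Closed forms obtained by unrolling the recursions.
assert np.isclose(eta_i, sum(beta**t * eps[t] for t in range(N)))
assert np.isclose(eta_ii, sum((2.0 * beta)**t * eps[t] for t in range(N)))
print(eta_i, eta_ii)
```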
Proof of Proposition 2.3
Set \(\eta_{N/M}=0\) and, for t=N/M−1,…,0, assume that, at stage t+1 of ADP(M), \(\tilde{J}_{t+1}^{o} \in\mathcal{F}_{t+1}\) is such that \(\sup_{x_{t+1} \in X_{t+1}} | J_{M\cdot (t+1)}^{o}(x_{t+1})-\tilde{J}_{t+1}^{o}(x_{t+1}) |\leq{\eta}_{t+1}\). Proceeding as in the proof of Proposition 2.2(i), we get the recursion \(\eta_{t} =2\beta^{M} \eta_{t+1}+\varepsilon_{t}\) (where \(\beta^{M}\) replaces β, since in each iteration of ADP(M) one can apply Proposition 2.1 M times). □
In order to prove Proposition 3.1, we shall apply the following technical lemma (which readily follows from [53, Theorem 2.13, p. 69] and the example in [53, p. 70]). Given a square partitioned real matrix
\[ M = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \]
such that D is nonsingular, Schur's complement M/D of D in M is defined [53, p. 18] as the matrix \(M/D = A - B D^{-1} C\). For a symmetric real matrix, we denote by \(\lambda_{\max}\) its maximum eigenvalue.
Lemma 9.1
Let
\[ M = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \]
be a partitioned symmetric negative-semidefinite matrix such that D is nonsingular. Then \(\lambda_{\max}(M/D) \leq \lambda_{\max}(M)\).
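As a side check, the inequality of Lemma 9.1 can be verified numerically on randomly generated negative-semidefinite matrices. The following Python sketch (with arbitrarily chosen block sizes, not tied to the problem data) compares \(\lambda_{\max}(M/D)\) with \(\lambda_{\max}(M)\).

```python
# Numerical illustration of Lemma 9.1 on a random symmetric negative-semidefinite matrix.
import numpy as np

rng = np.random.default_rng(0)
k, d = 3, 2                              # sizes of the A-block and the D-block (assumed)
G = rng.standard_normal((k + d, k + d))
M = -(G @ G.T)                           # symmetric, negative semidefinite

A, B = M[:k, :k], M[:k, k:]
C, D = M[k:, :k], M[k:, k:]              # D is nonsingular with probability one
schur = A - B @ np.linalg.inv(D) @ C     # Schur complement M/D

lam_max = lambda S: np.linalg.eigvalsh(S).max()
assert lam_max(schur) <= lam_max(M) + 1e-10
print(lam_max(schur), "<=", lam_max(M))
```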
In the proof of the next theorem, we shall use the following notations. The symbol ∇ denotes the gradient operator when it is applied to a scalar-valued function and the Jacobian operator when applied to a vector-valued function. We use the notation \(\nabla^{2}\) for the Hessian. In the case of a composite function, e.g., f(g(x,y,z),h(x,y,z)), by \(\nabla_{i} f(g(x,y,z),h(x,y,z))\) we denote the gradient of f with respect to its ith (vector) argument, computed at (g(x,y,z),h(x,y,z)). The full gradient of f with respect to the argument x is denoted by \(\nabla_{x} f(g(x,y,z),h(x,y,z))\). Similarly, by \(\nabla^{2}_{i,j} f(g(x,y,z),h(x,y,z))\) we denote the submatrix of the Hessian of f computed at (g(x,y,z),h(x,y,z)), whose first indices belong to the vector argument i and whose second indices belong to the vector argument j. \(\nabla J_{t}^{o}(x_{t})\) is a column vector, and \(\nabla g_{t}^{o}(x_{t})\) is a matrix whose rows are the transposes of the gradients of the components of \(g_{t}^{o}(x_{t})\). We denote by \(g^{o}_{t,j}\) the jth component of the optimal policy function \(g^{o}_{t}\) (j=1,…,d). The other notations used in the proof are detailed in Sect. 3.
Proof of Proposition 3.1
(i) Let us first show by backward induction on t that \(J^{o}_{t} \in\mathcal{C}^{m}(X_{t})\) and, for every j∈{1,…,d}, \(g^{o}_{t,j} \in\mathcal{C}^{m-1}(X_{t})\) (which we also need in the proof). Since \(J^{o}_{N}=h_{N}\), we have \(J^{o}_{N} \in\mathcal{C}^{m}(X_{N})\) by hypothesis. Now, fix t and suppose that \(J^{o}_{t+1} \in\mathcal{C}^{m}(X_{t+1})\) and is concave. Let \(x_{t} \in\operatorname{int} (X_{t})\). As by hypothesis the optimal policy \(g^{o}_{t}\) is interior on \(\operatorname{int} (X_{t})\), the first-order optimality condition \(\nabla_{2} h_{t}(x_{t},g^{o}_{t}(x_{t}))+\beta\nabla J^{o}_{t+1}(g^{o}_{t}(x_{t}))=0\) holds. By the implicit function theorem we get
\[ \nabla g^{o}_{t}(x_{t}) = - \bigl[\nabla^{2}_{2,2} h_{t}\bigl(x_{t},g^{o}_{t}(x_{t})\bigr) + \beta \nabla^{2} J^{o}_{t+1}\bigl(g^{o}_{t}(x_{t})\bigr) \bigr]^{-1} \nabla^{2}_{2,1} h_{t}\bigl(x_{t},g^{o}_{t}(x_{t})\bigr), \qquad (39) \]
where \(\nabla^{2}_{2,2} (h_{t}(x_{t},g^{o}_{t}(x_{t})) )+ \beta \nabla^{2} J^{o}_{t+1}(g^{o}_{t}(x_{t}))\) is nonsingular, as \(\nabla^{2}_{2,2} (h_{t}(x_{t},g^{o}_{t}(x_{t})) )\) is negative semidefinite by the \(\alpha_{t}\)-concavity of \(h_{t}\) for \(\alpha_{t}>0\), and \(\nabla^{2} J^{o}_{t+1}(g^{o}_{t}(x_{t}))\) is negative definite since \(J^{o}_{t+1}\) is concave.
By differentiating the two members of (39) up to derivatives of \(h_{t}\) and \(J^{o}_{t+1}\) of order m, for j=1,…,d, we get \(g^{o}_{t,j} \in\mathcal {C}^{m-1}(\operatorname{int} (X_{t}))\). As the expressions that one can obtain for its partial derivatives up to the order m−1 are bounded and continuous not only on \(\operatorname{int} (X_{t})\), but on the whole \(X_{t}\), one has \(g^{o}_{t,j} \in \mathcal{C}^{m-1}(X_{t})\).
By differentiating the equality \(J^{o}_{t}(x_{t})=h_{t}(x_{t},g^{o}_{t}(x_{t}))+ \beta J^{o}_{t+1}(g^{o}_{t}(x_{t}))\) we obtain
\[ \nabla J^{o}_{t}(x_{t}) = \nabla_{1} h_{t}\bigl(x_{t},g^{o}_{t}(x_{t})\bigr) + \bigl[\nabla g^{o}_{t}(x_{t})\bigr]^{T} \bigl[ \nabla_{2} h_{t}\bigl(x_{t},g^{o}_{t}(x_{t})\bigr) + \beta \nabla J^{o}_{t+1}\bigl(g^{o}_{t}(x_{t})\bigr) \bigr]. \]
So, by the first-order optimality condition, we get
\[ \nabla J^{o}_{t}(x_{t}) = \nabla_{1} h_{t}\bigl(x_{t},g^{o}_{t}(x_{t})\bigr). \qquad (40) \]
By differentiating the two members of (40) up to derivatives of \(h_{t}\) of order m, we obtain \(J^{o}_{t} \in\mathcal{C}^{m}(\operatorname{int} (X_{t}))\). As for the optimal policies, this extends to \(J^{o}_{t} \in\mathcal{C}^{m}(X_{t})\).
In order to conclude the backward induction step, it remains to show that \(J^{o}_{t}\) is concave. This can be proved by the following direct argument. By differentiating (40) and using (39), for the Hessian of \(J^{o}_{t}\), we obtain
\[ \nabla^{2} J^{o}_{t}(x_{t}) = \nabla^{2}_{1,1} h_{t}\bigl(x_{t},g^{o}_{t}(x_{t})\bigr) - \nabla^{2}_{1,2} h_{t}\bigl(x_{t},g^{o}_{t}(x_{t})\bigr) \bigl[\nabla^{2}_{2,2} h_{t}\bigl(x_{t},g^{o}_{t}(x_{t})\bigr) + \beta\nabla^{2} J^{o}_{t+1}\bigl(g^{o}_{t}(x_{t})\bigr)\bigr]^{-1} \nabla^{2}_{2,1} h_{t}\bigl(x_{t},g^{o}_{t}(x_{t})\bigr), \]
which is Schur's complement of \([\nabla^{2}_{2,2}h_{t}(x_{t},g^{o}_{t}(x_{t})) + \beta\nabla^{2} J^{o}_{t+1}(g^{o}_{t}(x_{t})) ]\) in the matrix
\[ \begin{pmatrix} \nabla^{2}_{1,1} h_{t}\bigl(x_{t},g^{o}_{t}(x_{t})\bigr) & \nabla^{2}_{1,2} h_{t}\bigl(x_{t},g^{o}_{t}(x_{t})\bigr) \\ \nabla^{2}_{2,1} h_{t}\bigl(x_{t},g^{o}_{t}(x_{t})\bigr) & \nabla^{2}_{2,2} h_{t}\bigl(x_{t},g^{o}_{t}(x_{t})\bigr) + \beta\nabla^{2} J^{o}_{t+1}\bigl(g^{o}_{t}(x_{t})\bigr) \end{pmatrix}. \]
Note that such a matrix is negative semidefinite, as it is the sum of the two matrices
\[ \nabla^{2} h_{t}\bigl(x_{t},g^{o}_{t}(x_{t})\bigr) \qquad\text{and}\qquad \begin{pmatrix} 0 & 0 \\ 0 & \beta\nabla^{2} J^{o}_{t+1}\bigl(g^{o}_{t}(x_{t})\bigr) \end{pmatrix}, \]
which are negative semidefinite as \(h_{t}\) and \(J^{o}_{t+1}\) are concave and twice continuously differentiable. In particular, it follows by [54, p. 102] (which gives bounds on the eigenvalues of the sum of two symmetric matrices) that its maximum eigenvalue is smaller than or equal to \(-\alpha_{t}\). Then, it follows by Lemma 9.1 that \(J^{o}_{t}\) is concave (even \(\alpha_{t}\)-concave).
Thus, by backward induction on t and by the compactness of \(X_{t}\) we conclude that, for every t=N,…,0, \(J^{o}_{t} \in\mathcal{C}^{m}(X_{t}) \subset\mathcal{W}^{m}_{p}(\operatorname{int}(X_{t}))\) for every 1≤p≤+∞.
(ii) As \(X_{t}\) is bounded and convex, by Sobolev's extension theorem [34, Theorem 5, p. 181, and Example 2, p. 189], for every 1≤p≤+∞, the function \(J^{o}_{t} \in\mathcal{W}^{m}_{p}(\operatorname{int}(X_{t}))\) can be extended to the whole of \(\mathbb{R}^{d}\), yielding a function \(\bar{J}_{t}^{o,p} \in \mathcal{W}^{m}_{p}(\mathbb{R}^{d})\).
(iii) For 1<p<+∞, the statement follows by item (ii) and the equivalence between Sobolev spaces and Bessel potential spaces [34, Theorem 3, p. 135]. For p=1 and m≥2 even, it follows by item (ii) and the inclusion \(\mathcal{W}^{m}_{1}(\mathbb{R}^{d}) \subset\mathcal{B}^{m}_{1}(\mathbb{R}^{d})\) from [34, p. 160]. □
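A simple way to see formulas (39) and (40) at work is a one-dimensional quadratic example, for which the optimal policy is available in closed form. The Python sketch below uses assumed coefficients a, b, c, q and discount β (not taken from the paper) for \(h_{t}(x,u)=-\frac{a}{2}x^{2}-\frac{b}{2}(u-cx)^{2}\) and \(J^{o}_{t+1}(u)=-\frac{q}{2}u^{2}\), and compares the two formulas with finite-difference derivatives.

```python
# Illustration (hypothetical data) of the implicit-function-theorem formula (39)
# and of the envelope formula (40) on a one-dimensional quadratic example.
import numpy as np

a, b, c, q, beta = 1.0, 2.0, 0.5, 1.5, 0.9         # assumed coefficients
h  = lambda x, u: -0.5 * a * x**2 - 0.5 * b * (u - c * x)**2
J1 = lambda u: -0.5 * q * u**2                      # stands for J_{t+1}^o
g  = lambda x: b * c * x / (b + beta * q)           # interior maximizer of h(x,u) + beta*J1(u)
J0 = lambda x: h(x, g(x)) + beta * J1(g(x))         # stands for J_t^o

x0, step = 0.7, 1e-5
dg_39 = -(1.0 / (-b - beta * q)) * (b * c)          # (39): -[h_uu + beta*J1'']^{-1} h_ux
dg_fd = (g(x0 + step) - g(x0 - step)) / (2 * step)
dJ_40 = -a * x0 + b * c * (g(x0) - c * x0)          # (40): J0'(x) = h_x(x, g(x))
dJ_fd = (J0(x0 + step) - J0(x0 - step)) / (2 * step)

assert np.isclose(dg_39, dg_fd) and np.isclose(dJ_40, dJ_fd, atol=1e-6)
print(dg_39, dJ_40)
```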
Proof of Proposition 3.2
(i) is proved in the same way as Proposition 3.1, replacing \(J_{t+1}^{o}\) with \(\tilde{J}_{t+1}^{o}\) and \(g_{t}^{o}\) with \(\tilde{g}_{t}^{o}\).
(ii) Inspection of the proof of Proposition 3.1(i) shows that \(J_{t}^{o}\) is α t -concave (α t >0) for t=0,…,N−1, whereas the α N -concavity (α N >0) of \(J_{N}^{o}=h_{N}\) is assumed. By (12) and condition (10), \(\tilde{J}_{t+1,j}^{o}\) is concave for j sufficiently large. Hence, one can apply (i) to \(\tilde{J}_{t+1,j}^{o}\), and so there exists \(\hat{J}^{o,p}_{t,j} \in\mathcal{W}^{m}_{p}(\mathbb{R}^{d})\) such that \(T_{t} \tilde{J}_{t+1,j}^{o}=\hat{J}^{o,p}_{t,j}|_{X_{t}}\). Proceeding as in the proof of Proposition 3.1, one obtains equations analogous to (39) and (40) (with obvious replacements). Then, by differentiating \(T_{t} \tilde{J}_{t+1,j}^{o}\) up to the order m, we get
Finally, the statement follows by the continuity of the embedding of \(\mathcal{C}^{m}(X_{t})\) into \(\mathcal{W}^{m}_{p}(\operatorname{int} (X_{t}))\) (since \(X_{t}\) is compact) and the continuity of the Sobolev extension operator. □
Proof of Proposition 4.1
(i) For \(\omega\in\mathbb{R}^{d}\), let \(M(\omega)=\max\{\|\omega\|,1\}\), let ν be a positive integer, and define the set of functions
\[ \varGamma^{\nu}(\mathbb{R}^{d}) := \biggl\{ f: \mathbb{R}^{d} \to \mathbb{R} : \int_{\mathbb{R}^{d}} M(\omega)^{\nu} |{\hat{f}}(\omega)| \,d\omega < +\infty \biggr\}, \]
where \({\hat{f}}\) is the Fourier transform of f. For \(f\in\varGamma^{\nu}(\mathbb{R}^{d})\), let
\[ \|f\|_{\varGamma^{\nu}} := \int_{\mathbb{R}^{d}} M(\omega)^{\nu} |{\hat{f}}(\omega)| \,d\omega, \]
and for θ>0, denote by
\[ B_{\theta}\bigl(\|\cdot\|_{\varGamma^{\nu}}\bigr) := \bigl\{ f \in \varGamma^{\nu}(\mathbb{R}^{d}) : \|f\|_{\varGamma^{\nu}} \leq \theta \bigr\} \]
the closed ball of radius θ in \(\varGamma^{\nu}(\mathbb{R}^{d})\). By [55, Corollary 3.2], the compactness of the support of ψ, and the regularity of its boundary (which allows one to apply the Rellich–Kondrachov theorem [56, Theorem 6.3, p. 168]), for s=⌊d/2⌋+1 and \(\psi\in\mathcal{S}^{q+s}\), there exists \(C_{1}>0\) such that, for every \(f \in B_{\theta}(\|\cdot\|_{\varGamma^{q+s+1}})\) and every positive integer n, there is \(f_{n} \in\mathcal{R}(\psi,n)\) such that
\[ \max_{0\leq|\mathbf{r}|\leq q}\, \sup_{x \in X} \bigl\vert D^{\mathbf{r}} f(x) - D^{\mathbf{r}} f_{n}(x) \bigr\vert \leq C_{1} \frac{\theta}{\sqrt{n}}. \qquad (41) \]
The next step consists in proving that, for every positive integer ν and s=⌊d/2⌋+1, the space \(\mathcal{W}^{\nu+s}_{2}(\mathbb{R}^{d})\) is continuously embedded in \(\varGamma^{\nu}(\mathbb{R}^{d})\). Let \(f \in \mathcal{W}^{\nu+s}_{2}(\mathbb{R}^{d})\). Then
\[ \int_{\mathbb{R}^{d}} M(\omega)^{\nu} |{\hat{f}}(\omega)| \,d\omega = \int_{\|\omega\|\leq 1} |{\hat{f}}(\omega)| \,d\omega + \int_{\|\omega\|> 1} \|\omega\|^{\nu} |{\hat{f}}(\omega)| \,d\omega. \]
The first integral is finite by the Cauchy–Schwarz inequality and the finiteness of \(\int_{\|\omega\|\leq1} |{\hat{f}}({\omega})|^{2} \,d\omega\). To study the second integral, following [37, p. 941], we factorize \(\|\omega\|^{\nu}|{\hat{f}}({\omega})| = a(\omega) b(\omega)\), where \(a(\omega):=(1+\|\omega\|^{2s})^{-1/2}\) and \(b(\omega) := \|\omega\|^{\nu}|{\hat{f}}({\omega})| (1+ \|\omega\|^{2s})^{1/2}\). By the Cauchy–Schwarz inequality,
\[ \int_{\|\omega\|> 1} \|\omega\|^{\nu} |{\hat{f}}(\omega)| \,d\omega \leq \biggl( \int_{\mathbb{R}^{d}} a^{2}(\omega)\,d\omega \biggr)^{1/2} \biggl( \int_{\mathbb{R}^{d}} b^{2}(\omega)\,d\omega \biggr)^{1/2}. \]
The integral \(\int_{\mathbb{R}^{d}}a^{2}(\omega) \,d\omega= \int_{\mathbb{R}^{d}}(1+ \|\omega\|^{2s})^{-1} \,d\omega\) is finite for 2s>d, which is satisfied for all d≥1 as s=⌊d/2⌋+1. By Parseval’s identity [57, p. 172], since f has square-integrable νth and (ν+s)th partial derivatives, the integral \(\int_{\mathbb{R}^{d}}b^{2}(\omega) \,d\omega= \int_{\mathbb{R}^{d}} \| \omega\|^{2\nu} |{\hat{f}}({\omega})|^{2} (1+ \|\omega\|^{2s}) \,d\omega= \int_{\mathbb{R}^{d}} |{\hat{f}}({\omega})|^{2} (\|\omega\|^{2\nu} + \|\omega\|^{2(\nu+s)}) \,d\omega\) is finite. Hence, \(\int_{\mathbb{R} ^{d}}M(\omega)^{\nu}|{\hat{f}}({\omega})| \,d\omega\) is finite, so f∈Γ ν(ℝd), and, by the argument above, there exists C 2>0 such that \(B_{\rho}(\|\cdot\|_{\mathcal{W}^{\nu+s}_{2}}) \subset B_{C_{2} \rho}(\|\cdot\|_{\varGamma^{\nu}})\).
Taking ν=q+s+1 as required in (41) and C=C 1⋅C 2, we conclude that, for every \(f \in B_{\rho}(\|\cdot\|_{\mathcal{W}^{q + 2s+1}_{2}})\) and every positive integer n, there exists \(f_{n} \in\mathcal{R}(\psi,n)\) such that \(\max_{0\leq|\mathbf{r}|\leq q} \sup_{x \in X} \vert D^{\mathbf{r}} f(x) - D^{\mathbf{r}} f_{n}(x) \vert \leq C \frac{\rho}{\sqrt{n}}\).
(ii) Follows by [40, Theorem 2.1] and the Rellich–Kondrachov theorem [56, Theorem 6.3, p. 168], which allows one to use “sup” in (20) instead of “\(\operatorname{ess\,sup}\)”.
(iii) Follows by [58, Corollary 5.2]. □
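The finiteness of \(\int_{\mathbb{R}^{d}}(1+\|\omega\|^{2s})^{-1}\,d\omega\) for 2s>d, used in the proof of item (i) above, can also be checked numerically. The following Python sketch (illustrative only) evaluates the integral in radial coordinates for a few values of d, with \(s=\lfloor d/2\rfloor+1\).

```python
# Numerical check that int_{R^d} (1 + ||w||^{2s})^{-1} dw is finite when 2s > d.
# In radial coordinates the integral equals area(S^{d-1}) * int_0^inf r^{d-1}/(1+r^{2s}) dr.
import math
from scipy.integrate import quad

for d in (1, 2, 3, 5):
    s = d // 2 + 1                                        # s = floor(d/2) + 1, so 2s > d
    surface = 2 * math.pi**(d / 2) / math.gamma(d / 2)    # area of the unit sphere S^{d-1}
    radial, _ = quad(lambda r: r**(d - 1) / (1.0 + r**(2 * s)), 0.0, math.inf)
    print(d, s, surface * radial)
```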
Proof of Proposition 4.2
(i) We detail the proof for t=N−1 and t=N−2; the other cases follow by backward induction.
Let us start with t=N−1 and \(\tilde{J}^{o}_{N}=J^{o}_{N}\). By Proposition 3.1(ii), there exists \(\bar{J}^{o,2}_{N-1} \in\mathcal {W}^{2+(2s+1)N}_{2}(\mathbb{R}^{d})\) such that \(T_{N-1} \tilde{J}^{o}_{N}=T_{N-1} J^{o}_{N}=J^{o}_{N-1}=\bar {J}^{o,2}_{N-1}|_{X_{N-1}}\).
By Proposition 4.1(i) with q=2+(2s+1)(N−1) applied to \(\bar{J}^{o,2}_{N-1}\), we obtain (22) for t=N−1. Set \(\tilde{J}^{o}_{N-1}=f_{N-1}\) in (22). By (22) and condition (10), there exists a positive integer \(\bar{n}_{N-1}\) such that \(\tilde{J}^{o}_{N-1}\) is concave for \(n_{N-1}\geq\bar{n}_{N-1}\).
Now consider t=N−2. By Proposition 3.2(i), it follows that there exists \(\hat{J}^{o,2}_{N-2} \in \mathcal{W}^{2+(2s+1)(N-1)}_{2}(\mathbb{R}^{d})\) such that \(T_{N-2} \tilde{J}^{o}_{N-1}=\hat{J}^{o,2}_{N-2}|_{X_{N-2}}\). By applying to \(\hat{J}^{o,2}_{N-2}\) Proposition 4.1(i) with q=2+(2s+1)(N−2), for every positive integer \(n_{N-2}\), we conclude that there exists \(f_{N-2} \in\mathcal{R}(\psi_{t},n_{N-2})\) such that
\[ \max_{0\leq|\mathbf{r}|\leq 2+(2s+1)(N-2)}\, \sup_{x \in X_{N-2}} \bigl\vert D^{\mathbf{r}} \bigl(T_{N-2} \tilde{J}^{o}_{N-1}\bigr)(x) - D^{\mathbf{r}} f_{N-2}(x) \bigr\vert \leq \bar{C}_{N-2} \frac{\| \hat{J}^{o,2}_{N-2} \|_{\mathcal{W}^{2 + (2s+1)(N-1)}_{2}(\mathbb{R}^{d})}}{\sqrt{n_{N-2}}}, \qquad (42) \]
where, by Proposition 3.2(i), \(\hat{J}^{o,2}_{N-2} \in\mathcal{W}^{2 + (2s+1)(N-1)}_{2}(\mathbb{R}^{d})\) is a suitable extension of \(T_{N-2} \tilde{J}^{o}_{N-1}\) to \(\mathbb{R}^{d}\), and \(\bar{C}_{N-2}>0\) does not depend on the approximations generated in the previous iterations. The statement for t=N−2 follows from the fact that the dependence of the bound (42) on \(\| \hat{J}^{o,2}_{N-2} \|_{\mathcal{W}^{2 + (2s+1)(N-1)}_{2}(\mathbb{R}^{d})}\) can be removed by exploiting Proposition 3.2(ii); in particular, we can choose \(C_{N-2}>0\) independently of \(n_{N-1}\). So, we get (22) for t=N−2. Set \(\tilde{J}^{o}_{N-2}=f_{N-2}\) in (22). By (22) and condition (10), there exists a positive integer \(\bar{n}_{N-2}\) such that \(\tilde{J}^{o}_{N-2}\) is concave for \(n_{N-2}\geq \bar{n}_{N-2}\).
The proof proceeds similarly for the other values of t; each constant \(C_{t}\) can be chosen independently of \(n_{t+1},\dots,n_{N-1}\).
(ii) follows by Proposition 3.1(ii) (with p=+∞) and Proposition 4.1(ii).
(iii) follows by Proposition 3.1(iii) (with p=1) and Proposition 4.1(iii). □
Proof of Proposition 5.1
We first derive some constraints on the form of the sets A t,j and then show that the budget constraints (25) are satisfied if and only if the sets A t,j are chosen as in Assumption 5.1 (or are suitable subsets).
As the labor incomes \(y_{t,j}\) and the interest rates \(r_{t,j}\) are known, for t=1,…,N, we have
\[ a_{t,j} \leq a_{t,j}^{\max} := a_{0,j} \prod_{k=0}^{t-1} (1+r_{k,j}) + \sum_{i=0}^{t-1} y_{i,j} \prod_{k=i}^{t-1} (1+r_{k,j}) \]
(the upper bound is achieved when all the consumptions \(c_{t,j}\) are equal to 0), so the corresponding feasible sets \(A_{t,j}\) are bounded from above by \(a_{t,j}^{\max}\). The boundedness from below of each \(A_{t,j}\) follows from the budget constraints (25), which for \(c_{k,j}=0\) (k=t,…,N) are equivalent for t=N to
\[ a_{N,j} \geq -y_{N,j} \qquad (43) \]
and for t=0,…,N−1 to \(a_{t,j} \prod_{k=t}^{N-1} (1+r_{k,j}) + \sum_{i=t}^{N-1} y_{i,j} \prod_{k=i}^{N-1} (1+r_{k,j}) + y_{N,j} \geq0 \), i.e.,
\[ a_{t,j} \geq - \frac{\sum_{i=t}^{N-1} y_{i,j} \prod_{k=i}^{N-1} (1+r_{k,j}) + y_{N,j}}{\prod_{k=t}^{N-1} (1+r_{k,j})}. \qquad (44) \]
So, in order to satisfy the budget constraints (25), the constraints (43) and (44) have to be satisfied. Then the maximal sets A t that satisfy the budget constraints (25) have the form described in Assumption 5.1. □
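For concreteness, once the interest rates and the labor incomes are fixed, the lower bounds (43) and (44) are straightforward to compute. The following Python sketch uses assumed (hypothetical) values of \(r_{k,j}\) and \(y_{i,j}\) for a single component j.

```python
# Hypothetical numerical illustration of the lower bounds (43)-(44) on the assets a_{t,j}.
import numpy as np

N = 4
r = np.array([0.03, 0.05, 0.04, 0.02])    # r_{0,j}, ..., r_{N-1,j} (assumed values)
y = np.array([1.0, 1.0, 1.2, 1.1, 0.9])   # y_{0,j}, ..., y_{N,j}   (assumed values)

def a_min(t):
    """Lower bound on a_{t,j}: minus the discounted future labor income."""
    if t == N:
        return -y[N]                                        # constraint (43)
    growth = np.prod(1.0 + r[t:N])                          # prod_{k=t}^{N-1} (1+r_{k,j})
    future = sum(y[i] * np.prod(1.0 + r[i:N]) for i in range(t, N)) + y[N]
    return -future / growth                                 # constraint (44)

print([float(a_min(t)) for t in range(N + 1)])
```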
Proof of Proposition 5.2
(a) About Assumption 3.1(i). By construction, the sets \(\bar{A}_{t}\) are compact, convex, and have nonempty interiors, since they are Cartesian products of nonempty closed intervals. The same holds for the \(\bar{D}_{t}\), since by (31) they are the intersections between \(\bar{A}_{t} \times\bar{A}_{t+1}\) and the sets D t , which are compact, convex, and have nonempty interiors too.
(b) About Assumption 3.1(ii). This is Assumption 5.2(i), with the obvious replacements of X t and D t .
(c) About Assumption 3.1(iii). Recall that for Problem \(\mathrm {OC}_{N}^{d}\) and t=0,…,N−1, we have
Then, \(h_{t} \in\mathcal{C}^{m}(\bar{D}_{t})\) by Assumption 5.2(ii) and (iii). As u(⋅) and \(v_{t,j}(\cdot)\) are twice continuously differentiable, the second part of Assumption 3.1(iii) means that there exists some \(\alpha_{t}>0\) such that the function
has negative semi-definite Hessian with respect to the variables \(a_{t}\) and \(a_{t+1}\). Assumption 5.2(ii) and easy computations show that the function \(u (\frac{(1+r_{t}) \circ (a_{t}+y_{t})-a_{t+1}}{1+r_{t}} )\) has negative semi-definite Hessian. By Assumption 5.2(iii), for each j=1,…,d and \(\alpha_{t,j} \in(0,\beta_{t,j}]\), \(v_{t,j}(a_{t,j})+ \frac{1}{2}\alpha_{t,j} a_{t,j}^{2}\) has negative semi-definite Hessian too. So, Assumption 3.1(iii) is satisfied for every \(\alpha_{t} \in(0,\min_{j=1,\dots,d}\{\beta_{t,j}\}]\).
(d) About Assumption 3.1(iv). Recall that for Problem \(\mathrm{OC}_{N}^{d}\), we have \(h_{N}(a_{N})=u(a_{N}+y_{N})\). Then, \(h_{N} \in\mathcal{C}^{m}(\bar{A}_{N})\) and is concave by Assumption 5.2(ii). □
Cite this article
Gaggero, M., Gnecco, G. & Sanguineti, M. Dynamic Programming and Value-Function Approximation in Sequential Decision Problems: Error Analysis and Numerical Results. J Optim Theory Appl 156, 380–416 (2013). https://doi.org/10.1007/s10957-012-0118-2