Value-function approximation is investigated for the solution via Dynamic Programming (DP) of continuous-state sequential N-stage decision problems, in which the reward to be maximized has an additive structure over a finite number of stages. Conditions that guarantee smoothness properties of the value function at each stage are derived. These properties are exploited to approximate such functions by means of certain nonlinear approximation schemes, which include splines of suitable order and Gaussian radial-basis networks with variable centers and widths. The accuracies of suboptimal solutions obtained by combining DP with these approximation tools are estimated. The results provide insights into the successful performance, reported in the literature, of value-function approximators in DP. The theoretical analysis is applied to a problem of optimal consumption, with simulation results illustrating the use of the proposed solution methodology. Numerical comparisons with classical linear approximators are presented.
When the decision horizon goes to infinity.
Functions constant along hyperplanes are known as ridge functions. Each ridge function results from the composition of a multivariable function having a particularly simple form, i.e., the inner product, with an arbitrary function dependent on a single variable.
Proof of Proposition 2.2
(i) We use a backward induction argument. For t=N−1,…,0, assume that, at stage t+1, \(\tilde{J}_{t+1}^{o} \in\mathcal{F}_{t+1}\) is such that \(\sup_{x_{t+1} \in X_{t+1}} | J_{t+1}^{o}(x_{t+1})-\tilde{J}_{t+1}^{o}(x_{t+1}) |\leq{\eta}_{t+1}\) for some \(\eta_{t+1}\geq 0\). In particular, for t=N−1, one has \(\eta_{N}=0\), as \(\tilde{J}_{N}^{o} = J_{N}^{o}\). By (3), there exists \(f_{t}\in\mathcal{F}_{t}\) such that \(\sup_{x_{t} \in X_{t}} | (T_{t} \tilde{J}_{t+1}^{o})(x_{t})-f_{t}(x_{t}) | \leq \varepsilon_{t}\). Set \(\tilde{J}_{t}^{o}=f_{t}\). By the triangle inequality and Proposition 2.1,
Let \(\eta_{t} := \varepsilon_{t} + \beta\eta_{t+1}\). Then, after N iterations we get \(\sup_{x_{0} \in X_{0}} | J_{0}^{o}(x_{0})-\tilde{J}_{0}^{o}(x_{0}) | \leq\eta_{0} = \varepsilon_{0} + \beta \eta_{1} = \varepsilon_{0} + \beta \varepsilon_{1} + \beta^{2} \eta_{2} = \dots = \sum_{t=0}^{N-1}{\beta^{t}\varepsilon_{t}}\).
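The one-step estimate behind this recursion can be sketched as follows, under the assumption (not restated in this section) that Proposition 2.1 is the β-contraction property of \(T_{t}\) in the sup-norm, and using \(J_{t}^{o}=T_{t} J_{t+1}^{o}\) and \(\tilde{J}_{t}^{o}=f_{t}\):

```latex
\begin{align*}
\sup_{x_{t} \in X_{t}} \bigl| J_{t}^{o}(x_{t}) - \tilde{J}_{t}^{o}(x_{t}) \bigr|
 &\leq \sup_{x_{t} \in X_{t}} \bigl| (T_{t} J_{t+1}^{o})(x_{t}) - (T_{t} \tilde{J}_{t+1}^{o})(x_{t}) \bigr|
   + \sup_{x_{t} \in X_{t}} \bigl| (T_{t} \tilde{J}_{t+1}^{o})(x_{t}) - f_{t}(x_{t}) \bigr| \\
 &\leq \beta\,\eta_{t+1} + \varepsilon_{t} .
\end{align*}
```

Starting from \(\eta_{N}=0\) and applying this bound at t=N−1,…,0 yields the weighted sum of the stage errors \(\varepsilon_{t}\).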
(ii) As before, for t=N−1,…,0, assume that, at stage t+1, \(\tilde{J}_{t+1}^{o} \in\mathcal{F}_{t+1}\) is such that \(\sup_{x_{t+1} \in X_{t+1}} | J_{t+1}^{o}(x_{t+1})-\tilde{J}_{t+1}^{o}(x_{t+1}) |\leq{\eta}_{t+1}\) for some \(\eta_{t+1}\geq 0\). In particular, for t=N−1, one has \(\eta_{N}=0\), as \(\tilde{J}_{N}^{o} = J_{N}^{o}\). Let \(\hat{J}_{t}^{o}=T_{t} \tilde{J}_{t+1}^{o}\). Proposition 2.1 gives
Before moving to the tth stage, one has to find an approximation \(\tilde{J}_{t}^{o} \in\mathcal{F}_{t}\) for \(J_{t}^{o}=T_{t} J_{t+1}^{o}\). Such an approximation has to be obtained from \(\hat{J}_{t}^{o}=T_{t} \tilde{J}_{t+1}^{o}\) (which, in general, may not belong to \(\mathcal{F}_{t}\)), because \(J_{t}^{o}=T_{t} {J}_{t+1}^{o}\) is unknown. By assumption, there exists \(f_{t} \in \mathcal{F}_{t}\) such that \(\sup_{x_{t} \in X_{t}} | J_{t}^{o}(x_{t})-f_{t}(x_{t}) | \leq \varepsilon_{t}\). However, in general, one cannot set \(\tilde{J}_{t}^{o}=f_{t}\), since in a neighborhood of radius \(\beta\eta_{t+1}\) of \(\hat{J}_{t}^{o}\) in the sup-norm, there may exist (besides \(J^{o}_{t}\)) some other function \(I_{t} \not= J_{t}^{o}\) which can also be approximated by some function \(\tilde{f}_{t} \in\mathcal{F}_{t}\) with error less than or equal to \(\varepsilon_{t}\). As \(J_{t}^{o}\) is unknown, in the worst case it happens that one chooses \(\tilde{J}_{t}^{o}=\tilde{f}_{t}\) instead of \(\tilde{J}_{t}^{o}=f_{t}\). In such a case, we get
Let \(\eta_{t} := 2\beta\eta_{t+1}+\varepsilon_{t}\). Then, after N iterations we have \(\sup_{x_{0} \in X_{0}} | J_{0}^{o}(x_{0})-\tilde {J}_{0}^{o}(x_{0}) | \leq\eta_{0} = \varepsilon_{0} + 2\beta \eta_{1} = \varepsilon_{0} + 2\beta \varepsilon_{1} + 4\beta^{2} \eta_{2} = \dots= \sum_{t=0}^{N-1}{(2\beta)^{t}\varepsilon_{t}}\). □
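The factor 2β in part (ii) can be traced through a three-term triangle inequality. The chain below is a sketch, again assuming that Proposition 2.1 is the β-contraction property of \(T_{t}\); it uses only the facts stated above, namely that both \(J_{t}^{o}\) and \(I_{t}\) lie within \(\beta\eta_{t+1}\) of \(\hat{J}_{t}^{o}\) and that \(\tilde{J}_{t}^{o}=\tilde{f}_{t}\) approximates \(I_{t}\) with error at most \(\varepsilon_{t}\) (all suprema are over \(x_{t} \in X_{t}\)):

```latex
\begin{align*}
\sup \bigl| J_{t}^{o} - \tilde{J}_{t}^{o} \bigr|
 &\leq \sup \bigl| J_{t}^{o} - \hat{J}_{t}^{o} \bigr|
   + \sup \bigl| \hat{J}_{t}^{o} - I_{t} \bigr|
   + \sup \bigl| I_{t} - \tilde{f}_{t} \bigr| \\
 &\leq \beta\eta_{t+1} + \beta\eta_{t+1} + \varepsilon_{t}
  = 2\beta\eta_{t+1} + \varepsilon_{t} .
\end{align*}
```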
Proof of Proposition 2.3
Set \(\eta_{N/M}=0\) and for t=N/M−1,…,0, assume that, at stage t+1 of ADP(M), \(\tilde{J}_{t+1}^{o} \in\mathcal{F}_{t+1}\) is such that \(\sup_{x_{t+1} \in X_{t+1}} | J_{M\cdot (t+1)}^{o}(x_{t+1})-\tilde{J}_{t+1}^{o}(x_{t+1}) |\leq{\eta}_{t+1}\). Proceeding as in the proof of Proposition 2.2(i), we get the recursion \(\eta_{t}=2\beta^{M}\eta_{t+1}+\varepsilon_{t}\) (where \(\beta^{M}\) replaces β since in each iteration of ADP(M) one can apply Proposition 2.1 M times). □
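The error recursions of Propositions 2.2 and 2.3 are easy to evaluate numerically. The following minimal sketch unrolls \(\eta_{t}=\varepsilon_{t}+c\,\beta^{M}\eta_{t+1}\) backward from a zero terminal error and compares it with the corresponding geometric sum; the function names and the parameters `factor` (the constant c, equal to 1 or 2) and `m` (the exponent M) are illustrative choices, not notation from the paper.

```python
def error_bound(eps, beta, factor=1.0, m=1):
    """Unroll eta_t = eps_t + factor * beta**m * eta_{t+1}, starting from 0.

    eps lists the per-stage approximation errors eps_0, ..., eps_{T-1}.
    """
    rate = factor * beta ** m
    eta = 0.0
    for e in reversed(eps):  # backward pass: last stage first
        eta = e + rate * eta
    return eta


def closed_form(eps, beta, factor=1.0, m=1):
    """Geometric sum  sum_t (factor * beta**m)**t * eps_t, for comparison."""
    rate = factor * beta ** m
    return sum(rate ** t * e for t, e in enumerate(eps))


if __name__ == "__main__":
    eps = [0.1, 0.2, 0.3]
    beta = 0.5
    print(error_bound(eps, beta))                   # recursion of Prop. 2.2(i)
    print(error_bound(eps, beta, factor=2.0))       # recursion of Prop. 2.2(ii)
    print(error_bound(eps, beta, factor=2.0, m=2))  # recursion of Prop. 2.3, M = 2
```

The sketch makes the qualitative difference visible: with `factor=2.0` the bound grows with the horizon as soon as 2β ≥ 1, whereas raising β to the power M shrinks the effective rate.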
In order to prove Proposition 3.1, we shall apply the following technical lemma (which readily follows by [53, Theorem 2.13, p. 69] and the example in [53, p. 70]). Given a square partitioned real matrix \(M=\begin{bmatrix} A & B \\ C & D \end{bmatrix}\) such that D is nonsingular, Schur’s complement M/D of D in M is defined [53, p. 18] as the matrix \(M/D=A-BD^{-1}C\). For a symmetric real matrix, we denote by \(\lambda_{\max}\) its maximum eigenvalue.
Lemma 9.1
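Let \(M=\begin{bmatrix} A & B \\ B^{\top} & D \end{bmatrix}\) be a partitioned symmetric negative-semidefinite matrix such that D is nonsingular. Then \(\lambda_{\max}(M/D) \leq \lambda_{\max}(M)\).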
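As a sanity check on how the lemma is used in the proof of Proposition 3.1, the 2×2 case with scalar blocks can be verified by hand or numerically. The sketch below reads the lemma as the eigenvalue bound \(\lambda_{\max}(M/D) \leq \lambda_{\max}(M)\), which is an assumption of this example; the helper names are illustrative only.

```python
import math


def max_eigenvalue_2x2(a, b, d):
    """Largest eigenvalue of the symmetric matrix [[a, b], [b, d]]."""
    return 0.5 * (a + d) + math.hypot(0.5 * (a - d), b)


def schur_complement_2x2(a, b, d):
    """Schur complement M/D = a - b * (1/d) * b for scalar blocks, d != 0."""
    if d == 0:
        raise ValueError("D must be nonsingular")
    return a - b * b / d


if __name__ == "__main__":
    # M = [[-2, 1], [1, -1]] is negative definite (trace < 0, determinant > 0).
    a, b, d = -2.0, 1.0, -1.0
    lam_max = max_eigenvalue_2x2(a, b, d)
    schur = schur_complement_2x2(a, b, d)
    print(lam_max, schur, schur <= lam_max)
```

For scalar blocks the Schur complement is itself the only eigenvalue of M/D, so the comparison `schur <= lam_max` is exactly the inequality being checked.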
In the proof of the next theorem, we shall use the following notations. The symbol ∇ denotes the gradient operator when it is applied to a scalar-valued function and the Jacobian operator when applied to a vector-valued function. We use the notation \(\nabla^{2}\) for the Hessian. In the case of a composite function, e.g., f(g(x,y,z),h(x,y,z)), by \(\nabla_{i} f(g(x,y,z),h(x,y,z))\) we denote the gradient of f with respect to its ith (vector) argument, computed at (g(x,y,z),h(x,y,z)). The full gradient of f with respect to the argument x is denoted by \(\nabla_{x} f(g(x,y,z),h(x,y,z))\). Similarly, by \(\nabla^{2}_{i,j} f(g(x,y,z),h(x,y,z))\) we denote the submatrix of the Hessian of f computed at (g(x,y,z),h(x,y,z)), whose first indices belong to the vector argument i and the second ones to the vector argument j. \(\nabla J_{t}^{o}(x_{t})\) is a column vector, and \(\nabla g_{t}^{o}(x_{t})\) is a matrix whose rows are the transposes of the gradients of the components of \(g_{t}^{o}(x_{t})\). We denote by \(g^{o}_{t,j}\) the jth component of the optimal policy function \(g^{o}_{t}\) (j=1,…,d). The other notations used in the proof are detailed in Sect. 3.
Proof of Proposition 3.1
(i) Let us first show by backward induction on t that \(J^{o}_{t} \in\mathcal{C}^{m}(X_{t})\) and, for every j∈{1,…,d}, \(g^{o}_{t,j} \in\mathcal{C}^{m-1}(X_{t})\) (which we also need in the proof). Since \(J^{o}_{N}=h_{N}\), we have \(J^{o}_{N} \in\mathcal{C}^{m}(X_{N})\) by hypothesis. Now, fix t and suppose that \(J^{o}_{t+1} \in\mathcal{C}^{m}(X_{t+1})\) and is concave. Let \(x_{t} \in\operatorname{int} (X_{t})\). As by hypothesis the optimal policy \(g^{o}_{t}\) is interior on \(\operatorname{int} (X_{t})\), the first-order optimality condition \(\nabla_{2} h_{t}(x_{t},g^{o}_{t}(x_{t}))+\beta\nabla J^{o}_{t+1}(g^{o}_{t}(x_{t}))=0\) holds. By the implicit function theorem we get
where \(\nabla^{2}_{2,2} (h_{t}(x_{t},g^{o}_{t}(x_{t})) )+ \beta \nabla^{2} J^{o}_{t+1}(g^{o}_{t}(x_{t}))\) is nonsingular as \(\nabla^{2}_{2,2} (h_{t}(x_{t},g^{o}_{t}(x_{t})) )\) is negative semidefinite by the \(\alpha_{t}\)-concavity of \(h_{t}\) for \(\alpha_{t}>0\), and \(\nabla^{2} J^{o}_{t+1}(g^{o}_{t}(x_{t}))\) is negative definite since \(J^{o}_{t+1}\) is concave.
By differentiating the two members of (39) up to derivatives of \(h_{t}\) and \(J^{o}_{t+1}\) of order m, for j=1,…,d, we get \(g^{o}_{t,j} \in\mathcal {C}^{m-1}(\operatorname{int} (X_{t}))\). As the expressions that one can obtain for its partial derivatives up to the order m−1 are bounded and continuous not only on \(\operatorname{int} (X_{t})\), but on the whole \(X_{t}\), one has \(g^{o}_{t,j} \in \mathcal{C}^{m-1}(X_{t})\).
By differentiating the equality \(J^{o}_{t}(x_{t})=h_{t}(x_{t},g^{o}_{t}(x_{t}))+ \beta J^{o}_{t+1}(g^{o}_{t}(x_{t}))\) we obtain
So, by the first-order optimality condition we get
By differentiating the two members of (40) up to derivatives of \(h_{t}\) of order m, we obtain \(J^{o}_{t} \in\mathcal{C}^{m}(\operatorname{int} (X_{t}))\). As for the optimal policies, this extends to \(J^{o}_{t} \in\mathcal{C}^{m}(X_{t})\).
In order to conclude the backward induction step, it remains to show that \(J^{o}_{t}\) is concave. This can be proved by the following direct argument. By differentiating (40) and using (39), for the Hessian of \(J^{o}_{t}\), we obtain
which is Schur’s complement of \([\nabla^{2}_{2,2}h_{t}(x_{t},g^{o}_{t}(x_{t})) + \beta\nabla^{2} J^{o}_{t+1}(g^{o}_{t}(x_{t})) ]\) in the matrix
Note that such a matrix is negative semidefinite, as it is the sum of the two matrices
which are negative-semidefinite as \(h_{t}\) and \(J^{o}_{t+1}\) are concave and twice continuously differentiable. In particular, it follows by [54, p. 102] (which gives bounds on the eigenvalues of the sum of two symmetric matrices) that its maximum eigenvalue is smaller than or equal to \(-\alpha_{t}\). Then, it follows by Lemma 9.1 that \(J^{o}_{t}\) is concave (even \(\alpha_{t}\)-concave).
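The chain of identities used in this part of the proof can be sketched as follows. This is a hedged reconstruction from the first-order optimality condition stated above, not a quotation of the displays numbered (39) and (40); the "cf." tags assume that (39) is the policy Jacobian and (40) the envelope identity. All derivatives of \(h_{t}\) are evaluated at \((x_{t},g^{o}_{t}(x_{t}))\) and those of \(J^{o}_{t+1}\) at \(g^{o}_{t}(x_{t})\).

```latex
\begin{align*}
\nabla g^{o}_{t}(x_{t})
 &= -\bigl[\nabla^{2}_{2,2} h_{t} + \beta \nabla^{2} J^{o}_{t+1}\bigr]^{-1} \nabla^{2}_{2,1} h_{t},
 && \text{(cf. (39))} \\
\nabla J^{o}_{t}(x_{t})
 &= \nabla_{1} h_{t} + \bigl(\nabla g^{o}_{t}(x_{t})\bigr)^{\top}
    \bigl[\nabla_{2} h_{t} + \beta \nabla J^{o}_{t+1}\bigr]
  = \nabla_{1} h_{t},
 && \text{(cf. (40))} \\
\nabla^{2} J^{o}_{t}(x_{t})
 &= \nabla^{2}_{1,1} h_{t} + \nabla^{2}_{1,2} h_{t}\, \nabla g^{o}_{t}(x_{t})
  = \nabla^{2}_{1,1} h_{t}
    - \nabla^{2}_{1,2} h_{t} \bigl[\nabla^{2}_{2,2} h_{t} + \beta \nabla^{2} J^{o}_{t+1}\bigr]^{-1} \nabla^{2}_{2,1} h_{t}, \\
\begin{bmatrix}
 \nabla^{2}_{1,1} h_{t} & \nabla^{2}_{1,2} h_{t} \\
 \nabla^{2}_{2,1} h_{t} & \nabla^{2}_{2,2} h_{t} + \beta \nabla^{2} J^{o}_{t+1}
\end{bmatrix}
 &= \begin{bmatrix}
 \nabla^{2}_{1,1} h_{t} & \nabla^{2}_{1,2} h_{t} \\
 \nabla^{2}_{2,1} h_{t} & \nabla^{2}_{2,2} h_{t}
\end{bmatrix}
 + \begin{bmatrix}
 0 & 0 \\
 0 & \beta \nabla^{2} J^{o}_{t+1}
\end{bmatrix}.
\end{align*}
```

The third line has the form \(A-BD^{-1}C\) with \(D=\nabla^{2}_{2,2} h_{t} + \beta \nabla^{2} J^{o}_{t+1}\), which is why the Hessian of \(J^{o}_{t}\) is a Schur complement in the block matrix of the last line.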
Thus, by backward induction on t and by the compactness of \(X_{t}\) we conclude that, for every t=N,…,0, \(J^{o}_{t} \in\mathcal{C}^{m}(X_{t}) \subset\mathcal{W}^{m}_{p}(\operatorname{int}(X_{t}))\) for every 1≤p≤+∞.
(ii) As \(X_{t}\) is bounded and convex, by Sobolev’s extension theorem [34, Theorem 5, p. 181, and Example 2, p. 189], for every 1≤p≤+∞, the function \(J^{o}_{t} \in\mathcal{W}^{m}_{p}(\operatorname{int}(X_{t}))\) can be extended on the whole \(\mathbb{R}^{d}\) to a function \(\bar {J}_{t}^{o,p} \in \mathcal{W}^{m}_{p}(\mathbb{R}^{d})\).
(iii) For 1<p<+∞, the statement follows by item (ii) and the equivalence between Sobolev spaces and Bessel potential spaces [34, Theorem 3, p. 135]. For p=1 and m≥2 even, it follows by item (ii) and the inclusion \(\mathcal{W}^{m}_{1}(\mathbb{R}^{d}) \subset\mathcal{B}^{m}_{1}(\mathbb{R}^{d})\) from [34, p. 160]. □
Proof of Proposition 3.2
(i) is proved in the same way as Proposition 3.1, by replacing \(J_{t+1}^{o}\) with \(\tilde{J}_{t+1}^{o}\) and \(g_{t}^{o}\) with \(\tilde{g}_{t}^{o}\).
(ii) Inspection of the proof of Proposition 3.1(i) shows that \(J_{t}^{o}\) is \(\alpha_{t}\)-concave (\(\alpha_{t}>0\)) for t=0,…,N−1, whereas the \(\alpha_{N}\)-concavity (\(\alpha_{N}>0\)) of \(J_{N}^{o}=h_{N}\) is assumed. By (12) and condition (10), \(\tilde{J}_{t+1,j}^{o}\) is concave for j sufficiently large. Hence, one can apply (i) to \(\tilde{J}_{t+1,j}^{o}\), and so there exists \(\hat{J}^{o,p}_{t,j} \in\mathcal{W}^{m}_{p}(\mathbb{R}^{d})\) such that \(T_{t} \tilde{J}_{t+1,j}^{o}=\hat{J}^{o,p}_{t,j}|_{X_{t}}\). Proceeding as in the proof of Proposition 3.1, one obtains equations analogous to (39) and (40) (with obvious replacements). Then, by differentiating \(T_{t} \tilde{J}_{t+1,j}^{o}\) up to the order m, we get
Finally, the statement follows by the continuity of the embedding of \(\mathcal{C}^{m}(X_{t})\) into \(\mathcal{W}^{m}_{p}(\operatorname{int} (X_{t}))\) (since \(X_{t}\) is compact) and the continuity of Sobolev’s extension operator. □
Proof of Proposition 4.1
(i) For \(\omega\in\mathbb{R}^{d}\), let \(M(\omega)=\max\{\|\omega\|,1\}\), ν be a positive integer, and define the set of functions
where \({\hat{f}}\) is the Fourier transform of f. For \(f\in\varGamma^{\nu}(\mathbb{R}^{d})\), let
and for θ>0, denote by
the closed ball of radius θ in \(\varGamma^{\nu}(\mathbb{R}^{d})\). By [55, Corollary 3.2], the compactness of the support of ψ, and the regularity of its boundary (which allows one to apply the Rellich–Kondrachov theorem [56, Theorem 6.3, p. 168]), for s=⌊d/2⌋+1 and \(\psi\in\mathcal{S}^{q+s}\), there exists \(C_{1}>0\) such that, for every \(f \in B_{\theta}(\|\cdot\|_{\varGamma^{q+s+1}})\) and every positive integer n, there is \(f_{n} \in\mathcal{R}(\psi,n)\) such that
The next step consists in proving that, for every positive integer ν and s=⌊d/2⌋+1, the space \(\mathcal{W}^{\nu +s}_{2}(\mathbb{R}^{d})\) is continuously embedded in \(\varGamma^{\nu}(\mathbb{R}^{d})\). Let \(f \in \mathcal{W}^{\nu+s}_{2}(\mathbb{R}^{d})\). Then
The first integral is finite by the Cauchy–Schwarz inequality and the finiteness of \(\int_{\|\omega\|\leq1} |{\hat{f}}({\omega})|^{2} \,d\omega \). To study the second integral, taking the hint from [37, p. 941], we factorize \(\|\omega\|^{\nu}|{\hat{f}}({\omega})| = a(\omega) b(\omega)\), where \(a(\omega):=(1+\|\omega\|^{2s})^{-1/2}\) and \(b(\omega) := \|\omega\|^{\nu}|{\hat{f}}({\omega})| (1+ \|\omega\|^{2s})^{1/2}\). By the Cauchy–Schwarz inequality,
The integral \(\int_{\mathbb{R}^{d}}a^{2}(\omega) \,d\omega= \int_{\mathbb{R}^{d}}(1+ \|\omega\|^{2s})^{-1} \,d\omega\) is finite for 2s>d, which is satisfied for all d≥1 as s=⌊d/2⌋+1. By Parseval’s identity [57, p. 172], since f has square-integrable νth and (ν+s)th partial derivatives, the integral \(\int_{\mathbb{R}^{d}}b^{2}(\omega) \,d\omega= \int_{\mathbb{R}^{d}} \| \omega\|^{2\nu} |{\hat{f}}({\omega})|^{2} (1+ \|\omega\|^{2s}) \,d\omega= \int_{\mathbb{R}^{d}} |{\hat{f}}({\omega})|^{2} (\|\omega\|^{2\nu} + \|\omega\|^{2(\nu+s)}) \,d\omega\) is finite. Hence, \(\int_{\mathbb{R} ^{d}}M(\omega)^{\nu}|{\hat{f}}({\omega})| \,d\omega\) is finite, so \(f\in\varGamma^{\nu}(\mathbb{R}^{d})\), and, by the argument above, there exists \(C_{2}>0\) such that \(B_{\rho}(\|\cdot\|_{\mathcal{W}^{\nu+s}_{2}}) \subset B_{C_{2} \rho}(\|\cdot\|_{\varGamma^{\nu}})\).
Taking ν=q+s+1 as required in (41) and \(C=C_{1}\cdot C_{2}\), we conclude that, for every \(f \in B_{\rho}(\|\cdot\|_{\mathcal{W}^{q + 2s+1}_{2}})\) and every positive integer n, there exists \(f_{n} \in\mathcal{R}(\psi,n)\) such that \(\max_{0\leq|\mathbf{r}|\leq q} \sup_{x \in X} \vert D^{\mathbf{r}} f(x) - D^{\mathbf{r}} f_{n}(x) \vert \leq C \frac{\rho}{\sqrt{n}}\).
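The embedding argument of part (i) can be summarised in formulas. The following is a sketch under stated assumptions: the norm on \(\varGamma^{\nu}(\mathbb{R}^{d})\) is taken to be the weighted \(L^{1}\) norm of the Fourier transform (which is what the finiteness statement at the end of the argument suggests), \(c_{d}\) denotes the surface area of the unit sphere in \(\mathbb{R}^{d}\), and the remaining constants are left implicit.

```latex
\begin{align*}
\|f\|_{\varGamma^{\nu}}
 &:= \int_{\mathbb{R}^{d}} M(\omega)^{\nu} |\hat{f}(\omega)| \,d\omega
  = \int_{\|\omega\|\leq 1} |\hat{f}(\omega)| \,d\omega
  + \int_{\|\omega\|> 1} \|\omega\|^{\nu} |\hat{f}(\omega)| \,d\omega, \\
\int_{\|\omega\|> 1} \|\omega\|^{\nu} |\hat{f}(\omega)| \,d\omega
 &\leq \int_{\mathbb{R}^{d}} a(\omega)\, b(\omega) \,d\omega
  \leq \Bigl( \int_{\mathbb{R}^{d}} a^{2}(\omega) \,d\omega \Bigr)^{1/2}
       \Bigl( \int_{\mathbb{R}^{d}} b^{2}(\omega) \,d\omega \Bigr)^{1/2}, \\
\int_{\mathbb{R}^{d}} \frac{d\omega}{1+\|\omega\|^{2s}}
 &= c_{d} \int_{0}^{\infty} \frac{r^{d-1}}{1+r^{2s}} \,dr < \infty
  \quad\Longleftrightarrow\quad 2s > d .
\end{align*}
```

The last line is the polar-coordinates computation behind the condition 2s>d: the integrand behaves like \(r^{d-1-2s}\) at infinity and like \(r^{d-1}\) near the origin.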
(ii) Follows by [40, Theorem 2.1] and the Rellich–Kondrachov theorem [56, Theorem 6.3, p. 168], which allows one to use “sup” in (20) instead of “\(\operatorname{ess\,sup}\)”.
(iii) Follows by [58, Corollary 5.2]. □
Proof of Proposition 4.2
(i) We detail the proof for t=N−1 and t=N−2; the other cases follow by backward induction.
Let us start with t=N−1 and \(\tilde{J}^{o}_{N}=J^{o}_{N}\). By Proposition 3.1(ii), there exists \(\bar{J}^{o,2}_{N-1} \in\mathcal {W}^{2+(2s+1)N}_{2}(\mathbb{R}^{d})\) such that \(T_{N-1} \tilde{J}^{o}_{N}=T_{N-1} J^{o}_{N}=J^{o}_{N-1}=\bar {J}^{o,2}_{N-1}|_{X_{N-1}}\).
By Proposition 4.1(i) with q=2+(2s+1)(N−1) applied to \(\bar{J}^{o,2}_{N-1}\), we obtain (22) for t=N−1. Set \(\tilde{J}^{o}_{N-1}=f_{N-1}\) in (22). By (22) and condition (10), there exists a positive integer \(\bar{n}_{N-1}\) such that \(\tilde{J}^{o}_{N-1}\) is concave for \(n_{N-1}\geq\bar{n}_{N-1}\).
Now consider t=N−2. By Proposition 3.2(i), it follows that there exists \(\hat{J}^{o,2}_{N-2} \in \mathcal{W}^{2+(2s+1)(N-1)}_{2}(\mathbb{R}^{d})\) such that \(T_{N-2} \tilde{J}^{o}_{N-1}=\hat{J}^{o,2}_{N-2}|_{X_{N-2}}\). By applying to \(\hat{J}^{o,2}_{N-2}\) Proposition 4.1(i) with q=2+(2s+1)(N−2), for every positive integer \(n_{N-2}\), we conclude that there exists \(f_{N-2} \in\mathcal{R}(\psi_{t},n_{N-2})\) such that
where, by Proposition 3.2(i), \(\hat {J}^{o,2}_{N-2} \in\mathcal{W}^{2 + (2s+1)(N-1)}_{2}(\mathbb{R}^{d})\) is a suitable extension of \(T_{N-2} \tilde{J}^{o}_{N-1}\) on \(\mathbb{R}^{d}\), and \(\bar {C}_{N-2}>0\) does not depend on the approximations generated in the previous iterations. The statement for t=N−2 follows by the fact that the dependence of the bound (42) on \(\| \hat{J}^{o,2}_{N-2} \|_{\mathcal{W}^{2 + (2s+1)(N-1)}_{2}(\mathbb{R}^{d})}\) can be removed by exploiting Proposition 3.2(ii); in particular, we can choose \(C_{N-2}>0\) independently of \(n_{N-1}\). So, we get (22) for t=N−2. Set \(\tilde{J}^{o}_{N-2}=f_{N-2}\) in (22). By (22) and condition (10), there exists a positive integer \(\bar {n}_{N-2}\) such that \(\tilde{J}^{o}_{N-2}\) is concave for \(n_{N-2}\geq \bar{n}_{N-2}\).
The proof proceeds similarly for the other values of t; each constant \(C_{t}\) can be chosen independently of \(n_{t+1},\dots,n_{N-1}\).
(ii) follows by Proposition 3.1(ii) (with p=+∞) and Proposition 4.1(ii).
(iii) follows by Proposition 3.1(iii) (with p=1) and Proposition 4.1(iii). □
Proof of Proposition 5.1
We first derive some constraints on the form of the sets \(A_{t,j}\) and then show that the budget constraints (25) are satisfied if and only if the sets \(A_{t,j}\) are chosen as in Assumption 5.1 (or are suitable subsets).
As the labor incomes \(y_{t,j}\) and the interest rates \(r_{t,j}\) are known, for t=1,…,N, we have
(the upper bound is achieved when all the consumptions \(c_{t,j}\) are equal to 0), so the corresponding feasible sets \(A_{t,j}\) are bounded from above by \(a_{t,j}^{\max}\). The boundedness from below of each \(A_{t,j}\) follows from the budget constraints (25), which for \(c_{k,j}=0\) (k=t,…,N) are equivalent for t=N to
and for t=0,…,N−1 to \(a_{t,j} \prod_{k=t}^{N-1} (1+r_{k,j}) + \sum_{i=t}^{N-1} y_{i,j} \prod_{k=i}^{N-1} (1+r_{k,j}) + y_{N,j} \geq0 \), i.e.,
So, in order to satisfy the budget constraints (25), the constraints (43) and (44) have to be satisfied. Then the maximal sets \(A_{t}\) that satisfy the budget constraints (25) have the form described in Assumption 5.1. □
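The bounds on the asset sets can be computed by two short recursions. The sketch below is illustrative and rests on assumptions that are not spelled out in this proof: the asset dynamics are taken to be \(a_{t+1,j}=(1+r_{t,j})(a_{t,j}+y_{t,j}-c_{t,j})\) with \(c_{t,j}\geq 0\) and \(1+r_{t,j}>0\), and the terminal budget constraint is read as \(a_{N,j}+y_{N,j}\geq 0\). The function name `asset_bounds` and its arguments are hypothetical.

```python
def asset_bounds(a0, y, r):
    """Return (lower, upper) bounds on a_t, t = 0..N, for one consumer j.

    Assumed (hypothetical) dynamics: a_{t+1} = (1 + r_t) * (a_t + y_t - c_t),
    with c_t >= 0 and 1 + r_t > 0.  y has N + 1 entries, r has N entries.
    """
    n = len(r)
    if len(y) != n + 1:
        raise ValueError("y must have exactly one more entry than r")

    # Upper bounds: zero consumption at every earlier stage.
    upper = [a0]
    for t in range(n):
        upper.append((1.0 + r[t]) * (upper[t] + y[t]))

    # Lower bounds: zero consumption from stage t on must still leave
    # a_N + y_N >= 0 at the final stage.
    lower = [0.0] * (n + 1)
    lower[n] = -y[n]
    for t in range(n - 1, -1, -1):
        lower[t] = lower[t + 1] / (1.0 + r[t]) - y[t]
    return lower, upper


if __name__ == "__main__":
    lower, upper = asset_bounds(1.0, [1.0, 1.0, 1.0], [0.0, 1.0])
    print(lower)
    print(upper)
```

Dividing the inequality for stage t stated above by \(\prod_{k=t}^{N-1}(1+r_{k,j})\) gives the same lower bound in closed form; the backward recursion simply avoids recomputing the products.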
Proof of Proposition 5.2
(a) About Assumption 3.1(i). By construction, the sets \(\bar{A}_{t}\) are compact, convex, and have nonempty interiors, since they are Cartesian products of nonempty closed intervals. The same holds for the \(\bar{D}_{t}\), since by (31) they are the intersections between \(\bar{A}_{t} \times\bar{A}_{t+1}\) and the sets \(D_{t}\), which are compact, convex, and have nonempty interiors too.
(b) About Assumption 3.1(ii). This is Assumption 5.2(i), with the obvious replacements of \(X_{t}\) and \(D_{t}\).
(c) About Assumption 3.1(iii). Recall that for Problem \(\mathrm {OC}_{N}^{d}\) and t=0,…,N−1, we have
Then, \(h_{t} \in\mathcal{C}^{m}(\bar{D}_{t})\) by Assumption 5.2(ii) and (iii). As u(⋅) and \(v_{t,j}(\cdot)\) are twice continuously differentiable, the second part of Assumption 3.1(iii) means that there exists some \(\alpha_{t}>0\) such that the function
has negative semi-definite Hessian with respect to the variables \(a_{t}\) and \(a_{t+1}\). Assumption 5.2(ii) and easy computations show that the function \(u (\frac{(1+r_{t}) \circ (a_{t}+y_{t})-a_{t+1}}{1+r_{t}} )\) has negative semi-definite Hessian. By Assumption 5.2(iii), for each j=1,…,d and \(\alpha_{t,j} \in(0,\beta_{t,j}]\), \(v_{t,j}(a_{t,j})+ \frac{1}{2}\alpha_{t,j} a_{t,j}^{2}\) has negative semi-definite Hessian too. So, Assumption 3.1(iii) is satisfied for every \(\alpha_{t} \in(0,\min_{j=1,\dots,d} \{\beta_{t,j}\}]\).
(d) About Assumption 3.1(iv). Recall that for Problem \(\mathrm {OC}_{N}^{d}\), we have \(h_{N}(a_{N})=u(a_{N}+y_{N})\). Then, \(h_{N} \in\mathcal{C}^{m}(\bar{A}_{N})\) and is concave by Assumption 5.2(ii). □
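The computation in item (c) can be made explicit componentwise. The sketch below relies on two assumptions that are not restated in this section: that \(h_{t}\) is the sum of the utility term and the terms \(v_{t,j}\), and that Assumption 5.2(iii) amounts to the bound \(v_{t,j}''(a_{t,j}) \leq -\beta_{t,j}\) on the relevant interval.

```latex
\begin{align*}
h_{t}(a_{t},a_{t+1}) + \tfrac{1}{2}\alpha_{t} \|a_{t}\|^{2}
 &= u\!\left( \frac{(1+r_{t}) \circ (a_{t}+y_{t}) - a_{t+1}}{1+r_{t}} \right)
  + \sum_{j=1}^{d} \Bigl[ v_{t,j}(a_{t,j}) + \tfrac{1}{2}\alpha_{t}\, a_{t,j}^{2} \Bigr], \\
\frac{d^{2}}{d a_{t,j}^{2}} \Bigl[ v_{t,j}(a_{t,j}) + \tfrac{1}{2}\alpha_{t}\, a_{t,j}^{2} \Bigr]
 &= v_{t,j}''(a_{t,j}) + \alpha_{t}
  \leq -\beta_{t,j} + \alpha_{t} \leq 0
  \quad \text{whenever } 0 < \alpha_{t} \leq \min_{j=1,\dots,d} \beta_{t,j} .
\end{align*}
```

Under these assumptions the bracketed terms contribute a diagonal negative-semidefinite block, and adding it to the negative-semidefinite Hessian of the utility term preserves negative semidefiniteness.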
