Abstract
We present and discuss the theory of minimax I- and D-robust designs on a finite design space, and detail three methods for their construction that are new in this context: (i) a numerical search for the optimal parameters in a provably minimax robust parametric class of designs, (ii) a first-order iterative algorithm similar to that of Wynn (Ann Math Stat 41:1655–1664, 1970), and (iii) response-adaptive designs. These designs minimize a loss function, based on the mean squared error of the predicted responses or of the parameter estimates, when the regression response is possibly misspecified. The loss being minimized has first been maximized over a neighbourhood of the approximate, and possibly inadequate, response being fitted by the experimenter. The methods presented are all vastly more economical, in terms of the computing time required, than previously available algorithms.
References
Agboto, V., Li, W., Nachtsheim, C.: A comparison of three approaches for constructing robust experimental designs. J. Stat. Theory Pract. 5, 1–11 (2011)
Ahipaşaoğlu, S.D.: A first-order algorithm for the A-optimal experimental design problem: a mathematical programming approach. Stat. Comput. 25, 1113–1127 (2015)
Bates, D.M., Watts, D.G.: Nonlinear Regression Analysis and Its Applications. Wiley, New York (1988)
Chaudhuri, P., Mykland, P.A.: Nonlinear experiments: optimal design and inference based on likelihood. J. Am. Stat. Assoc. 88, 538–546 (1993)
Cruz, J.A.R., Pereira, A.G.C.: The elitist non-homogeneous genetic algorithm: almost sure convergence. Stat. Probab. Lett. 83, 2179–2185 (2013)
Dette, H., Melas, V.B.: A note on all-bias designs with applications in spline regression models. J. Stat. Plan. Inference 140, 2037–2045 (2010)
Esteban-Bravo, M., Leszkiewicz, A., Vidal-Sanz, J.M.: Exact optimal experimental designs with constraints. Stat. Comput. (in press). doi:10.1007/s11222-016-9658-x (2016)
Fang, Z., Wiens, D.P.: Integer-valued, minimax robust designs for estimation and extrapolation in heteroscedastic, approximately linear models. J. Am. Stat. Assoc. 95, 807–818 (2000)
Gotwalt, C.M., Jones, B.A., Steinberg, D.M.: Fast computation of designs robust to parameter uncertainty for nonlinear settings. Technometrics 51, 88–95 (2009)
Huber, P.J.: Robustness and designs. In: Srivastava, J.N. (ed.) A Survey of Statistical Design and Linear Models, pp. 287–303. North Holland, Amsterdam (1975)
Imhof, L.: Exact designs minimising the integrated variance in quadratic regression. Statistics 34, 103–115 (2000)
Karami, J.H., Wiens, D.P.: Robust static designs for approximately specified nonlinear regression models. J. Stat. Plan. Inference 144, 55–62 (2014)
Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of IEEE International Conference on Neural Networks, pp. 1942–1948 (1995)
Kiefer, J., Wolfowitz, J.: The equivalence of two extremum problems. Can. J. Math. 12, 363–366 (1960)
Linssen, H.N.: Nonlinear measures: a case study. Stat. Neerl. 29, 93–99 (1975)
Magnus, J.R.: On differentiating eigenvalues and eigenvectors. Econom. Theory 1, 179–191 (1985)
Pukelsheim, F.: Optimal Design of Experiments. Wiley, New York (1993)
Pukelsheim, F., Rieder, S.: Efficient rounding of approximate designs. Biometrika 79, 763–770 (1992)
Pukelsheim, F., Torsney, B.: Optimal weights for experimental designs on linearly independent support points. Ann. Stat. 19, 1614–1625 (1991)
Shi, P., Ye, J., Zhou, J.: Minimax robust designs for misspecified regression models. Can. J. Stat. 31, 397–414 (2003)
Silvey, S.D., Titterington, D.M., Torsney, B.: An algorithm for optimal designs on a finite design space. Commun. Stat. Theory Methods 7, 1379–1389 (1978)
Smucker, B.J., del Castillo, E., Rosenberger, J.: Exchange algorithms for constructing model-robust experimental designs. J. Qual. Technol. 43, 1–15 (2011)
Titterington, D.M.: Algorithms for computing D-optimal designs on a finite design space. In: Proceedings of the Conference on Information Sciences and Systems, Department of Electrical Engineering, Johns Hopkins University, Baltimore, pp. 213–216 (1976)
Torsney, B.: A moment inequality and monotonicity of an algorithm. In: Kortanek, K.O., Fiacco, A.V. (eds.) Proceedings of the International Symposium on Semi-infinite Programming and Applications, Lecture Notes in Economics and Mathematical Systems, vol. 215, pp. 249–260. Springer, Berlin (1983)
Welsh, A.H., Wiens, D.P.: Robust model-based sampling designs. Stat. Comput. 23, 689–701 (2013)
Wiens, D.P.: Robust, minimax designs for multiple linear regression. Linear Algebra Appl. 127 (Second Special Issue on Linear Algebra and Statistics), 327–340 (1990)
Wiens, D.P.: Minimax designs for approximately linear regression. J. Stat. Plan. Inference 31, 353–371 (1992)
Wiens, D.P.: Robustness of design. In: Handbook of Design and Analysis of Experiments, Chap. 20. Chapman & Hall/CRC, New York (2015)
Wynn, H.P.: The sequential generation of \(D\)-optimum experimental designs. Ann. Math. Stat. 41, 1655–1664 (1970)
Ye, J.J., Zhou, J.: Existence and symmetry of minimax regression designs. J. Stat. Plan. Inference 137, 344–354 (2007)
Yu, Y.: Strict monotonicity and convergence rate of Titterington’s algorithm for computing D-optimal designs. Comput. Stat. Data Anal. 54, 1419–1425 (2010a)
Yu, Y.: Monotonic convergence of a general algorithm for computing optimal designs. Ann. Stat. 38, 1593–1606 (2010b)
Acknowledgements
This work was carried out with the support of the Natural Sciences and Engineering Research Council of Canada. It has benefited greatly from the insightful comments of two anonymous referees.
Appendix: Derivations
Proof of Theorem 1
First note that by (3), \(\varvec{\psi }\) is orthogonal to the columns of \({\mathbf {Q}}\) and hence lies in the column space of the \(N\times \left( N-p\right) \) matrix \({\mathbf {Q}}_{\perp }\), whose orthonormal columns span the orthogonal complement \(col\left( {\mathbf {Q}}\right) ^{\perp }\) and hence are orthogonal to those of \({\mathbf {Q}}\). Thus, using (4), \(\varvec{\psi }=\left( \tau /\sqrt{n}\right) {\mathbf {Q}}_{\perp }{\mathbf {c}}\) for some \({\mathbf {c}}\) with \(\left\| {\mathbf {c}}\right\| \le 1\). We repeatedly use the identity \({\mathbf {Q}}_{\perp }{\mathbf {Q}}_{\perp }^{\prime }={\mathbf {I}}_{N}-{\mathbf {Q}}{\mathbf {Q}}^{\prime }\), from which it follows that
In this notation, and writing the QR-decomposition as \({\mathbf {F}}={\mathbf {Q}}{\mathbf {V}}\) for a \(p\times p\) upper triangular \({\mathbf {V}}\), we have \({\mathbf {M}}={\mathbf {V}}^{\prime }{\mathbf {R}}\left( {\varvec{\xi }}\right) {\mathbf {V}}\). Our assumption that \({\mathbf {M}}\) be non-singular implies that \({\mathbf {R}}\) and \({\mathbf {V}}\) are invertible. Also, \({\mathbf {b}}=\left( \tau /\sqrt{n}\right) {\mathbf {V}}^{\prime }{\mathbf {Q}}^{\prime }{\mathbf {D}}\left( {\varvec{\xi }}\right) {\mathbf {Q}}_{\perp }{\mathbf {c}}\) and so
For the I-criterion, \(\max _{\varvec{\psi }}{\mathcal {I}}\left( \varvec{\psi },{\varvec{\xi }}\right) \) is \(\left( \sigma _{\varepsilon }^{2}+\tau ^{2}\right) /n\) times
upon using (26). Here, we make use of the fact that the nonzero eigenvalues of a product \({\mathbf {A}}{\mathbf {A}}^{\prime }\) coincide with those of \({\mathbf {A}}^{\prime }{\mathbf {A}}\). For the D-criterion, we have that \(\max _{\varvec{\psi }}{\mathcal {D}}\left( \varvec{\psi },{\varvec{\xi }}\right) \) is the p-th root of
now using (26) once again completes the proof. \(\square \)
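The linear-algebra facts used in this proof are easy to confirm numerically. The following sketch is illustrative only and not from the paper (all variable names and dimensions are arbitrary); it checks the identity \({\mathbf {Q}}_{\perp }{\mathbf {Q}}_{\perp }^{\prime }={\mathbf {I}}_{N}-{\mathbf {Q}}{\mathbf {Q}}^{\prime }\), the factorization \({\mathbf {M}}={\mathbf {V}}^{\prime }{\mathbf {R}}\left( {\varvec{\xi }}\right) {\mathbf {V}}\), and the equality of the nonzero eigenvalues of \({\mathbf {A}}{\mathbf {A}}^{\prime }\) and \({\mathbf {A}}^{\prime }{\mathbf {A}}\):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 12, 3

F = rng.standard_normal((N, p))               # N x p regressor matrix (full rank a.s.)
Qfull, _ = np.linalg.qr(F, mode="complete")
Q, Q_perp = Qfull[:, :p], Qfull[:, p:]        # orthonormal bases of col(F) and its complement

# Identity used repeatedly in the proof: Q_perp Q_perp' = I_N - Q Q'
assert np.allclose(Q_perp @ Q_perp.T, np.eye(N) - Q @ Q.T)

# With the reduced QR-decomposition F = Q V (V upper triangular) and
# R(xi) = Q' D(xi) Q, the information matrix satisfies M = F' D(xi) F = V' R(xi) V.
Q2, V = np.linalg.qr(F)
xi = rng.dirichlet(np.ones(N))                # a design: nonnegative weights summing to 1
D = np.diag(xi)
R = Q2.T @ D @ Q2
assert np.allclose(F.T @ D @ F, V.T @ R @ V)

# Fact quoted for the D-criterion: nonzero eigenvalues of A A' equal those of A' A.
A = rng.standard_normal((N, p))
assert np.allclose(np.sort(np.linalg.eigvalsh(A @ A.T))[-p:],
                   np.sort(np.linalg.eigvalsh(A.T @ A)))
print("All identities verified.")
```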
Proof of Theorem 2
A necessary condition for a minimum at \({\varvec{\xi }}_{0}\) is that, for any design \( {\varvec{\xi }}_{1}\) and with \({\varvec{\xi }}_{t}=\left( 1-t\right) {\varvec{\xi }}_{0}+t{\varvec{\xi }}_{1}\),
We calculate that
whence
By Theorem 1 of Magnus (1985), \(\lambda _{I}\left( {\varvec{\xi }} _{t}\right) \) is differentiable, with
Now, let \(\left\{ {\mathbf {q}}_{i}^{\prime }\right\} _{i=1}^{N}\) be the rows of \({\mathbf {Q}}\), and substitute \({\mathbf {R}}\left( {\varvec{\xi }}_{1}\right) =\sum _{i=1}^{N}\xi _{1,i}{\mathbf {q}}_{i}{\mathbf {q}}_{i}^{\prime }\) and \( {\mathbf {Q}}^{\prime }{\mathbf {D}}\left( {\varvec{\xi }}_{1}\right) {\mathbf {D}} \left( {\varvec{\xi }}_{0}\right) {\mathbf {Q}}=\sum _{i=1}^{N}\xi _{1,i}\xi _{0,i}{\mathbf {q}}_{i}{\mathbf {q}}_{i}^{\prime }\) into (27), (28) and (29). One obtains
Thus (27) is equivalent to
Condition (31) implies that \({\mathbf {T}}_{ii}^{{\mathcal {I}}}\left( {\varvec{\xi }}_{0}\right) /\left[ \left( 1-\nu \right) tr{\mathbf {R}}^{-1}\left( {\varvec{\xi }}_{0}\right) \right] -1\le 0\) for all i; then, since
it follows that equality must hold at the points at which \(\xi _{0,i}>0\). \(\square \)
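The eigenvalue derivative formula of Magnus (1985, Theorem 1) invoked above states that if \({\mathbf {A}}\left( t\right) \) is symmetric with a simple maximum eigenvalue \(\lambda \left( t\right) \) and corresponding unit eigenvector \({\mathbf {z}}\left( t\right) \), then \(d\lambda /dt={\mathbf {z}}^{\prime }\left( t\right) \left[ d{\mathbf {A}}/dt\right] {\mathbf {z}}\left( t\right) \). A quick numerical check (illustrative, not from the paper) along a convex-combination path of the kind used in the proof:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4
A0 = rng.standard_normal((p, p)); A0 = A0 @ A0.T   # symmetric
A1 = rng.standard_normal((p, p)); A1 = A1 @ A1.T

def lam_max(t):
    # maximum eigenvalue along the path A(t) = (1 - t) A0 + t A1
    return np.linalg.eigvalsh((1 - t) * A0 + t * A1)[-1]

# Magnus' formula at t = 0: the derivative is z' (A1 - A0) z, with z the
# unit eigenvector of the (simple) maximum eigenvalue of A0
w, vecs = np.linalg.eigh(A0)
z = vecs[:, -1]
magnus = z @ (A1 - A0) @ z

h = 1e-6                                           # central finite difference for comparison
fd = (lam_max(h) - lam_max(-h)) / (2 * h)
print(f"Magnus: {magnus:.8f}  finite difference: {fd:.8f}")
```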
Proof of Theorem 3
The method is the same as for the proof of Theorem 2 and so we mention only the differences. We phrase the necessary condition as
here we have used
From the definitions of \({\mathbf {v}}\left( {\varvec{\xi }}\right) \) and \( {\mathbf {w}}\left( {\varvec{\xi }}\right) \), \({\mathbf {v}}^{\prime }\left( {\varvec{\xi }}_{t}\right) {\mathbf {w}}\left( {\varvec{\xi }}_{t}\right) =1 \) and
Another appeal to Magnus (1985) gives
Then, after a calculation, (32) becomes (9), with equality if \(\xi _{0,i}>0\) since
\(\square \)
Proof of Theorem 4
Note that for a fixed positive definite matrix \({{\mathbf {R}}}\), the class \(\varXi _{{{\mathbf {R}}}}\) of designs \({\varvec{\xi }}\) for which \({{\mathbf {R}}\left( {\varvec{\xi }}\right) ={\mathbf {R}}}\) is convex, and for \( {\varvec{\xi }}\in \varXi _{{{\mathbf {R}}}}\) both \({\mathcal {I}}_{\nu }\left( {\varvec{\xi }}\right) \) and \({\mathcal {D}}_{\nu }\left( \varvec{ \xi }\right) \) are convex. In the case of \({\mathcal {I}}_{\nu }\left( {\varvec{\xi }}\right) \) this follows from the fact that for fixed \({{\mathbf {R}}}\), \({\mathcal {I}}_{\nu }\left( {\varvec{\xi }}\right) \) is a linear function of
now note that, with \({\varvec{\xi }}_{t}=\left( 1-t\right) {\varvec{\xi }}_{0}+t{\varvec{\xi }}_{1}\) (\(t\in [0,1]\)), and with \(\preccurlyeq \) denoting the Loewner ordering by positive semi-definiteness,
Thus, \({\mathbf {U}}\left( {\varvec{\xi }}_{t}\right) \preccurlyeq \left( 1-t\right) {\mathbf {U}}\left( {\varvec{\xi }}_{0}\right) +t{\mathbf {U}}\left( {\varvec{\xi }}_{1}\right) \) and (since the maximum eigenvalue of a sum is less than or equal to the sum of the maximum eigenvalues),
The verification of the convexity of \({\mathcal {D}}_{\nu }\left( \varvec{ \xi }\right) \) for \({\varvec{\xi }}\in \varXi _{{{\mathbf {R}}}}\) is similar.
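The two properties of \(ch_{\max }\) driving this convexity argument, namely its monotonicity with respect to the Loewner ordering and its convexity along the segment \({\varvec{\xi }}_{t}\), are easily checked numerically. A small sketch (illustrative, not from the paper) using random positive semi-definite matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 5
U0 = rng.standard_normal((p, p)); U0 = U0 @ U0.T   # positive semi-definite
U1 = rng.standard_normal((p, p)); U1 = U1 @ U1.T
lam = lambda M: np.linalg.eigvalsh(M)[-1]          # ch_max of a symmetric matrix

# Monotonicity: A below B in the Loewner ordering implies ch_max(A) <= ch_max(B);
# here B = U0 + C with C positive semi-definite.
C = rng.standard_normal((p, p)); C = C @ C.T
assert lam(U0) <= lam(U0 + C) + 1e-12

# Convexity along the segment (1 - t) U0 + t U1
for t in np.linspace(0.0, 1.0, 11):
    assert lam((1 - t) * U0 + t * U1) <= (1 - t) * lam(U0) + t * lam(U1) + 1e-10
print("Monotonicity and convexity of ch_max verified on this example.")
```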
The losses \({\mathcal {I}}_{\nu }\left( {\varvec{\xi }}\right) \) and \({\mathcal {D}}_{\nu }\left( {\varvec{\xi }}\right) \) are minimized by minimizing \({\mathcal {I}}_{\nu }\left( {\varvec{\xi }}|{\mathbf {R}}\right) =ch_{\max }\left[ {\mathbf {U}}\left( {\varvec{\xi }}\right) \right] \) and \({\mathcal {D}}_{\nu }\left( {\varvec{\xi }}|{\mathbf {R}}\right) =ch_{\max }\left[ {\mathbf {R}}^{1/2}{\mathbf {U}}\left( {\varvec{\xi }}\right) {\mathbf {R}}^{1/2}\right] \), respectively. We first consider the minimization of \({\mathcal {I}}_{\nu }\left( {\varvec{\xi }}|{\mathbf {R}}\right) \). In order that \({\varvec{\xi }}_{0}\) minimize \({\mathcal {I}}_{\nu }\left( {\varvec{\xi }}|{\mathbf {R}}\right) \) in \(\varXi _{{\mathbf {R}}}\), it is sufficient that the function
be minimized unconditionally at \(t=0\) and satisfy the side conditions. We consider only \({\varvec{\xi }}_{1}\) with non-negative elements; that these sum to one is enforced by the Lagrangian. Here, \(\alpha \) and the symmetric matrix \({\varvec{\varLambda }}\) are Lagrange multipliers. Since the maximum eigenvalue is simple, we have by Magnus (1985) that \({\mathcal {I}}_{\nu }\left( {\varvec{\xi }}_{t}|{{\mathbf {R}}}\right) \), hence \(\phi \left( t\right) \), is differentiable as well as convex, so that minimization of \(\phi \left( t\right) \) at \(t=0\) is equivalent to the condition \(\phi ^{\prime }\left( 0\right) \ge 0\), for all \({\varvec{\xi }}_{1}.\)
Denote by \({\mathbf {z}}_{I}\left( {\varvec{\xi }}_{0}|{{\mathbf {R}}} \right) \) the unit eigenvector belonging to the maximum eigenvalue of \( {\mathbf {U}}\left( {\varvec{\xi }}_{0}|{{\mathbf {R}}}\right) ={\mathbf {R}} ^{-1}{\mathbf {S}}\left( {\varvec{\xi }}_{0}\right) {\mathbf {R}}^{-1}\), so that \(\left\| {\mathbf {z}}_{I}\left( {\varvec{\xi }}_{0}|{{\mathbf {R}}} \right) \right\| =1\) and \({\mathcal {I}}_{\nu }\left( {\varvec{\xi }}_{0}| {{\mathbf {R}}}\right) ={\mathbf {z}}_{I}^{\prime }({\varvec{\xi }}_{0}| {{\mathbf {R}}})\left[ {\mathbf {R}}^{-1}{\mathbf {S}}\left( {\varvec{\xi }} _{0}\right) {\mathbf {R}}^{-1}\right] {\mathbf {z}}_{I}({\varvec{\xi }}_{0}| {{\mathbf {R}}})\). By Theorem 1 of Magnus (1985),
whence
In order that \(\phi ^{\prime }\left( 0\right) \) be non-negative for all \({\varvec{\xi }}_{1}\) it is necessary and sufficient that
where \(\varvec{\beta }={\mathbf {R}}^{-1}{\mathbf {z}}_{I}({\varvec{\xi }}_{0}|{\mathbf {R}})\).
We verify the sufficiency of (34); the proof of necessity runs along the same lines. If (34) holds, then the set \(\left\{ i:\xi _{0,i}>0\right\} \) coincides with the set \(\left\{ i:{\mathbf {q}}_{i}^{\prime }{\varvec{\varLambda }}{\mathbf {q}}_{i}+\alpha >0\right\} \) and so (33) is
for non-negative \(\left\{ \xi _{1,i}\right\} \).
The multipliers \(\alpha \) and \({\varvec{\varLambda }}\) are to satisfy \( {\mathbf {1}}_{N}^{\prime }{\varvec{\xi }}_{0}=1\) and
since (34) is overparameterized we can as well take \(\left\| \varvec{\beta }\right\| =1\). Then, \({\mathbf {R}}\) is varied so as to minimize \({\mathcal {I}}_{\nu }\left( {\varvec{\xi }}\right) =\left( 1-\nu \right) tr{\mathbf {R}}^{-1}\left( {\varvec{\xi }}\right) +\nu {\mathcal {I}}_{\nu }\left( {\varvec{\xi }}|{\mathbf {R}}\right) \). It is simpler, however, to define \({\varvec{\xi }}_{0}\) parametrically by (34) and to then choose the parameters to minimize (7a), subject to \({\mathbf {1}}_{N}^{\prime }{\varvec{\xi }}_{0}=1\) and \(\left\| \varvec{\beta }\right\| =1\), and with \({\mathbf {R}}\) defined by (35).
In this formulation it is clear that the solution for the D-criterion has the same form. \(\square \)
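This parametric characterization underlies method (i) of the paper: define the design by a handful of parameters and search numerically over them, rather than over all N design weights. Below is a minimal sketch of such a search. It is illustrative only: we assume weights proportional to \(\left( {\mathbf {q}}_{i}^{\prime }{\varvec{\varLambda }}{\mathbf {q}}_{i}+\alpha \right) _{+}\), as the support characterization above suggests, and we assume \({\mathbf {S}}\left( {\varvec{\xi }}\right) ={\mathbf {Q}}^{\prime }{\mathbf {D}}\left( {\mathbf {I}}_{N}-{\mathbf {Q}}{\mathbf {Q}}^{\prime }\right) {\mathbf {D}}{\mathbf {Q}}\) in an I-loss of the form \(\left( 1-\nu \right) tr{\mathbf {R}}^{-1}+\nu ch_{\max }\left[ {\mathbf {R}}^{-1}{\mathbf {S}}{\mathbf {R}}^{-1}\right] \); the exact forms are (34) and (7a) of the main text.

```python
# Illustrative sketch of method (i): numerical search over a parametric
# class of designs.  ASSUMED forms (see lead-in): weights proportional to
# (q_i' Lambda q_i + alpha)_+ and S(xi) = Q' D (I - Q Q') D Q.
import numpy as np
from scipy.optimize import minimize

N, p, nu = 21, 3, 0.5
x = np.linspace(-1.0, 1.0, N)                 # finite design space
F = np.column_stack([np.ones(N), x, x**2])    # fitted (approximate) response
Q, _ = np.linalg.qr(F)                        # orthonormal basis of col(F)
iu = np.triu_indices(p)                       # Lambda has p(p+1)/2 free entries

def design(theta):
    L = np.zeros((p, p))
    L[iu] = theta[:-1]
    L = L + np.triu(L, 1).T                   # symmetric Lambda
    w = np.maximum(np.einsum("ij,jk,ik->i", Q, L, Q) + theta[-1], 0.0)
    return w / w.sum() if w.sum() > 0 else np.full(N, 1.0 / N)

def I_loss(theta):
    xi = design(theta)
    D = np.diag(xi)
    R = Q.T @ D @ Q                           # R(xi)
    if np.linalg.cond(R) > 1e10:              # reject near-singular designs
        return np.inf
    Rinv = np.linalg.inv(R)
    S = Q.T @ D @ (np.eye(N) - Q @ Q.T) @ D @ Q
    U = Rinv @ S @ Rinv
    return (1 - nu) * np.trace(Rinv) + nu * np.linalg.eigvalsh((U + U.T) / 2)[-1]

theta0 = np.concatenate([np.eye(p)[iu], [0.1]])   # start at Lambda = I, alpha = 0.1
res = minimize(I_loss, theta0, method="Nelder-Mead", options={"maxiter": 5000})
print("minimized loss:", res.fun)
print("design weights:", np.round(design(res.x), 3))
```

The point of the parametrization is dimension reduction: the search is over \(p\left( p+1\right) /2+1\) parameters here rather than the N weights themselves.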
We give only the proof of Theorem 5, omitting the very similar proof of Theorem 6.
Proof of Theorem 5
Once (16) is established, the proof of Theorem 5 is completed by noting that, using (12) and then the definition of \( i^{*}\),
This implies (17). The inequality in (36) is strict, i.e. the maximum strictly exceeds the weighted average, unless \(\varvec{ \xi }_{n}\) places all mass on points satisfying (18).
The \(O\left( n^{-1}\right) \) term in (16) may be obtained from (30), by setting \(t=1/\left( n+1\right) \), \({\varvec{\xi }}_{0}={\varvec{\xi }}_{n}\) and \({\varvec{\xi }}_{1}\) the one-point design \(\varDelta _{i}\) with all mass at \({\mathbf {x}}_{i}\) in the expansion
But a discussion of the order of the remainder requires us to take a longer, more detailed approach. We first require updated values of \({\mathbf {R}}_{n}= {\mathbf {R}}\left( {\varvec{\xi }}_{n}\right) \), \({\mathbf {U}}_{n}={\mathbf {U}} \left( {\varvec{\xi }}_{n}\right) \), and the maximum eigenvalue and corresponding eigenvector \(\lambda _{n}=\lambda \left( {\varvec{\xi }} _{n}\right) \) and \({\mathbf {z}}_{n}={\mathbf {z}}\left( {\varvec{\xi }} _{n}\right) \) of \({\mathbf {U}}_{n}\). From
we obtain
Then
with
for
Substituting (37) into \({\mathbf {U}}_{n+1}^{(i)}=\left[ {\mathbf {R}} _{n+1}^{(i)}\right] ^{-1}{\mathbf {S}}_{n+1}^{(i)}\left[ {\mathbf {R}}_{n+1}^{(i)} \right] ^{-1}\) yields, after a calculation,
for
Substituting these expressions into \({\mathcal {I}}_{\nu }\left( \varvec{ \xi }_{n+1}^{(i)}\right) =\left( 1-\nu \right) tr\left[ {\mathbf {R}} _{n+1}^{(i)}\right] ^{-1}+\nu ch_{\max }\left[ {\mathbf {U}}_{n+1}^{(i)}\right] \) gives
where we define \(\rho _{i}\left( t\right) =ch_{\max }{\mathbf {A}}_{i}\left( t\right) \) for \({\mathbf {A}}_{i}\left( t\right) ={\mathbf {U}}_{n}-t{\mathbf {V}}_{n,i}-t^{2}{\mathbf {W}}_{n,i}\). Now let \({\mathbf {z}}_{(i)}\left( t\right) \) denote the eigenvector of unit norm corresponding to \(\rho _{i}\left( t\right) \), set \({\mathbf {Y}}_{i}\left( t\right) =\rho _{i}\left( t\right) {\mathbf {I}}_{p}-{\mathbf {A}}_{i}\left( t\right) \) and denote by \({\mathbf {Y}}_{i}^{+}\) the Moore–Penrose inverse. Then, for some \(t_{n}\in \left[ 0,1\right] \), \(\rho _{i}\left( \frac{1}{n}\right) =\rho _{i}\left( 0\right) +\rho _{i}^{\prime }\left( 0\right) /n+\frac{1}{2}\rho _{i}^{\prime \prime }\left( \frac{t_{n}}{n}\right) /n^{2}\). Using Theorems 1 and 4 of Magnus (1985) to evaluate the derivatives, this becomes
where
Substituting (40) and (39) into (38) gives (16), with
Under the assumption on the eigenvalues of \({\mathbf {R}}_{n}\), the elements of \({\mathbf {R}}_{n}^{-1}\), \({\mathbf {V}}_{n,i}\), and \({\mathbf {W}}_{n,i}\) remain bounded, and hence so does \(\beta _{n,i}^{I}\). \(\square \)
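For intuition, here is a minimal sketch of the sequential first-order iteration analysed in Theorems 5–7, under the same illustrative (assumed) loss as in the previous sketch. The update \({\varvec{\xi }}_{n+1}=\left( n{\varvec{\xi }}_{n}+\varDelta _{i}\right) /\left( n+1\right) \) corresponds to setting \(t=1/\left( n+1\right) \) as above; for simplicity the candidate point is chosen by minimizing the updated loss directly, rather than via the first-order expansion (16):

```python
# Illustrative sketch of the Wynn-type sequential iteration of Theorems 5-7,
# using the same ASSUMED I-loss as the previous sketch.
import numpy as np

N, p, nu = 21, 3, 0.5
x = np.linspace(-1.0, 1.0, N)
F = np.column_stack([np.ones(N), x, x**2])
Q, _ = np.linalg.qr(F)

def I_loss(xi):
    D = np.diag(xi)
    R = Q.T @ D @ Q
    Rinv = np.linalg.inv(R)
    S = Q.T @ D @ (np.eye(N) - Q @ Q.T) @ D @ Q
    U = Rinv @ S @ Rinv
    return (1 - nu) * np.trace(Rinv) + nu * np.linalg.eigvalsh((U + U.T) / 2)[-1]

xi = np.full(N, 1.0 / N)                      # initial design: uniform on the grid
for n in range(N, 500):
    # one-point designs Delta_i; take the point whose addition helps most
    updates = [(n * xi + np.eye(N)[i]) / (n + 1) for i in range(N)]
    xi = min(updates, key=I_loss)
print("final loss:", I_loss(xi))
print("final design weights:", np.round(xi, 3))
```

The near-monotone decrease of the loss along such a sequence, up to an \(O\left( n^{-2}\right) \) remainder, is exactly what Theorem 5 quantifies.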
Proof of Theorem 7
(i) Write the statements of Theorems 5 and 6 as
where
Summing (41) over \(M_{1}\le n\le M_{2}-1\) and defining a positive, increasing sequence \(\left\{ a_{M_{2}}\right\} _{M_{2}>M_{1}}\) by \( a_{M_{2}}=\sum _{M_{1}}^{M_{2}-1}\left( \alpha _{n}/n\right) \) gives \( l_{M_{1}}-l_{M_{2}}=a_{M_{2}}-\sum _{M_{1}}^{M_{2}-1}\left( \beta _{n}/n^{2}\right) \). Since \(\sum _{n}n^{-2}\) is convergent and \(\left\{ \beta _{n}\right\} \) is bounded, for given \(\varepsilon >0\), we can then find \( M_{0} \) such that for \(M_{2}>M_{1}>M_{0}\),
This implies first that \(l_{M_{2}}\in \left( 0,l_{M_{1}}+\varepsilon /2\right) \), so that \(\left\{ l_{n}\right\} \) is bounded, and then that \( a_{M_{2}}\in \left( 0,l_{M_{1}}-l_{M_{2}}+\varepsilon /2\right) \), so that \( \left\{ a_{M_{2}}\right\} _{M_{2}>M_{1}}\) is bounded as well as increasing, hence convergent, hence Cauchy. So there is \(M_{0}^{\prime }\) such that for \( M_{2}>M_{1}>M_{0}^{\prime }\) we have
This together with (42) gives \(\left| l_{M_{1}}-l_{M_{2}}\right| <\varepsilon \) for \(M_{2}>M_{1}>\max \left( M_{0},M_{0}^{\prime }\right) \), i.e. \(\left\{ l_{n}\right\} \) is Cauchy, hence convergent.
(ii) From (43), \(\sum _{n}\left( \alpha _{n}/n\right) \) converges even though \(\sum _{n}n^{-1}\) diverges. Were \(\alpha _{n}\ge \varepsilon \) for all sufficiently large n, we would have \(\sum _{n}\left( \alpha _{n}/n\right) \ge \varepsilon \sum _{n}n^{-1}=\infty \); hence for every \(\varepsilon >0\), there must be terms \(\alpha _{n}<\varepsilon \). Thus, there is a subsequence \(\left\{ \alpha _{n_{j}}\right\} _{j=1}^{\infty }\) with \(\alpha _{n_{j}}\downarrow 0\). This implies (21):
\(\square \)
Keywords
- Alphabetic optimality
- Asymptotic monotonicity
- Asymptotic optimality
- D-optimal
- D-robust
- I-optimal
- I-robust
- Linear regression
- Model misspecification
- Nonlinear regression
- Particle swarm optimization
- Response-adaptive design
- Sequential design