Bayesian sequential D-D optimal model-robust designs

https://doi.org/10.1016/j.csda.2003.09.014

Abstract

Alphabetic optimal design theory assumes that the model for which the optimal design is derived is known. In practice, however, this assumption may not be credible, as models are rarely known in advance. Optimal designs derived under the classical approach may therefore be the best designs, but for the wrong assumed model. The Bayesian two-stage approach to designing experiments for the general linear model when initial knowledge of the model is poor is reviewed and extended. A Bayesian optimality procedure that works well under model uncertainty is used in the first stage, and the second stage design is then generated from an optimality procedure that incorporates the improved model knowledge from the first stage. In this way, the Bayesian D-D optimal model-robust design is developed. Results show that the Bayesian D-D optimal designs are in general superior in performance to the classical one-stage D-optimal and the one-stage Bayesian D-optimal designs. The ratio of sample sizes for the two stages and the minimum sample size desirable in the first stage are also examined in a simulation study.

Introduction

In the context of the classical linear model $y = X\beta + \varepsilon$, $y$ denotes the $n \times 1$ vector of observations and $X$ is the extended design matrix of size $n \times p$. $\beta$ is the $p \times 1$ vector of regression coefficients and $\varepsilon$ is the $n \times 1$ vector of random error terms, assumed to be independent and identically normally distributed with zero mean and variance–covariance matrix $\sigma^2 I_n$. In the full rank model, $X'X$ is non-singular and the least-squares estimate of $\beta$ is $\hat{\beta} = (X'X)^{-1}X'y$, which is equivalent to the maximum likelihood estimator under normal errors. The variance–covariance matrix of $\hat{\beta}$ is $\operatorname{cov}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$.
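The estimator and its covariance can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical $2^2$ factorial design with an intercept column and noise-free responses; the function name `ols_fit` and all numbers are illustrative and not taken from the paper.

```python
import numpy as np

def ols_fit(X, y):
    """Return the least-squares estimate and the unscaled covariance (X'X)^{-1}."""
    XtX = X.T @ X
    # Solve the normal equations X'X b = X'y rather than inverting explicitly.
    beta_hat = np.linalg.solve(XtX, X.T @ y)
    # cov(beta_hat) equals sigma^2 times this matrix.
    cov_unscaled = np.linalg.inv(XtX)
    return beta_hat, cov_unscaled

# Hypothetical 2^2 factorial design: columns are (1, x1, x2).
X = np.array([[1.0, -1.0, -1.0],
              [1.0,  1.0, -1.0],
              [1.0, -1.0,  1.0],
              [1.0,  1.0,  1.0]])
beta_true = np.array([2.0, 0.5, -1.0])  # assumed coefficients, for illustration only
y = X @ beta_true                       # noise-free responses

beta_hat, cov_unscaled = ols_fit(X, y)
print(beta_hat)
print(cov_unscaled)
```

Because the columns of this design are orthogonal, $(X'X)^{-1}$ is diagonal, so the coefficient estimates are uncorrelated.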

Classical design optimality theory involves selecting the rows of $X$ so as to optimize some function of the Fisher information matrix $X'X$. The most popular design criterion is D-optimality, which minimizes the determinant of the variance–covariance matrix of the parameter estimates. By minimizing $|(X'X)^{-1}|$, a lot of effort is concentrated on precise estimation of the model parameters at the expense of checking whether the assumed model is appropriate. Criticism of optimal design theory in the literature has therefore centered on the fact that optimal designs are frequently quite sensitive to the form of the model used (see for example Box, 1982 or the historical review by Myers et al., 1989). It is thus fundamental that designs which are less dependent on the effect of model misspecification be developed. Box and Draper (1959) were the first authors to consider this issue in depth. They argued that a more appropriate criterion for such model-robust designs is one which uses the average mean squared error over the region of interest. Steinberg and Hunter (1984) give a nice overview with extensive references on these model-robust and model-sensitive designs. See also Chang and Notz (1996) for a good summary of model-robust designs.
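To make the D-criterion concrete, here is a minimal sketch that compares two hypothetical four-run designs for a first-order model in one factor on $[-1, 1]$. Minimizing $|(X'X)^{-1}|$ is equivalent to maximizing $|X'X|$, which is what the code computes. The designs, the function names and the relative-efficiency formula $(|X_b'X_b| / |X_a'X_a|)^{1/p}$ are illustrative assumptions, not results from the paper.

```python
import numpy as np

def model_matrix(x):
    """Extended design matrix for the first-order model with columns (1, x)."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x])

def d_criterion(X):
    """Determinant of the information matrix X'X (larger is better)."""
    return np.linalg.det(X.T @ X)

# Hypothetical designs: runs at the extremes vs. equally spaced runs.
design_a = [-1.0, -1.0, 1.0, 1.0]
design_b = [-1.0, -1.0 / 3.0, 1.0 / 3.0, 1.0]

det_a = d_criterion(model_matrix(design_a))
det_b = d_criterion(model_matrix(design_b))

p = 2  # number of model parameters
rel_eff = (det_b / det_a) ** (1.0 / p)  # D-efficiency of design_b relative to design_a
print(det_a, det_b, rel_eff)
```

The design with all runs at the extremes has the larger determinant for the assumed first-order model, but it offers no way to detect curvature, which is the model dependence criticized above.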

The Bayesian approach to design optimality has gained popularity among research workers in recent years as a way to address this problem of strong model dependence of optimality criteria. The basic idea behind Bayesian design optimality is to choose the design that maximizes posterior information about some or all of the $\beta$'s, conditional on the prior information available. The notion of a Bayesian optimality criterion for linear models was introduced by Covey-Crump and Silvey (1970). For general discussions on Bayesian designs see, for example, Chaloner and Verdinelli (1995) and Clyde (2001).

Dette (1993) also developed Bayesian D-optimal and model-robust designs in linear regression models. The interesting work by DuMouchel and Jones (1994) illustrates a very elegant use of Bayesian methods to obtain designs which are more resistant to the bias caused by an incorrect model. Similar work has been done by Andere-Rendon et al. (1997) for mixture models, where they show that the performance of their Bayesian designs is superior to that of standard D-optimal designs, producing smaller bias errors and improved coverage over the factor space.

Another strategy to develop robust designs found in the literature is the use of Bayesian sequential procedures. With this approach, it is possible to develop designs in two or more stages that lead to less dependence on model specification. The idea behind the sequential designs is intuitively appealing in the sense that the experimenter could have the opportunity to revise the design and model in the course of the experiment.

Box and Lucas (1959) were among the first authors to discuss designs in non-linear situations and the general notion of a sequential approach. Throughout the optimal design literature, the sequential approach has been studied and developed within the non-linear framework. This is natural as the classical alphabetic optimality criterion requires prior knowledge of the model parameters β due to the non-linearity of the problem. In this context, several two-stage designs have been developed. Abdelbasit and Plackett (1983) suggested a two-stage procedure to derive optimal designs for binary responses. Minkin (1987) generalized the idea of Abdelbasit and Plackett and proposed an improvement to their two-stage proposal. Letsinger (1995) developed two-stage designs for the logistic regression model. Sitter and Wu (1995) proposed two-stage designs for quantal response studies where there may be insufficient knowledge on a new therapeutic treatment or compound for the dose levels to be chosen properly. Myers et al. (1996) also proposed a two-stage procedure for the logistic regression that uses D-optimality in the first stage followed by Q-optimality in the second. Myers (1999) gives a good discussion on Bayesian and two-stage designs in his reflection paper on the current status and future directions of response surface methodology.

Development of two-stage designs for linear models has been scant in the literature. Neff (1996) developed Bayesian two-stage designs under model uncertainty for mean estimation models. Montepiedra and Yeh (1998) developed a two-stage strategy for the construction of D-optimal approximate designs for linear models. Lin et al. (2000) developed Bayesian two-stage D-D optimal designs for mixture models.

In this paper, we review the approach of Neff (1996) for generating two-stage designs with reduced dependence on regressor specification. We propose some modifications in her algorithm to handle uncertainty in the prior distribution specifications using the approach proposed by Box and Meyer (1992) and also used by Lin et al. (2000). A few examples of two-stage designs are presented in Section 3 using the updated algorithm. In Section 4, the two-stage designs are compared relative to the classical one-stage designs and the approach of Neff (1996). Finally, in Section 5, a simulation study is conducted to investigate the allocation of sample sizes in the two stages and the desirable number of runs in the first stage.

Section snippets

Bayesian two-stage designs under model uncertainty

In this section, we look at the approach of Neff (1996) for developing Bayesian two-stage D-D optimal designs in the context of linear models when initial knowledge of the form of the model is poor. This is accomplished by using the Bayesian D-optimality criterion of DuMouchel and Jones (1994) in the first stage and the second stage design is then generated by incorporating improved model knowledge gathered from the first stage.
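The first-stage criterion can be sketched as follows, assuming the commonly stated form of the DuMouchel and Jones (1994) criterion: choose the design maximizing $|X'X + K/\tau^2|$, where $K$ is a diagonal matrix with zeros for the primary terms and ones for the potential terms. The one-factor model (primary terms $\{1, x\}$, potential term $\{x^2\}$), the candidate designs and the value $\tau = 1$ are hypothetical, and the scaling step that makes potential terms approximately orthogonal to the primary terms is omitted for brevity.

```python
import numpy as np

def model_matrix(x):
    """Columns (1, x, x^2): two primary terms followed by one potential term."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, x**2])

def bayes_d(X, n_primary, tau=1.0):
    """Bayesian D-criterion |X'X + K/tau^2| (assumed form, larger is better)."""
    n_terms = X.shape[1]
    # K has zeros for primary terms (diffuse prior) and ones for potential terms.
    k_diag = np.r_[np.zeros(n_primary), np.ones(n_terms - n_primary)]
    return np.linalg.det(X.T @ X + np.diag(k_diag) / tau**2)

# Hypothetical four-run candidate designs on [-1, 1].
two_level = model_matrix([-1, -1, 1, 1])
three_level = model_matrix([-1, 0, 1, 1])

# The classical criterion for the full model is degenerate for the two-level design.
classical_two = np.linalg.det(two_level.T @ two_level)
bayes_two = bayes_d(two_level, n_primary=2)
bayes_three = bayes_d(three_level, n_primary=2)
print(classical_two, bayes_two, bayes_three)
```

In this sketch the prior term keeps the criterion finite and non-zero even when the design cannot estimate the potential term, while still rewarding the design that adds a center run.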

Some examples of two-stage designs

We now present a few examples of the Bayesian two-stage D-D optimal designs under different assumed true models with our updated algorithm. Let us suppose that the full model under consideration by the experimenter comprises the following regressors: $x(f) = \{1, x_1, x_2, x_1^2, x_3, x_1x_2, x_1x_3, x_2x_3\}$, with $p = 4$ primary terms, $\{1, x_1, x_2, x_1^2\}$, and $q = 4$ potential terms, $\{x_3, x_1x_2, x_1x_3, x_2x_3\}$. Since the second stage design is a random variable depending on the first stage, response data from the first stage experiment

Evaluation of Bayesian two-stage D-D optimal designs

The performance of the Bayesian two-stage D-D optimal designs presented in Sections 2.1 to 2.3 will now be evaluated relative to the classical one-stage designs and the approach of Neff (1996). Since the second stage design is a random variable dependent on first stage data through the Box and Meyer posterior probabilities, we need a simulation approach to evaluate the performance of the two-stage

Ratio of sample size between the two stages

We now investigate in a limited simulation study the distribution of sample sizes between the two stages and the number of design points desirable in the first stage. A simulation approach is also needed here as it is difficult to reach the goals analytically. Letsinger (1995) and Myers et al. (1996) conclude from their study on logistic regression that the best performance of the two-stage designs was achieved when the first stage contained 30% of the combined design size and 70% are reserved

Discussion

The primary motivation for the two-stage procedures is the need to handle model uncertainty. The modification of the Bayesian D-D optimal design originally developed by Neff (1996) deals satisfactorily with model misspecification, as reflected by its better performance in comparison to its one-stage competitors. The priors chosen in this work are standard ones and perform well for the two-stage approach.

The two-stage sequential procedure developed here is very general and can

Acknowledgements

This work was supported by the Fund for Scientific Research Flanders-Belgium. The authors would like to thank the Editor and Associate Editor. Special gratitude to the referees for very constructive comments and suggestions which greatly improved the manuscript. Moreover, a note of thanks to Dr. Peter Goos for his useful advice.

References (25)

  • K.M. Abdelbasit et al. Experimental designs for binary data. J. Amer. Statist. Assoc. (1983)
  • J. Andere-Rendon et al. Design of mixture experiments using Bayesian D-optimality. J. Quality Technol. (1997)
  • G.E.P. Box. Choice of response surface design and alphabetic optimality. Utilitas Math. (1982)
  • G.E.P. Box et al. A basis for the selection of a response surface design. J. Amer. Statist. Assoc. (1959)
  • G.E.P. Box et al. Design of experiments in non-linear situations. Biometrika (1959)
  • Box, G.E.P., Meyer, R.D., 1992. Finding the active factors in fractionated screening experiments. CQPI Report 80,...
  • G.E.P. Box et al. Finding the active factors in fractionated screening experiments. J. Quality Technol. (1993)
  • K. Chaloner et al. Bayesian experimental design: a review. Statist. Sci. (1995)
  • Chang, Y-J., Notz, W.I., 1996. Model robust designs. In: Ghosh, S., Rao, C.R. (Eds.), Handbook of Statistics, Vol. 13,...
  • Clyde, M.A., 2001. Experimental design, a Bayesian perspective. ISDS Discussion Paper 01–05, University of...
  • P.A.K. Covey-Crump et al. Optimal regression with previous observations. Biometrika (1970)
  • H. Dette. Bayesian D-optimal and model robust designs in linear regression models. Statistics (1993)

Cited by (16)

    • Sequential model-based design of experiments for development of mathematical models for thin film deposition using chemical vapor deposition process

      2020, Chemical Engineering Research and Design
      Citation Excerpt :

      All model MBDoE technique can be used for a sequential experimental design (Agarwal and Brisk, 1985; Atkinson and Fedorov, 1975; Trust, 2016). A sequential experimental design is appealing because it provides the model developers with a chance to modify the experimental strategy after each round of experiments, using the additional information that becomes available (Issanchou et al., 2003; Ruggoo and Vandebroek, 2004). To the best of our knowledge, there are few applications of the MBDoE in the CVD literature.

    • Data-driven multistratum designs with the generalized Bayesian D-D criterion for highly uncertain models

      2019, Computational Statistics and Data Analysis
      Citation Excerpt :

      If this is the case, then the designs selected by the two criteria may not be efficient for fitting the true model. To deal with the problem of highly uncertain models, some authors have suggested conducting an experiment in two stages (see Lin et al., 2000; Ruggoo and Vandebroek, 2004). The idea is that a smaller design is first selected and used to collect data in the first-stage experiment; then the information about the true model is extracted from the data and used to select a design for the second-stage experiment.

    • A method for augmenting supersaturated designs

      2019, Journal of Statistical Planning and Inference
    • Augmenting supersaturated designs with Bayesian D-optimality

      2014, Computational Statistics and Data Analysis
      Citation Excerpt :

      As such, it is difficult to pick which model or models to build upon in the follow-up runs. Therefore, instead of adding runs based on a model-discrimination criterion like in Ruggoo and Vandebroek (2004), we add runs based on a categorization of factors. A model-dependent augmentation strategy is computationally expensive.

    • Model-sensitive sequential optimal designs

      2006, Computational Statistics and Data Analysis