An algorithm for generating good mixed level factorial designs

https://doi.org/10.1016/j.csda.2019.01.020Get rights and content

Abstract

An algorithm for the creation of mixed level arrays with generalized minimum aberration (GMA) is proposed. GMA mixed level arrays are particularly useful for experiments involving qualitative factors: for these, the number of factor levels is often a consequence of subject matter requirements, while a priori assumptions on a statistical model are not made, apart from assuming lower order effects to be more important than higher order effects. The proposed algorithm creates GMA arrays using mixed integer optimization with conic quadratic constraints. Fully achieving GMA is feasible for small problems; for larger problems, the optimization task is reduced to considering the confounding of low-order effects only. Lower bounds for the lowest-order confounding are provided (given the number of experimental runs). Where one of these bounds is actually attainable, the algorithm is often fast in providing an array which attains it. Examples illustrate the scope and usefulness of the algorithm, which is implemented in an R package, using one of two commercial optimizers.

Introduction

Factorial experiments are often run using orthogonal arrays. For example, engineers make ample use of the collection of arrays provided by the Japanese engineer Genichi Taguchi (see e.g. NIST/SEMATECH, 2016). Mixed level experiments, i.e. experiments for which not all factors have the same number of levels, are common in applications, especially if some factors are qualitative in nature. If a particular mixed level experiment is required, availability of a suitable array can be an issue. It is common to create a factorial design from a subset of the columns of a published array, and Grömping (2018c) discussed ways to improve this type of usage by optimizing the choice of columns.

Of course, optimization of column selection from an existing array cannot be better than the creation of a tailor-made optimized array for the task at hand. For experiments with some qualitative factors, one will often not have a particular model in mind, but will aim for the estimation of main effects, perhaps also of two-factor interactions, assuming that lower order effects are more important than higher order effects. For such a context, Fontana (2017) introduced an algorithm for the creation of an orthogonal array in a given number of runs that fulfills the quality criterion “generalized minimum aberration” (GMA, see Section 2.1); his formulation relied on the complex coding (Bailey, 1982). For m factors, Fontana’s algorithm used a sequence of m mixed integer quadratic optimization steps, with all but the first problems also having conic quadratic constraints. Because of the complex coding, the matrices for the target function and for the constraints of each step were constructed using the Cholesky decomposition of an NxN matrix, where N denotes the number of runs needed for a full factorial experiment in the factors to be investigated, i.e. is usually very large. The intriguing feature about the approach is its principal ability to algorithmically create an optimized array for a specific experimental situation with mixed numbers of levels that is at the same time model-robust. Unfortunately, the details of the algorithm’s implementation were such that its application in practice would only work for very small data situations; we set out to improve that situation.

We modify and improve Fontana’s proposal in the following ways: Based on Grömping (2018a), we substantially reduce the sizes of the matrices in the optimization problems. Furthermore, we reduce the number of optimization problems to be considered, by replacing the first few conic quadratic problems with a single linear one that enforces zero confounding for effects up to a specified order (Fontana alluded to such a possibility; its implementation in his algorithm would have required the use of strata according to Fontana, 2013, because of using the complex coding). For the objective of the first actual minimization, we derive a lower bound, which, if attained, allows to finish this optimization step fast (see Sections 2.3 Lower bounds for, 4.4 Practical aspects). Furthermore, we implement a loop over optimizations of different problem representations which proves very successful in increasing the chance that the first optimization step finds an optimum. In addition, we choose a pragmatic approach which allows us to provide good designs for larger problems: due to the nature of the optimization problem to be solved by mixed integer optimization, establishing overall GMA is extremely time consuming or even impossible for many problems of relevant sizes. Therefore, we primarily aim for minimizing confounding of lowest-order effects. There still remain cases for which neither confirmation of an optimum nor further improvement is possible; for these, we consider it valuable to provide “improved” arrays, even if they are not necessarily optimal. All these – GMA designs, designs with optimized lowest-order confounding and “improved designs” – are what we call “good designs”. We propose to use our algorithm for algorithmically obtaining good mixed level designs for situations for which a GMA array is not readily available.

Many specialized algorithms in the design community, e.g. the enumeration algorithms by Bulutoglu and Margot (2008) for symmetric orthogonal arrays or by Schoen et al. (2010) for general orthogonal arrays, are intended as a tool for design researchers, for enumerating all existing designs or for providing an authoritative list of designs that practitioners can turn to. Other algorithms, e.g. the Kobilinsky et al. (2017) algorithm, can be used in design creation of a (mixed level) factorial design for a specific experiment. Our algorithm is of that latter nature. It is more general than the Kobilinsky et al. (2017) algorithm, in that it incorporates (part of) the GMA quality criterion (see Section 2.1) into design creation; furthermore, the Kobilinsky et al. (2017) algorithm is restricted to the construction of arrays that can be obtained from design keys (using pseudofactors), whereas our algorithm can construct arrays from the larger class of all orthogonal arrays. On the other hand, the Kobilinsky et al. (2017) algorithm can account for randomization restrictions, which is not in the scope of our algorithm. Both algorithms have been implemented in an R package (R Core Team, 2018, Grömping, 2018b, R package DoE.MIParray; Kobilinsky et al. (2018), R package planor).

This paper derives the algorithm and provides examples that evidence its usefulness. All calculations have been done using the afore-mentioned R package DoE.MIParray, which offers functions based on two different commercial solvers for mixed integer optimization problems (Mosek and Gurobi, as documented in Mosek ApS, 2017 and Gurobi Optimization, Inc., 2017). Both vendors provide free academic licenses and R packages that support access from within R. Mixed integer optimization is an area of active research; thus, what seems currently extremely challenging may become easily doable with future improvements to mixed integer algorithms. It is therefore possible that the range of applicability for the algorithm will broaden over time. Nevertheless, mixed integer optimization remains an NP hard problem. We present more detail than Fontana (2017) on the algorithm’s implementation in the two solvers; the purpose of presenting the specifics of inputs to the commercial solvers is to enable readers to create their own implementations in other professional solvers, e.g. CPLEX. Apart from giving a high-level overview, we do not discuss any specifics of how mixed integer optimization is implemented within the solvers we use; readers are referred to the literature, e.g. the very accessible modeling cookbook provided by Mosek ApS (2018) and references therein.

Section 2 introduces the necessary fundamentals and some basic results. Section 3 provides a motivating example that illustrates the type of design that benefits most from our algorithm, Section 4 describes the algorithm, and Section 5 applies it to the test cases of Fontana (2017) (5.1) and to further interesting mixed level requests for orthogonal arrays (5.2) or supersaturated strength 1 arrays (5.3). The discussion highlights the merits of our proposal and points out needs for further research.

Section snippets

Basics and auxiliary results

An array d for accommodating m factors in n experimental runs can be written as an n×m array of symbols. The jth factor has sj levels, which are denoted as 1,,sj, and the n rows of the array d contain the factor level combinations for n experimental runs. An orthogonal array (OA) of strength t in n runs with m1 factors at s1 levels, …, mk factors at sk levels is denoted as OA(n,s1m1skmk,t): for an OA of strength t, each t-tuple of factors has each level combination the same number of times.

A motivating example

Larger arrays with strength 2 and far from saturating main effects df cannot be found easily in published tables; for example, an optimized OA(72, 243241, 2) is neither found on the Eendebak and Schoen (2010) website (which covers strength 2 arrays with up to 70 runs only) nor on the Kuhfeld (2009) website; the latter provides many instances of 72 run arrays with many main effects df. Grömping (2018c) reported on the creation of an OA(72, 243241, 2) by optimizing column selection from an OA(72,

The algorithm

The purpose of this section is to explain the use of Mosek or Gurobi for finding a GMA design or –more often –for finding a resolution R design and optimizing AR. The integer optimization itself is left to these tools and not described here in any detail, as it would be unwise to program this instead of relying on the highly sophisticated existing tools.

Examples

The calculations were done using two threads on a Windows machine with i7 processor with four cores and 32 GB RAM, using the R package DoE.MIParray with R 3.4.1, Mosek version 8.1 and Gurobi version 8.0.1.

Discussion

Mixed integer optimization can provide confirmed GMA arrays for small problems. For medium size problems, we have seen that it is often possible to obtain a confirmed optimum for the shortest word length AR; this happens, whenever the GMA word length attains the lower bound provided in Proposition 1. It is then possible to further optimize the second shortest word length; however, for word lengths other than the shortest, there is generally no lower bound different from zero, so that it is

Acknowledgments

Ulrike Grömping’s work was partially supported by Deutsche Forschungsgemeinschaft Grant GR 3843/2. We appreciate input by Hongquan Xu regarding validity of Proposition 3 for the sum of the Ai values in the mixed-level case. Roberto Fontana acknowledges the excellent research environment of his department which has also been recognized by the MIUR grant Dipartimenti di Eccellenza 2018-2022 programme.

References (27)

  • GrömpingU.

    DoE.MIPArray: creation of arrays by mixed integer optimization. R package version 0.10

  • GrömpingU.

    R package DoE.base for factorial designs

    J. Statist. Softw.

    (2018)
  • GrömpingU. et al.

    Generalized resolution for orthogonal arrays

    Ann. Statist.

    (2014)
  • Cited by (0)

    View full text