Probability models for data-driven global sensitivity analysis

https://doi.org/10.1016/j.ress.2018.12.003

Abstract

This paper presents a probability model-based global sensitivity analysis (PM-GSA) framework to compute various Sobol’ indices when only input-output data are available. The PM-GSA framework consists of two main elements, namely data extraction and probability model training. The data extraction step extracts data of the variables of interest (VoI) and quantity of interest (QoI) from an input-output data matrix. Following that, a probability model is built to approximate the joint probability density function of the VoI and QoI. The learned probability model is then used to compute various Sobol’ indices. The implementation of the PM-GSA framework is investigated through three probability models: the Gaussian copula model, the Gaussian mixture model, and a new Gaussian mixture copula model. The number of dimensions of the probability model in the PM-GSA framework is independent of the number of input variables and is always N+1 (e.g. 2 for the first-order index), where N is the order of the Sobol’ index. In addition, the PM-GSA framework is applicable to global sensitivity analysis not only with independent input variables, but also with dependent input variables and with sets of variables. Four numerical examples are used to demonstrate the effectiveness of the proposed method and to analyze the advantages and disadvantages of the different probability models.

Introduction

Global sensitivity analysis (GSA), which quantifies the contributions of input random variables to the variability of an output quantity of interest (QoI) [1], [2], [3], has been widely used to rank the importance of input random variables and thereby support dimension reduction [4], uncertainty reduction [5], and resource allocation [6]. During the past decades, various approaches have been proposed to perform GSA, such as the Fourier amplitude sensitivity test (FAST) methods [7], [8], methods based on the correlation ratio [9], [10], Kullback–Leibler divergence-based approaches [11], [12], and Sobol’ indices-related methods [13], [14], [15]. Among these GSA methods, variance decomposition-based Sobol’ indices, which are also the focus of this work, are among the most widely used. Two types of Sobol’ indices are usually computed, namely first-order Sobol’ indices and total-effect Sobol’ indices [6]. The first-order Sobol’ index measures the individual contribution of each variable to the variability of the QoI without considering its interactions with other variables, while the interactions with other variables are included in the total-effect Sobol’ indices [16], [17] and other joint-effect indices of multiple variables. A straightforward way of computing Sobol’ indices is to implement a double-loop Monte Carlo simulation (MCS). This double-loop procedure, however, requires a large number of evaluations of the prediction model and is unaffordable if the prediction model is expensive.
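To make the cost of the double-loop procedure concrete, the following minimal Python sketch estimates a first-order index as Var(E[Y | X_i]) / Var(Y) by brute force. The toy model `g`, the standard normal inputs, and the sample sizes are assumptions chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

def g(x):
    # Hypothetical toy model (an assumption): Y = X1 + 2*X2, independent N(0, 1) inputs.
    return x[:, 0] + 2.0 * x[:, 1]

def first_order_double_loop(g, n_inputs, i, n_outer=2000, n_inner=200, seed=0):
    """Brute-force estimate of S_i = Var(E[Y | X_i]) / Var(Y)."""
    rng = np.random.default_rng(seed)
    cond_means = np.empty(n_outer)
    for k in range(n_outer):
        # Outer loop: fix X_i at one sampled value.
        xi = rng.standard_normal()
        # Inner loop: resample every input, then overwrite column i with the fixed value.
        x = rng.standard_normal((n_inner, n_inputs))
        x[:, i] = xi
        cond_means[k] = g(x).mean()
    # Unconditional variance of Y from a separate sample.
    var_y = g(rng.standard_normal((n_outer * n_inner, n_inputs))).var()
    return cond_means.var() / var_y

s1 = first_order_double_loop(g, n_inputs=2, i=0)
s2 = first_order_double_loop(g, n_inputs=2, i=1)
```

For this toy model the exact values are S1 = 0.2 and S2 = 0.8. Each index already consumes n_outer × n_inner model evaluations, which is why the procedure becomes unaffordable for expensive models.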

To overcome the computational challenge in GSA, various approaches have been proposed in recent years [18], which can be roughly grouped into three directions [19], [20]. Note that this classification of GSA methods is not exhaustive. The first direction is to reduce the required number of samples in computing Sobol’ indices using MCS by deriving efficient numerical algorithms [4], [19], [21], [22], [23]. For instance, Sobol’ discussed how to efficiently estimate the Sobol’ indices using MCS [24]. In this scheme, the required number of samples in the double-loop procedure is reduced to a number that is proportional to the dimension of the input variables. The accuracy of Sobol’s scheme was further improved by Homma and Saltelli in Ref. [4]. Similarly, Glen and Isaacs developed an approach to compute the Sobol’ indices by switching the columns of two separately generated MCS sample matrices [25]. Instead of directly using the Monte Carlo samples, the second direction uses spectral approaches [7], [8], [26] or design of experiments methods [27], [28], [29], [30] to reduce the required number of samples in GSA. For example, the aforementioned FAST method [7], [8] is a spectral approach to perform GSA. Tissot and Prieur [29] developed a randomized orthogonal array-based procedure for the estimation of Sobol’ indices. A detailed review of various sampling techniques and MCS approaches is available in Ref. [20]. The third direction is to replace the original (expensive) prediction model with a cheap algebraic surrogate model [31], [32], such as a regression model, a Polynomial Chaos Expansion (PCE) model [33], or a Kriging model [34], [35]. Based on the surrogate model, the Sobol’ indices are calculated using either analytical approaches or direct MCS-based methods. For PCE, Sudret [13] derived analytical expressions for the Sobol’ indices by post-processing the PCE coefficients. For Kriging, Chen et al. [36] suggested analytical ways to compute Sobol’ indices when the input variables follow normal or uniform distributions. Gratiet et al. [37] later developed a more general approach to estimate the Sobol’ indices through a surrogate model by considering both estimation and surrogate model errors. They also extended the proposed approach to multifidelity computer simulations [37].
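The idea of exchanging columns between two independently generated sample matrices can be sketched as follows. This is one commonly used "pick-freeze" form of the estimators, given here only as an illustration; it is not claimed to reproduce the exact estimators of the references cited above. The test function `model` and the standard normal inputs are assumptions.

```python
import numpy as np

def model(x):
    # Hypothetical test function with an interaction term: Y = X1 + X2 * X3.
    return x[:, 0] + x[:, 1] * x[:, 2]

def pick_freeze_indices(model, n_inputs, n=100_000, seed=0):
    """First-order and total-effect indices from two independent sample matrices."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n_inputs))
    b = rng.standard_normal((n, n_inputs))
    ya, yb = model(a), model(b)
    var_y = np.concatenate([ya, yb]).var()
    first = np.empty(n_inputs)
    total = np.empty(n_inputs)
    for i in range(n_inputs):
        ab = a.copy()
        ab[:, i] = b[:, i]  # matrix A with column i taken from B
        yab = model(ab)
        # yb and yab share only X_i, so their covariance estimates Var(E[Y | X_i]).
        first[i] = np.mean(yb * (yab - ya)) / var_y
        # ya and yab differ only in X_i, which isolates the total effect of X_i.
        total[i] = 0.5 * np.mean((ya - yab) ** 2) / var_y
    return first, total

first, total = pick_freeze_indices(model, n_inputs=3)
```

The total cost is n × (n_inputs + 2) model evaluations, i.e. proportional to the input dimension. For this toy function the exact first-order indices are (0.5, 0, 0) and the exact total-effect indices are (0.5, 0.5, 0.5), since X2 and X3 contribute only through their interaction.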

In practical engineering applications, often only a set of numerical input-output samples is available and nothing more. The distributions, correlations, and interactions between different variables must then be learned purely from the available numerical data. In that situation, direct MCS-based or sampling-based GSA methods cannot be adopted to rank the importance of variables for a given QoI, because the prediction model is not available and current MCS or sampling-based GSA approaches require at least two separate input-output data matrices [25]. Surrogate model-based GSA approaches [13], [31], [37] remain applicable in principle, but not in every case. In some situations, surrogate model-based GSA approaches may become inapplicable or are not recommended for the following reasons [38], [39]:

  • Current surrogate modeling techniques are only applicable to moderate-sized problems (i.e., dimension of input variables less than 30). They cannot be applied to problems with high dimensions due to the curse of dimensionality [40].

  • Even if surrogate models (e.g. Kriging) can sometimes provide analytical expressions of Sobol’ indices, as pointed out in Ref. [39], most of these analytical expressions (except PCE-based expressions) involve multidimensional integrals that are tractable only when the conditional probability densities of the input random variables are known. This is not the case when only a matrix of input-output data is given.

  • When algebraic surrogate model-based GSA approaches are applied to GSA for given data, an algebraic model is built first. The probability distributions of the input variables are then learned from the data. Based on the learned probability distributions, Sobol’ indices are computed through the constructed algebraic surrogate model. This introduces several extra steps in computing Sobol’ indices.

Motivated by the question of how to effectively perform GSA for given data, various approaches have been proposed in recent years. For example, Li and Mahadevan [38] presented a modularized method to estimate the first-order Sobol’ indices based on stratification of available samples; Jia and Taflanidis [41] developed an auxiliary probability density function approach to estimate the first-order Sobol’ indices based on data; and Sparkman et al. [42] proposed an importance sampling approach to compute Sobol’ indices from available data by introducing weights to different data points. In addition to these efforts, one of the dominant directions is to employ smoothing-based methods [43], [44], such as locally weighted regressions, local polynomial smoothing [18], [39], and recursive partitioning regression. The smoothing-based methods [18], [39], [43], [44] may require the smallest sample size when they are applied to compute the first-order Sobol’ indices.
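As a simplified illustration of how first-order indices can be estimated from a single data matrix, the sketch below sorts the samples by one input, splits them into equally populated bins, and uses the variance of the bin means as an approximation of Var(E[Y | X_i]). It is written in the spirit of stratification-based estimation but is not the algorithm of any of the cited references; the function name, the bin count, and the synthetic data are assumptions.

```python
import numpy as np

def first_order_from_data(x, y, n_bins=30):
    """Approximate Var(E[Y | X_i]) / Var(Y) for every column of x by binning."""
    n, d = x.shape
    indices = np.empty(d)
    for i in range(d):
        # Sort the outputs by the i-th input and split them into equal-count bins.
        order = np.argsort(x[:, i])
        bins = np.array_split(y[order], n_bins)
        means = np.array([b.mean() for b in bins])
        weights = np.array([b.size for b in bins]) / n
        # Weighted variance of the bin means approximates Var(E[Y | X_i]).
        indices[i] = np.sum(weights * (means - y.mean()) ** 2) / y.var()
    return indices

# Synthetic data (an assumption): X3 is inactive, and unobserved noise is present.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(20_000, 3))
y = 2.0 * x[:, 0] + x[:, 1] ** 2 + 0.1 * rng.standard_normal(20_000)
s = first_order_from_data(x, y)
```

For this synthetic data set the exact first-order indices are approximately 0.93, 0.06, and 0. The estimate depends on the bin count: too few bins smooth away part of the conditional mean, while too many bins inflate the estimate with sampling noise.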

The approaches reviewed above for GSA with given samples [18], [38], [39], [42] are all limited to estimating the first-order Sobol’ indices of individual variables (i.e. single variables), and have difficulty in dealing with GSA of sets of input variables and in computing the total-effect indices. This paper aims to overcome these drawbacks [38], [42] by developing a generalized GSA framework, which is able to perform various GSA computations (e.g. first-order, higher-order, grouped, and total-effect Sobol’ indices) purely based on data. In the proposed method, a probability model is built first based on the available data to capture the joint probability distribution of the system inputs and outputs. Based on the learned probability model, approaches are developed to effectively compute different types of Sobol’ indices. Three approaches, namely the Gaussian copula model, the Gaussian mixture model, and a new Gaussian mixture copula model, are explored in this paper to build the probability model for use in GSA.
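To indicate how a learned joint probability model can yield a Sobol’ index, the following sketch fits a two-dimensional Gaussian copula to samples of one input and the output, and then evaluates Var(E[Y | X]) / Var(Y) from the fitted model. It is only a minimal illustration under stated assumptions and does not reproduce the specific algorithms developed in this paper: marginals are handled through empirical ranks, the conditional expectation is integrated by Gauss-Hermite quadrature, and all function names and the synthetic data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def normal_scores(n):
    # Standard normal quantiles assigned to the ranks 1..n.
    inv = NormalDist().inv_cdf
    return np.array([inv((k + 0.5) / n) for k in range(n)])

def gaussian_copula_first_order(x, y, n_quad=40):
    """First-order index of x on y from a fitted bivariate Gaussian copula."""
    n = x.size
    scores = normal_scores(n)
    # Map both variables to latent standard normal space through their ranks.
    zx = np.empty(n)
    zx[np.argsort(x)] = scores
    zy = np.empty(n)
    zy[np.argsort(y)] = scores
    rho = np.corrcoef(zx, zy)[0, 1]  # copula correlation parameter
    y_sorted = np.sort(y)
    # Quadrature nodes and weights for an expectation over N(0, 1).
    nodes, w = np.polynomial.hermite_e.hermegauss(n_quad)
    w = w / w.sum()
    # Latent output given latent input z is N(rho * z, 1 - rho**2).
    z_cond = rho * zx[:, None] + np.sqrt(1.0 - rho ** 2) * nodes[None, :]
    # Map latent values back to the output scale with the empirical quantile function.
    y_cond = np.interp(z_cond.ravel(), scores, y_sorted).reshape(z_cond.shape)
    cond_mean = y_cond @ w
    return cond_mean.var() / y.var()

# Synthetic data (an assumption): monotone transforms of correlated normals.
rng = np.random.default_rng(0)
u = rng.standard_normal(20_000)
v = 0.8 * u + 0.6 * rng.standard_normal(20_000)
x, y = np.exp(u), np.exp(0.5 * v)
s_x = gaussian_copula_first_order(x, y)
```

For this synthetic example the exact index is about 0.61, which differs from the squared latent correlation of 0.64 because the output marginal is not Gaussian. A single Gaussian copula cannot represent non-monotone dependence, which is one motivation for the mixture-based models considered in this paper.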

The contributions of this paper can be summarized as follows: (1) a generalized probability model-based GSA (PM-GSA) framework is developed to efficiently compute different types of Sobol’ indices (e.g. first-order, higher-order, grouped, and total-effect Sobol’ indices); this overcomes the limitations of current approaches for GSA with given samples [18], [38], [39], [42], [43], which can only compute the first-order Sobol’ indices of single input variables; (2) a new Gaussian mixture copula model is proposed and studied to build the probability model for GSA, in addition to Gaussian copula and Gaussian mixture models; the proposed Gaussian mixture copula model can benefit not only GSA, but also many other uncertainty quantification problems; (3) implementation approaches are developed for GSA based on different types of probability models; and (4) advantages and disadvantages of different probability models for GSA are investigated using numerical examples.

The remainder of this paper is organized as follows. Section 2 reviews background concepts of variance-based GSA and data-driven GSA. Section 3 develops the proposed method. Section 4 considers four numerical examples to illustrate the proposed method, and Section 5 gives concluding remarks.

Variance-based global sensitivity analysis

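Defining $Y \in \mathbb{R}$ as a QoI and its underlying physics model or computer code given by $Y = g(\mathbf{X})$, where $\mathbf{X} = (X_1, X_2, \ldots, X_n) \in \mathbb{R}^n$ is a vector of random input variables, the variance $\mathrm{Var}(Y)$ of $Y$ can be decomposed as follows [13], [24]:

$$\mathrm{Var}(Y) = \sum_{i=1}^{n} V_i + \sum_{1 \le i < j \le n} V_{ij} + \cdots + V_{12 \ldots n},$$

where $V_i = \mathrm{Var}_{X_i}(E_{\mathbf{X}_{\sim i}}(Y \mid X_i))$ is the variance of $Y$ caused by $X_i$ without considering its interactions with other input variables (i.e. $\mathbf{X}_{\sim i}$), $E(\cdot)$ is the expectation operator, and $V_{1 \ldots k}$, $k = 2, \ldots, n$ represents the proportion of $\mathrm{Var}(Y)$ caused by variables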

Proposed method

Before discussing the details of the proposed method, we define two sets of random input variables: $\mathbf{X}_c \in \mathbb{R}^{n_c}$ and $\mathbf{X}_r \in \mathbb{R}^{n - n_c}$, where $\mathbf{X} = (\mathbf{X}_c, \mathbf{X}_r)$, $\mathbf{X}_c$ represents the random input variables of interest (VoI), for which the Sobol’ indices need to be computed, $n_c$ is the number of variables in $\mathbf{X}_c$, and $\mathbf{X}_r$ represents the remaining random input variables, including remaining known and unknown random input variables, where the unknown random variables could be any unknown noise sources in the data.
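The split into variables of interest and remaining variables corresponds to a simple column selection on the data matrix. The short sketch below illustrates this under assumed conventions (the QoI stored in the last column, the helper name `extract_voi_qoi`, and the synthetic data are all hypothetical): whatever the total number of inputs, the extracted array has n_c + 1 columns.

```python
import numpy as np

def extract_voi_qoi(data, voi_columns, qoi_column=-1):
    """Return an (n_samples, n_c + 1) array: the VoI columns followed by the QoI."""
    data = np.asarray(data, dtype=float)
    return np.column_stack([data[:, voi_columns], data[:, qoi_column]])

# Hypothetical data matrix: 1000 samples of 50 inputs plus one output column.
rng = np.random.default_rng(0)
inputs = rng.standard_normal((1000, 50))
output = inputs[:, 0] + inputs[:, 3] * inputs[:, 7]
data = np.column_stack([inputs, output])

first_order_set = extract_voi_qoi(data, [0])      # shape (1000, 2)
second_order_set = extract_voi_qoi(data, [3, 7])  # shape (1000, 3)
```

The columns that are not selected play the role of the remaining variables: they are never modeled explicitly, and their influence enters only through the scatter of the QoI given the selected variables.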

Based on this

Numerical examples

In this section, the Sobol’ indices of each example are computed using seven approaches: Gaussian copula-based approach one (GC1), Gaussian copula-based approach two (GC2), Gaussian mixture model-based approach one (GMM1), Gaussian mixture model-based approach two (GMM2), Gaussian mixture copula-based approach one (GMC1), Gaussian mixture copula-based approach two (GMC2), and MGSA [38]. In the first three benchmark mathematical examples, the seven methods are compared with Sobol’ indices which

Conclusions

This paper presents methods to compute various Sobol’ indices, such as the first-order, second-order, and total-effect Sobol’ indices, and Sobol’ indices of a set of variables, purely based on available input-output data. The data may be available from physical experiments, field observations, etc. Noise and unknown uncertain variables are also allowed to be present in the data. In the proposed methods, data of the variables of interest are extracted first from the available data matrix.

Acknowledgement

The research reported in this paper was supported by the Air Force Office of Scientific Research (Grant no. FA9550-15-1-0018, Technical Monitor: Dr. Jaimie Tiley). The support is gratefully acknowledged.

References (71)

  • A. Saltelli et al. A quantitative model-independent method for global sensitivity analysis of model output. Technometrics (1999)
  • T.A. Mara et al. Variance-based sensitivity indices for models with dependent inputs. Reliab Eng Syst Saf (2012)
  • E. Borgonovo et al. Sensitivity analysis: a review of recent advances. Eur J Oper Res (2016)
  • T. Homma et al. Importance measures in global sensitivity analysis of nonlinear models. Reliab Eng Syst Saf (1996)
  • Z. Hu et al. Global sensitivity analysis-enhanced surrogate (GSAS) modeling for reliability analysis. Struct Multidiscip Optim (2016)
  • S. Sankararaman et al. Test resource allocation in hierarchical systems using Bayesian networks. AIAA J (2013)
  • A. Saltelli et al. An alternative way to compute Fourier amplitude sensitivity test (FAST). Comput Stat Data Anal (1998)
  • G.J. McRae et al. Global sensitivity analysis: a computational implementation of the Fourier amplitude sensitivity test (FAST). Comput Chem Eng (1982)
  • C. Xu et al. Extending a global sensitivity analysis technique to models with correlated parameters. Comput Stat Data Anal (2007)
  • D. Lewandowski et al. Sample-based estimation of correlation ratio with polynomial approximation. ACM Trans Model Comput Simul (TOMACS) (2007)
  • H. Liu et al. Probabilistic sensitivity analysis methods for design under uncertainty. Proceedings of the tenth AIAA/ISSMO multidisciplinary analysis and optimization conference (2004)
  • S. Da Veiga. Global sensitivity analysis with dependence measures. J Stat Comput Simul (2015)
  • B. Sudret. Global sensitivity analysis using polynomial chaos expansions. Reliab Eng Syst Saf (2008)
  • J. Nossent et al. Sobol sensitivity analysis of a complex environmental model. Environ Model Softw (2011)
  • C. Zhang et al. Sobol sensitivity analysis for a distributed hydrological model of Yichun river basin, China. J Hydrol (AMST) (2013)
  • Z. Hu et al. Uncertainty quantification in prediction of material properties during additive manufacturing. Scr Mater (2017)
  • E. Plischke et al. Global sensitivity measures from given data. Eur J Oper Res (2013)
  • B. Iooss et al. A review on global sensitivity analysis methods. Uncertainty management in simulation-optimization of complex systems (2015)
  • A. Saltelli et al. Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Comput Phys Commun (2010)
  • C. Prieur et al. Variance-based sensitivity analysis: theory and estimation algorithms. Handbook of uncertainty quantification (2016)
  • M.J. Jansen. Analysis of variance designs for model output. Comput Phys Commun (1999)
  • A. Janon et al. Asymptotic normality and efficiency of two Sobol index estimators. ESAIM Probab Stat (2014)
  • A. Saltelli et al. Sensitivity analysis for nonlinear mathematical models: numerical experience. Matematicheskoe Modelirovanie (1995)
  • I.M. Sobol. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math Comput Simul (2001)
  • G. Glen et al. Estimating Sobol sensitivity indices using correlations. Environ Model Softw (2012)
  • J.-Y. Tissot, C. Prieur. Variance-based sensitivity analysis using harmonic analysis. Working Paper (2012a) ...
  • X. Wang et al. The effective dimension and quasi-Monte Carlo integration. J Complex (2003)
  • J.-Y. Tissot et al. Bias correction for the estimation of sensitivity indices based on random balance designs. Reliab Eng Syst Saf (2012)
  • J.-Y. Tissot et al. A randomized orthogonal array-based procedure for the estimation of first- and second-order Sobol’ indices. J Stat Comput Simul (2015)
  • S. Tarantola et al. Random balance designs for the estimation of first order global sensitivity indices. Reliab Eng Syst Saf (2006)
  • L.L. Gratiet et al. Metamodel-based sensitivity analysis: polynomial chaos expansions and Gaussian processes. Handbook of uncertainty quantification (2016)
  • A. Marrel et al. Global sensitivity analysis of stochastic computer models with joint metamodels. Stat Comput (2012)
  • D. Xiu et al. The Wiener–Askey polynomial chaos for stochastic differential equations. SIAM J Sci Comput (2002)
  • A. Janon et al. Uncertainties assessment in global sensitivity indices estimation from metamodels. Int J Uncertain Quantif (2014)
  • Z. Hu et al. Mixed efficient global optimization for time-dependent reliability analysis. J Mech Des (2015)