Importance analysis for models with correlated variables and its sparse grid solution

https://doi.org/10.1016/j.ress.2013.06.036

Highlights

  • The contribution of correlated variables to the variance of the output is analyzed.

  • A novel interpretation for variance-based indices of correlated variables is proposed.

  • Two solutions for variance-based importance measures of correlated variables are built.

Abstract

For structural models involving correlated input variables, a novel interpretation of variance-based importance measures is proposed, based on the contribution of the correlated input variables to the variance of the model output. After this interpretation is compared with existing ones, two solutions for the variance-based importance measures of correlated input variables are built on sparse grid numerical integration (SGI): the double-loop nested sparse grid integration (DSGI) method and the single-loop sparse grid integration (SSGI) method. The DSGI method computes the importance measures by reducing the dimensionality of the input variables step by step, whereas the SSGI method performs importance analysis by extending the dimensionality of the inputs. Both methods make full use of the advantages of SGI and are suited to different situations. The results of several numerical and engineering examples show that the proposed interpretation of the importance measures of correlated input variables is reasonable, and that the proposed methods for computing the importance measures are efficient and accurate.

Introduction

Sensitivity analysis (SA), especially global SA, is widely used in engineering design and probabilistic safety assessment. Global SA, also known as importance analysis, aims to determine which of the input parameters influences the output most over the whole uncertainty range of the inputs. Indicators for global SA quantify how the uncertainty in the output can be apportioned to the different sources of uncertainty in the model inputs [1]. Many importance analysis techniques and indices are currently available, such as nonparametric techniques [2], [3], [4], variance-based importance measures [5], [6], [7], [8], and moment-independent importance measures [9], [10], [11]. Among these, the variance-based importance measure is known as a versatile and effective tool in uncertainty analysis.

Most existing importance analysis techniques assume, for the sake of computational convenience, that the input variables are independent. In many cases, however, the input variables are correlated, and these correlations may change the importance ranking of the inputs dramatically. Consequently, an increasing number of importance analysis techniques that take correlation into account have been proposed over the past ten years, such as those in Refs. [12], [13], [14], [15]. Nevertheless, these early studies provide only an overall importance measure for each input variable and do not distinguish its correlated contribution from its uncorrelated one. To identify the origin of the output uncertainty clearly when correlated input variables are involved, Ref. [16] divides the contribution of an individual input variable to the output uncertainty into two parts: the uncorrelated contribution and the correlated contribution. This distinction gives engineers a better understanding of the composition of the output uncertainty and helps them decide whether to focus on the uncorrelated or the correlated part. However, the regression-based method proposed in Ref. [16] for decomposing the contributions of the input variables assumes that the relationship between the output response and the input variables is approximately linear. Based on a covariance decomposition of the unconditional variance of the output, Ref. [17] proposed a similar treatment for correlated input variables, in which the total contribution of an input variable, or of a subset of input variables, to the output variance is decomposed into a structural contribution and a correlative contribution.
Although this treatment can deal with both linear and nonlinear response functions, it depends on deciding how many components are included in the decomposition. Ref. [18] proposes a set of variance-based sensitivity indices for importance analysis of models with correlated inputs. These indices are defined through a specific orthogonalisation of the inputs and ANOVA representations of the model output. They not only support nonlinear models and nonlinear dependences, but also reflect the effect of interactions among variables when decomposing the total variance of the model output. However, the method depends on the order of the inputs in the original set. To obtain the uncorrelated and correlated variance contributions of each input variable, the correlated inputs must be transformed into independent, orthogonal variables for each circular permutation of their ordering, and the variance-based importance measures of the transformed variables must then be computed. This makes the method computationally expensive. Ref. [19] presents a generalization of the variance-based importance measures to the case of correlated variables. The generalized measures preserve the advantages of the original variance-based importance measures without requiring a functional decomposition or an orthogonalization of the input factor space. However, this generalization does not consider the difference between the contribution to the output uncertainty made by independent variables and that made by correlated ones; it directly carries the meanings of the importance measures for independent variables over to correlated ones. Therefore, its interpretation of the importance measures for correlated variables is not reasonable. Furthermore, the measures in Ref. [19] are still computed by numerical simulation, which usually imposes a relatively heavy computational burden.

In this work, the variance-based importance measures of independent variables are reinterpreted in the context of correlated variables, and new meanings of these measures are proposed. The new interpretation is based directly on the contribution of the correlated inputs to the output uncertainty, and it decomposes the contribution of the correlated input variables to the output variance into the main effect of correlated and uncorrelated variations and the total effect of uncorrelated variation. This decomposition not only reflects the origin of the output uncertainty, but also interprets the contribution of the correlated input variables clearly and reasonably. In addition, exploiting the high efficiency of sparse grid integration (SGI), two SGI-based methods are proposed for importance analysis of correlated inputs in different situations. The proposed methods avoid the sampling procedure, which usually imposes a heavy computational burden, and can serve as effective tools for uncertainty analysis involving correlated inputs.

The rest of the paper is organized as follows. In Section 2, the contribution of the correlated input variables to the output variance is first analyzed, and the new meanings of the variance-based importance measures are proposed on that basis; this new interpretation is then discussed and compared with existing ones. In Section 3, the basic theory of SGI for independent variables is reviewed. The two proposed methods that incorporate SGI into importance analysis of correlated input variables are detailed in Section 4. In Section 5, several numerical and engineering examples illustrate the new interpretation, and their results demonstrate the efficiency of the proposed SGI-based solutions. Finally, conclusions are drawn at the end of the paper.

Section snippets

Importance measures for independent input variables

Consider a mathematical or computational model of the form $y=g(\mathbf{x})$, where $\mathbf{x}=(x_1,x_2,\ldots,x_d)^T$ is the $d$-dimensional input vector. The variance-based importance measures of these input variables are related to a decomposition of the function $g(\mathbf{x})$ itself into terms of increasing dimensionality (High Dimensional Model Representation, HDMR) [5], i.e.

$$g(\mathbf{x}) = g_0 + \sum_{i} g_i(x_i) + \sum_{i}\sum_{j>i} g_{ij}(x_i,x_j) + \cdots + g_{12\ldots d}(x_1,\ldots,x_d) \qquad (1)$$

where the various terms in Eq. (1) are defined as $g_0=E(y)$, $g_i(x_i)=E(y|x_i)-g_0$, and $g_{ij}(x_i,x_j)$ …
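For independent inputs, the HDMR terms lead to the first-order index $S_i = V(g_i(x_i))/V(y) = V(E(y|x_i))/V(y)$. The sketch below makes this concrete, assuming independent standard normal inputs and a hypothetical toy function; the helper names are ours, and it uses a full tensor-product 5-point Gauss-Hermite rule rather than a sparse grid, so its cost grows exponentially with $d$.

```python
import itertools
import math

# 5-point Gauss-Hermite rule for a standard normal input (probabilists' convention,
# weights normalised to sum to 1); exact for polynomials of degree <= 9.
_R = math.sqrt(10.0)
NODES = [-math.sqrt(5 + _R), -math.sqrt(5 - _R), 0.0, math.sqrt(5 - _R), math.sqrt(5 + _R)]
WEIGHTS = [(7 - 2 * _R) / 60, (7 + 2 * _R) / 60, 8 / 15, (7 + 2 * _R) / 60, (7 - 2 * _R) / 60]


def expectation(g, d, fixed=None):
    """E[g(x)] over d independent N(0,1) inputs; `fixed` maps index -> frozen value."""
    fixed = fixed or {}
    free = [k for k in range(d) if k not in fixed]
    total = 0.0
    for combo in itertools.product(range(len(NODES)), repeat=len(free)):
        x = [0.0] * d
        w = 1.0
        for k, v in fixed.items():
            x[k] = v
        for k, j in zip(free, combo):
            x[k] = NODES[j]
            w *= WEIGHTS[j]
        total += w * g(x)
    return total


def first_order_indices(g, d):
    """Return g_0, V(y) and S_i = V(E(y|x_i)) / V(y) for independent N(0,1) inputs."""
    g0 = expectation(g, d)                              # g_0 = E(y)
    var_y = expectation(lambda x: (g(x) - g0) ** 2, d)  # V(y)
    indices = []
    for i in range(d):
        # V(g_i) = E[(E(y|x_i) - g_0)^2]; the outer integral runs over x_i only
        v_i = sum(w * (expectation(g, d, {i: xi}) - g0) ** 2
                  for xi, w in zip(NODES, WEIGHTS))
        indices.append(v_i / var_y)
    return g0, var_y, indices
```

For the toy function $y = x_1 + 2x_2 + x_1x_2$, this gives $V(y)=6$, $S_1=1/6$ and $S_2=2/3$; the remaining $1/6$ of the variance belongs to the interaction term $g_{12}$.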

Review on sparse grid integration (SGI)

The sparse grid method based on the Smolyak algorithm has become increasingly popular since its introduction. In this method, multivariate quadrature formulas are constructed from combinations of tensor products of suitable one-dimensional formulas. In this way, the number of function evaluations and the numerical accuracy become independent of the dimension of the problem up to logarithmic factors [21]. For this reason, it has been widely used in numerical integration [21], [22], interpolation …
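The Smolyak construction can be illustrated with a minimal sketch of the combination formula $A(q,d)=\sum_{q-d+1\le|\mathbf{i}|\le q}(-1)^{q-|\mathbf{i}|}\binom{d-1}{q-|\mathbf{i}|}\,(U^{i_1}\otimes\cdots\otimes U^{i_d})$. It assumes standard normal inputs and uses non-nested Gauss-Hermite rules of 1, 3 and 5 points as the one-dimensional formulas $U^1, U^2, U^3$; the rule choice and function names are ours, not necessarily those of the paper. Nested rules (e.g. Genz-Keister) would reuse more points, but the non-nested ones keep the sketch short.

```python
import itertools
import math

# Assumed 1-D rules for N(0,1): level l uses a (2l-1)-point Gauss-Hermite rule
# (probabilists' convention, weights sum to 1). Only levels 1-3 are tabulated.
_R = math.sqrt(10.0)
RULES = {
    1: ([0.0], [1.0]),
    2: ([-math.sqrt(3.0), 0.0, math.sqrt(3.0)], [1 / 6, 2 / 3, 1 / 6]),
    3: ([-math.sqrt(5 + _R), -math.sqrt(5 - _R), 0.0, math.sqrt(5 - _R), math.sqrt(5 + _R)],
        [(7 - 2 * _R) / 60, (7 + 2 * _R) / 60, 8 / 15, (7 + 2 * _R) / 60, (7 - 2 * _R) / 60]),
}


def smolyak_grid(d, level):
    """Points and weights of the Smolyak rule A(q, d) with q = d + level - 1."""
    if not 1 <= level <= max(RULES):
        raise ValueError("only levels 1-3 are tabulated in this sketch")
    q = d + level - 1
    grid = {}  # accumulate weights of points shared by different tensor products
    for idx in itertools.product(range(1, level + 1), repeat=d):
        s = sum(idx)
        if s < q - d + 1 or s > q:
            continue
        coeff = (-1) ** (q - s) * math.comb(d - 1, q - s)
        rules = [RULES[l] for l in idx]
        for combo in itertools.product(*(range(len(r[0])) for r in rules)):
            point = tuple(rules[k][0][j] for k, j in enumerate(combo))
            weight = coeff * math.prod(rules[k][1][j] for k, j in enumerate(combo))
            grid[point] = grid.get(point, 0.0) + weight
    return list(grid.keys()), list(grid.values())


def sparse_expectation(f, d, level):
    """Approximate E[f(x)] for x ~ N(0, I_d) on the sparse grid."""
    points, weights = smolyak_grid(d, level)
    return sum(w * f(p) for p, w in zip(points, weights))
```

With $d=3$ and level 3, the rule is exact for $x_1^2x_2^2 + x_3^4$ (expectation 4), because each monomial lies in one of the tensor-product polynomial spaces that the Smolyak combination integrates exactly.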

Double-loop nested SGI (DSGI)

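The expressions of $S_i^{Tc}$ and $S_i^{Tu}$, i.e.

$$S_i^{Tc}=\frac{V(E(y|x_i))}{V(y)} \quad\text{and}\quad S_i^{Tu}=\frac{V(y)-V(E(y|\mathbf{x}_{\sim i}))}{V(y)},$$

where $\mathbf{x}_{\sim i}$ denotes all inputs except $x_i$, show that the key point in calculating the importance measures of the correlated input variables is to estimate the variance contributions $V(E(y|x_i))$ and $V(E(y|\mathbf{x}_{\sim i}))$. Both are double integrals, in which an expectation operator is nested inside a variance operator. Taking $V(E(y|x_i))$ as an example, it can be seen as the nested integration of

$$E(y|x_i)=\int_{\mathbb{R}^{d-1}} g_{\mathbf{X}_{\sim i}|X_i}(\mathbf{x}_{\sim i})\, f_{\mathbf{X}_{\sim i}|X_i}(\mathbf{x}_{\sim i})\, d\mathbf{x}_{\sim i}$$

and $V(E(y|x_i))=$ …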
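The nested structure of $V(E(y|x_i))$ can be sketched for the special case of jointly Gaussian inputs $\mathbf{x}\sim N(\boldsymbol{\mu},\boldsymbol{\Sigma})$, where the conditional law of $\mathbf{x}_{\sim i}$ given $x_i$ is again Gaussian, with mean $\boldsymbol{\mu}_{\sim i}+\boldsymbol{\Sigma}_{\sim i,i}(x_i-\mu_i)/\Sigma_{ii}$ and covariance $\boldsymbol{\Sigma}_{\sim i,\sim i}-\boldsymbol{\Sigma}_{\sim i,i}\boldsymbol{\Sigma}_{i,\sim i}/\Sigma_{ii}$. The inner loop integrates $y$ over that conditional distribution; the outer loop takes the variance over $x_i$. The Gaussian assumption, the tensor Gauss-Hermite rules standing in for sparse grids, and all names are ours; this is an illustration of the double-loop idea, not the paper's DSGI implementation.

```python
import itertools
import math

# 5-point Gauss-Hermite rule for N(0,1) (probabilists' convention, weights sum to 1),
# used here as a stand-in for the sparse grids of the DSGI method.
_R = math.sqrt(10.0)
NODES = [-math.sqrt(5 + _R), -math.sqrt(5 - _R), 0.0, math.sqrt(5 - _R), math.sqrt(5 + _R)]
WEIGHTS = [(7 - 2 * _R) / 60, (7 + 2 * _R) / 60, 8 / 15, (7 + 2 * _R) / 60, (7 - 2 * _R) / 60]


def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def gaussian_expectation(f, mean, cov):
    """E[f(z)] for z ~ N(mean, cov), mapping a tensor Gauss-Hermite grid through z = mean + L u."""
    n = len(mean)
    if n == 0:
        return f([])
    low = cholesky(cov)
    total = 0.0
    for combo in itertools.product(range(len(NODES)), repeat=n):
        u = [NODES[j] for j in combo]
        w = math.prod(WEIGHTS[j] for j in combo)
        z = [mean[r] + sum(low[r][k] * u[k] for k in range(r + 1)) for r in range(n)]
        total += w * f(z)
    return total


def main_effect_correlated(g, mean, cov, i):
    """S_i^{Tc} = V(E(y|x_i)) / V(y) for y = g(x), x ~ N(mean, cov), by nested quadrature."""
    d = len(mean)
    rest = [k for k in range(d) if k != i]
    # Conditional law of x_{~i} given x_i: regression coefficients and reduced covariance
    beta = [cov[k][i] / cov[i][i] for k in rest]
    cond_cov = [[cov[a][b] - cov[a][i] * cov[i][b] / cov[i][i] for b in rest] for a in rest]

    def cond_mean_of_y(xi):
        """Inner loop: E(y | x_i = xi)."""
        cond_mean = [mean[k] + bk * (xi - mean[i]) for k, bk in zip(rest, beta)]

        def y_given(z):
            x = [0.0] * d
            x[i] = xi
            for k, zk in zip(rest, z):
                x[k] = zk
            return g(x)

        return gaussian_expectation(y_given, cond_mean, cond_cov)

    sd = math.sqrt(cov[i][i])
    # Outer loop over x_i ~ N(mean_i, cov_ii): first and second moments of E(y|x_i)
    inner = [cond_mean_of_y(mean[i] + sd * t) for t in NODES]
    m1 = sum(w * v for w, v in zip(WEIGHTS, inner))
    m2 = sum(w * v * v for w, v in zip(WEIGHTS, inner))
    ey = gaussian_expectation(g, mean, cov)
    var_y = gaussian_expectation(lambda x: (g(x) - ey) ** 2, mean, cov)
    return (m2 - m1 ** 2) / var_y
```

For example, with $y=x_1+x_2+x_3$ and covariance $[[1,0,0],[0,1,1],[0,1,4]]$ (Example 1 read with $\rho=0.5$), this returns $S^{Tc}_i = 0.125,\ 0.5,\ 0.78125$; the sum exceeds 1 because correlated contributions are counted in more than one index.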

Examples

Example 1

Consider the linear model $Y=g(\mathbf{x})=x_1+x_2+x_3$ from Ref. [19], where all the input variables are normally distributed with zero mean and covariance matrix

$$\mathbf{C}_x=\begin{pmatrix}1&0&0\\0&1&2\rho\\0&2\rho&2^2\end{pmatrix}.$$

The importance measures computed by the two proposed SGI methods are listed in Table 2, together with the quasi-Monte Carlo (QMC) results and the analytical (ANA) results from Ref. [19] for comparison.
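Because Example 1 is linear with Gaussian inputs, both indices have closed forms that can serve as a cross-check: $V(E(y|x_i))=\mathrm{Cov}(y,x_i)^2/V(x_i)$, and, since every coefficient equals one and Gaussian conditional variances do not depend on the conditioning values, $E(V(y|\mathbf{x}_{\sim i}))=V(x_i|\mathbf{x}_{\sim i})$. The helper below is our own derivation, assuming the covariance matrix reads as above with the standard deviation of $x_3$ kept as a parameter `sigma` (default 2); it is not a reproduction of Table 2.

```python
def example1_indices(rho, sigma=2.0):
    """Closed-form S_i^{Tc} and S_i^{Tu} for y = x1 + x2 + x3, zero-mean Gaussian inputs with
    Var(x1) = Var(x2) = 1, Var(x3) = sigma**2, Cov(x2, x3) = rho * sigma (assumed reading)."""
    var_y = 2 + sigma ** 2 + 2 * rho * sigma  # V(y) = 1 + 1 + sigma^2 + 2 Cov(x2, x3)
    # Main effects: V(E(y|x_i)) = Cov(y, x_i)^2 / V(x_i)
    main = [
        1.0,                                           # x1 is independent of x2 and x3
        (1 + rho * sigma) ** 2,                        # Cov(y, x2) = 1 + rho*sigma
        (sigma ** 2 + rho * sigma) ** 2 / sigma ** 2,  # Cov(y, x3) = sigma^2 + rho*sigma
    ]
    # Uncorrelated total effects: E(V(y|x_~i)) = V(x_i | x_~i) for this unit-coefficient sum
    total_u = [1.0, 1 - rho ** 2, sigma ** 2 * (1 - rho ** 2)]
    return [m / var_y for m in main], [t / var_y for t in total_u]
```

With $\rho=0$ both lists reduce to the independent-input values $1/6, 1/6, 2/3$; with $\rho=0.5$ the main effects become $0.125, 0.5, 0.78125$ while the uncorrelated total effects shrink to $0.125, 0.09375, 0.375$, which shows how correlation shifts variance from the uncorrelated to the correlated part.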

Example 2

Consider the nonlinear model $Y=g(\mathbf{x})=x_1x_3+x_2x_4$ in Ref. [19], where $(x_1,x_2,x$ …

Conclusions

A novel interpretation of the variance-based importance measures in the case of correlated input variables is presented. The interpretation is based directly on the contribution of the correlated input variables to the output variance, and divides this contribution into the main effect of correlated and uncorrelated variations and the total effect of uncorrelated variation. Therefore, compared with the generalization-based measures, it not only reflects the …

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. NSFC 51175425), the Doctorate Foundation of Northwestern Polytechnical University (Grant No. CX201205), the Ministry of Education Fund for Doctoral Students Newcomer Awards of China and the Excellent Doctorate Foundation of Northwestern Polytechnical University (Grant No. DJ201301). Additionally, the authors would like to thank the anonymous reviewers for their valuable comments.

References

  • R. Lebrun et al. An innovating analysis of the Nataf transformation from the copula viewpoint. Probabilistic Engineering Mechanics (2009)
  • A. Saltelli. Making best use of model evaluations to compute sensitivity indices. Computer Physics Communications (2002)
  • A. Saltelli. Sensitivity analysis for importance assessment. Risk Analysis (2002)
  • J.C. Helton et al. Sampling-based methods
  • I.M. Sobol'. Sensitivity analysis for non-linear mathematical models. Mathematical Modeling and Computational Experiment (1993)
  • I.M. Sobol'. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Mathematics and Computers in Simulation (2001)