Estimating Sobol sensitivity indices using correlations
Highlights
► We present Sobol's method in terms of correlation coefficients.
► We derive new error correction terms for the method that remove spurious correlation.
► Multiple estimation methods are compared for precision, accuracy, and efficiency.
Introduction
Sobol's method is a global sensitivity analysis (SA) technique that determines the contribution of each input (or group of inputs) to the variance of the output. The usual Sobol sensitivity indices include the main and total effects for each input, but the method can also provide specific interaction terms, if desired. Sobol's method was originally presented in Russian by Sobol' (1990); the article was reprinted in English in Sobol' (1993). The method is notable because it works well without simplifying approximations, even for models with very large numbers of random variables. The method is superior to traditional sensitivity methods (such as local methods that examine parameters one at a time) when the assumption of linearity is invalid (Saltelli and Annoni, 2010), and has been shown to be robust (Yang, 2011). Sobol's method has been applied successfully to complex environmental models to identify critical input parameters and major sources of uncertainty (e.g., Nossent et al., 2011; Vezzaro and Mikkelsen, 2012; Confalonieri et al., 2010; Estrada and Diaz, 2010). In addition, the method has been used as a basis for multiple criteria analyses (Annoni et al., 2011).
In this paper, new formulations of Sobol indices in terms of Pearson correlation coefficients are presented. These formulations suggest the inclusion of “correction terms” that remove some spurious correlation. Such correction terms are presented, and multiple estimation methods are compared for precision, accuracy, and efficiency, using the G function as a test case. The G function has the advantage that the theoretical values for all the sensitivity indices are known, so the accuracy of various estimation techniques can be evaluated.
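To illustrate why the G function is a convenient test case, the sketch below computes its theoretical main and total effects in closed form. It assumes the form of the G function commonly used in the sensitivity analysis literature, with inputs uniform on [0, 1]; the coefficient list `A` is a hypothetical choice for six inputs and is not taken from the paper.

```python
from math import prod

def g_function(x, a):
    """G function: product over inputs of (|4*x_i - 2| + a_i) / (1 + a_i)."""
    return prod((abs(4.0 * xi - 2.0) + ai) / (1.0 + ai) for xi, ai in zip(x, a))

def g_function_indices(a):
    """Return (main, total) theoretical indices for inputs uniform on [0, 1]."""
    # Partial variance contributed by each input acting alone.
    v = [(1.0 / 3.0) / (1.0 + ai) ** 2 for ai in a]
    # Total output variance: prod(1 + V_i) - 1.
    total_var = prod(1.0 + vi for vi in v) - 1.0
    main = [vi / total_var for vi in v]
    # Total effect of input i: V_i * prod over j != i of (1 + V_j), divided by V.
    total = [
        vi * prod(1.0 + vj for j, vj in enumerate(v) if j != i) / total_var
        for i, vi in enumerate(v)
    ]
    return main, total

# Hypothetical coefficients (an assumption, not the paper's values).
A = [0.0, 0.5, 3.0, 9.0, 99.0, 99.0]
main, total = g_function_indices(A)
for i, (s, t) in enumerate(zip(main, total), start=1):
    print(f"input {i}: main = {s:.4f}, total = {t:.4f}")
```

Smaller coefficients make an input more influential, so a single coefficient list can mix dominant and nearly inert inputs in one test.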
Variance decomposition, main effects, total effects, and interaction terms
Many SA methods are based on an analysis of the variance of the model output (Chan et al., 2000); the theoretical basis of several of these methods is variance decomposition. Such techniques include Fourier Amplitude Sensitivity Test (FAST; Cukier et al., 1973, Cukier et al., 1978; Saltelli et al., 1999), High Dimensional Model Representation (HDMR; Rabitz and Alis, 1999), random balance designs (Tarantola et al., 2006a), and traditional ANOVA methods. In variance decomposition, the model
Methods
Computing the Sobol indices numerically requires evaluating the Pearson correlation coefficients between the output vectors from pairs of model runs. There are multiple variations on both Equation (3), which uses the “raw” model output, and Equation (5), which uses standardized output. We present equations for a total of 12 previously published and new methods in this section. These equations are evaluated using the “G function” (Davis and Rabinowitz, 1984).
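The basic idea of estimating an index from the correlation between paired runs can be sketched as follows. This is a minimal illustration, not a reproduction of Equation (3), Equation (5), or any of the 12 methods, and it includes no correction terms. It assumes that two runs sharing only input i have an output correlation near the main effect, and that two runs differing only in input i have an output correlation near one minus the total effect. The function names, the coefficient list `A`, and the sample size are hypothetical.

```python
import random
from math import prod, sqrt

def g_function(x, a):
    """G function: product over inputs of (|4*x_i - 2| + a_i) / (1 + a_i)."""
    return prod((abs(4.0 * xi - 2.0) + ai) / (1.0 + ai) for xi, ai in zip(x, a))

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    var_u = sum((ui - mu) ** 2 for ui in u)
    var_v = sum((vi - mv) ** 2 for vi in v)
    return cov / sqrt(var_u * var_v)

def estimate_indices(model, n_inputs, i, n_samples, rng):
    """Estimate (main, total) effects of input i from three model runs."""
    base, share_i, change_i = [], [], []
    for _ in range(n_samples):
        x = [rng.random() for _ in range(n_inputs)]
        # Second run: keeps input i from the base run, resamples all others.
        x_share = [rng.random() for _ in range(n_inputs)]
        x_share[i] = x[i]
        # Third run: keeps all inputs from the base run except input i.
        x_change = list(x)
        x_change[i] = rng.random()
        base.append(model(x))
        share_i.append(model(x_share))
        change_i.append(model(x_change))
    main = pearson(base, share_i)            # correlation approximates S_i
    total = 1.0 - pearson(base, change_i)    # correlation approximates 1 - T_i
    return main, total

# Hypothetical coefficients and sample size (assumptions, not the paper's).
A = [0.0, 0.5, 3.0, 9.0, 99.0, 99.0]
rng = random.Random(12345)
s1, t1 = estimate_indices(lambda x: g_function(x, A), len(A), 0, 20000, rng)
print(f"input 1: main ~ {s1:.3f}, total ~ {t1:.3f}")
```

This sketch spends three runs per input for clarity; designs that reuse runs across inputs need fewer model evaluations overall.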
Results
Summary statistics from all the test runs are presented in Table 2. The prefix ‘avg’ indicates that the reported statistics are averaged across the 12 first-order indices (6 main and 6 total effects). Emad is the mean absolute deviation of the 50,000 estimates from the theoretical value, Estd is the standard deviation of the estimates, Serr is the standard error in the mean, and Aerr is the absolute deviation of the overall mean from the theoretical value.
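The four statistics defined above can be computed from a list of estimates and the known theoretical value, as in this minimal sketch. The function name `summary_stats` is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which form was used.

```python
from math import sqrt
from statistics import mean, stdev

def summary_stats(estimates, theoretical):
    """Return Emad, Estd, Serr, and Aerr for one sensitivity index."""
    n = len(estimates)
    avg = mean(estimates)
    # Emad: mean absolute deviation of the estimates from the theoretical value.
    emad = mean([abs(e - theoretical) for e in estimates])
    # Estd: standard deviation of the estimates (sample form; an assumption).
    estd = stdev(estimates)
    # Serr: standard error of the mean of the estimates.
    serr = estd / sqrt(n)
    # Aerr: absolute deviation of the overall mean from the theoretical value.
    aerr = abs(avg - theoretical)
    return {"Emad": emad, "Estd": estd, "Serr": serr, "Aerr": aerr}

# Toy input for illustration only; these are not results from the paper.
print(summary_stats([0.58, 0.60, 0.57, 0.61], theoretical=0.59))
```

Estd and Serr describe precision (scatter of the estimates), while Aerr describes accuracy (bias of their mean); Emad combines both.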
The standardized methods B1–B3 score
Bootstrapping
For Table 3, the full analysis of 2J + 2 model runs was performed M = 50,000 times with different random samples, with each of these sets producing N = 200,000 samples of model output. For all practical purposes, the mean of M sets (each producing N estimates) provides the same precision as one set producing (M*N) estimates. However, the former approach has two advantages. First, one can calculate not only the mean value for each index, but also the standard error of the mean. Second, breaking the
Conclusion
Sobol's method of sensitivity analysis is well suited to high-dimensional stochastic computer models, and has been successfully implemented in SHEDS. The methods presented herein provide a simple correlation-based numerical approach to calculating the estimates and reducing errors associated with spurious correlation. The use of 2J + 2 model runs to obtain double estimates provides good estimates of all the first and second-order main and total effect indices. The method is easy to implement,
Disclaimer
The U.S. Environmental Protection Agency through its Office of Research and Development partially funded the research described here under contract number EP-D-05-065 to Alion Science and Technology, Inc. It has been subjected to Agency review and approved for publication.
References
- et al. Partial order investigation of multiple indicator systems using variance-based sensitivity analysis. Environ. Modell. Softw. (2011)
- et al. Sensitivity analysis of the rice model WARM in Europe: exploring the effects of different locations, climates and methods of analysis on model sensitivity to crop parameters. Environ. Modell. Softw. (2010)
- et al. Nonlinear sensitivity analysis of multiparameter model systems. J. Comput. Phys. (1978)
- et al. Global sensitivity analysis in the development of first principle-based eutrophication models. Environ. Modell. Softw. (2010)
- et al. Importance measures in global sensitivity analysis of model output. Reliab. Eng. Sys. Saf. (1996)
- Analysis of variance designs for model output. Comput. Phys. Comm. (1999)
- et al. Sobol’ sensitivity analysis of a complex environmental model. Environ. Modell. Softw. (2011)
- et al. Some new techniques in sensitivity analysis of model output. Comput. Stat. Data Anal. (1993)
- et al. About the use of rank transformation in sensitivity analysis of model output. Reliab. Eng. Sys. Saf. (1995)
- Making best use of model evaluations to compute sensitivity indices. Comput. Phys. Comm. (2002)
- Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Comput. Phys. Comm.
- How to avoid a perfunctory sensitivity analysis. Environ. Modell. Softw.
- Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math. Comput. Simul.
- On the use of variance reducing multipliers in Monte Carlo computations of a global sensitivity index. Comput. Phys. Comm.
- Estimating the approximation error when fixing unessential factors in global sensitivity analysis. Reliab. Eng. Sys. Saf.
- Random balance designs for the estimation of first order global sensitivity indices. Reliab. Eng. Sys. Saf.