Estimating Sobol sensitivity indices using correlations

doi:10.1016/j.envsoft.2012.03.014

Environmental Modelling & Software

Volume 37, November 2012, Pages 157-166

https://doi.org/10.1016/j.envsoft.2012.03.014

Abstract

Sensitivity analysis is a crucial tool in the development and evaluation of complex mathematical models. Sobol's method is a variance-based global sensitivity analysis technique that has been applied to computational models to assess the relative influence of input parameters on the output. This paper introduces new notation that describes the Sobol indices in terms of the Pearson correlation of outputs from pairs of runs, and introduces correction terms to remove some of the spurious correlation. A variety of estimation techniques are compared for accuracy and precision using the G function as a test case.

Highlights

► We present Sobol's method in terms of correlation coefficients.
► We derive new error correction terms for the method that remove spurious correlation.
► Multiple estimation methods are compared for precision, accuracy, and efficiency.

Introduction

Sobol's method is a global sensitivity analysis (SA) technique that determines the contribution of each input (or group of inputs) to the variance of the output. The usual Sobol sensitivity indices include the main and total effects for each input, but the method can also provide specific interaction terms, if desired. Sobol's method was originally presented in Russian by Sobol' (1990); the article was reprinted in English in Sobol' (1993). The method is notable because it works well without simplifying approximations, even for models with very large numbers of random variables. The method is superior to traditional sensitivity methods (such as local methods that examine parameters one at a time) when the assumption of linearity is invalid (Saltelli and Annoni, 2010), and has been shown to be robust (Yang, 2011). Sobol's method has been applied successfully to complex environmental models to identify critical input parameters and major sources of uncertainty (e.g., Nossent et al., 2011; Vezzaro and Mikkelsen, 2012; Confalonieri et al., 2010; Estrada and Diaz, 2010). In addition, the method has been used as a basis for multiple criteria analyses (Annoni et al., 2011).

In this paper, new formulations of Sobol indices in terms of Pearson correlation coefficients are presented. These formulations suggest the inclusion of “correction terms” that remove some spurious correlation. Such correction terms are presented, and multiple estimation methods are compared for precision, accuracy, and efficiency, using the G function as a test case. The G function has the advantage that the theoretical values for all the sensitivity indices are known, so the accuracy of various estimation techniques can be evaluated.
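Because the G function's indices are known in closed form, a short sketch helps show where the "theoretical values" come from. The Python below assumes the common form of the G function, G(x) = ∏ (|4x_j − 2| + a_j)/(1 + a_j) with independent x_j ~ U(0, 1). The coefficient vector `a` is hypothetical, since the values used in the paper are not given in this excerpt. Each factor has mean 1 and variance V_j = 1/(3(1 + a_j)²), and the first-order and total indices follow directly from these partial variances.

```python
import numpy as np

def g_function(x, a):
    """G function: prod_j (|4 x_j - 2| + a_j) / (1 + a_j), with x in [0, 1]^J."""
    return np.prod((np.abs(4.0 * x - 2.0) + a) / (1.0 + a), axis=-1)

def g_function_indices(a):
    """Analytic first-order (S) and total (ST) Sobol indices of the G function."""
    a = np.asarray(a, dtype=float)
    v = 1.0 / (3.0 * (1.0 + a) ** 2)       # partial variance V_j of each factor
    total_var = np.prod(1.0 + v) - 1.0     # V(Y) for a product of mean-1 factors
    s = v / total_var                      # S_j = V_j / V(Y)
    # ST_j = V_j * prod_{k != j} (1 + V_k) / V(Y)
    st = v * np.prod(1.0 + v) / (1.0 + v) / total_var
    return s, st

# Hypothetical coefficients: small a_j means an important input.
a = np.array([0.0, 0.5, 3.0, 9.0, 99.0, 99.0])
s, st = g_function_indices(a)
print(np.round(s, 4))
print(np.round(st, 4))
```

Since the sum of the S_j is below 1 and each ST_j exceeds S_j, the gap between the two vectors shows how much of the variance comes from interactions.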

Section snippets

Variance decomposition, main effects, total effects, and interaction terms

Many SA methods are based on an analysis of the variance of the model output (Chan et al., 2000); the theoretical basis of several of these methods is variance decomposition. Such techniques include Fourier Amplitude Sensitivity Test (FAST; Cukier et al., 1973, Cukier et al., 1978; Saltelli et al., 1999), High Dimensional Model Representation (HDMR; Rabitz and Alis, 1999), random balance designs (Tarantola et al., 2006a), and traditional ANOVA methods. In variance decomposition, the model
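For reference, the standard variance decomposition and index definitions (Sobol', 1993; Homma et al., 1996) are sketched below, assuming independent inputs and a model output with finite variance. The last line is the well-known identity that links the indices to the Pearson correlation ρ between two runs that share a subset u of the inputs and draw the remaining inputs independently: taking u = {i} gives S_i, and taking u = ∼i (all inputs except X_i) gives 1 − S_Ti. This is written in generic notation and is not a reproduction of the paper's numbered equations.

```latex
% Independent inputs X = (X_1, ..., X_J), output Y = f(X)
V(Y) = \sum_{i} V_i + \sum_{i<j} V_{ij} + \dots + V_{12\ldots J},
\qquad V_i = V\!\left[E(Y \mid X_i)\right]

S_i = \frac{V\!\left[E(Y \mid X_i)\right]}{V(Y)}, \qquad
S_{Ti} = \frac{E\!\left[V(Y \mid X_{\sim i})\right]}{V(Y)}
       = 1 - \frac{V\!\left[E(Y \mid X_{\sim i})\right]}{V(Y)}

% Y = f(X_u, X_{\sim u}) and Y' = f(X_u, X'_{\sim u}), with X'_{\sim u} an independent copy
\operatorname{Cov}(Y, Y') = V\!\left[E(Y \mid X_u)\right]
\quad\Longrightarrow\quad
\rho(Y, Y') = \frac{V\!\left[E(Y \mid X_u)\right]}{V(Y)}
```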

Methods

Computing the Sobol indices numerically requires evaluating the Pearson correlation coefficients between the output vectors from pairs of model runs. There are multiple variations on both Equation (3), which uses the “raw” model output, and Equation (5), which uses standardized output. We present equations for a total of 12 previously published and new methods in this section. These equations are evaluated using the “G function” (Davis and Rabinowitz, 1984).
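A minimal Python sketch of the basic idea: compute Sobol indices as sample Pearson correlations between output vectors of paired runs. It assumes independent U(0, 1) inputs, hypothetical G function coefficients `a`, and a simple layout of J + 2 runs (A, B, and one run C_i per input that copies B but takes column i from A). It is a plain correlation estimator without the paper's correction terms, so it should be read as an illustration of the principle rather than any of the 12 methods compared.

```python
import numpy as np

def g_function(x, a):
    # G function on [0, 1]^J; a holds one (hypothetical) coefficient per input
    return np.prod((np.abs(4.0 * x - 2.0) + a) / (1.0 + a), axis=1)

def correlation_indices(model, n, j, rng):
    """Estimate first-order and total Sobol indices from Pearson correlations.

    Runs A and B use independent samples. For input i, run C_i copies B but
    takes column i from A, so A and C_i share only X_i, while B and C_i share
    every input except X_i.
    """
    a_mat = rng.random((n, j))
    b_mat = rng.random((n, j))
    y_a = model(a_mat)
    y_b = model(b_mat)
    s = np.empty(j)
    st = np.empty(j)
    for i in range(j):
        c_mat = b_mat.copy()
        c_mat[:, i] = a_mat[:, i]
        y_c = model(c_mat)
        s[i] = np.corrcoef(y_a, y_c)[0, 1]         # correlation estimates S_i
        st[i] = 1.0 - np.corrcoef(y_b, y_c)[0, 1]  # correlation estimates 1 - S_Ti
    return s, st

a = np.array([0.0, 0.5, 3.0, 9.0, 99.0, 99.0])  # hypothetical coefficients
rng = np.random.default_rng(0)
s_hat, st_hat = correlation_indices(lambda x: g_function(x, a), 100_000, len(a), rng)
print(np.round(s_hat, 3))
print(np.round(st_hat, 3))
```

With raw sample correlations, inputs that have essentially no effect still receive small nonzero estimates of either sign; this is the kind of spurious correlation the correction terms are meant to reduce.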

Results

Summary statistics from all the test runs are presented in Table 2. The prefix ‘avg’ indicates that the reported statistics are averaged across the 12 first-order indices (6 main and 6 total effects). E_mad is the mean absolute deviation of the 50,000 estimates from the theoretical value, E_std is the standard deviation of the estimates, S_err is the standard error of the mean, and A_err is the absolute deviation of the overall mean from the theoretical value.
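The four statistics can be written down directly. The following is a minimal Python sketch that assumes the replicate estimates of one index are held in an array. The paper's divisor for the standard deviation is not stated in this excerpt, so `ddof=1` is an assumption, as is the `avg_` key naming used for the averaged statistics.

```python
import numpy as np

def summary_statistics(estimates, theoretical):
    """Summarize M replicate estimates of one index against its theoretical value."""
    est = np.asarray(estimates, dtype=float)
    m = est.size
    e_mad = np.mean(np.abs(est - theoretical))  # mean absolute deviation from truth
    e_std = np.std(est, ddof=1)                 # spread of the estimates (ddof assumed)
    s_err = e_std / np.sqrt(m)                  # standard error of the mean
    a_err = abs(est.mean() - theoretical)       # absolute error of the overall mean
    return {"E_mad": e_mad, "E_std": e_std, "S_err": s_err, "A_err": a_err}

def averaged(per_index_stats):
    """Average each statistic across several indices (hypothetical 'avg_' keys)."""
    keys = per_index_stats[0].keys()
    return {"avg_" + k: float(np.mean([d[k] for d in per_index_stats])) for k in keys}

stats = summary_statistics([0.5, 0.7, 0.6, 0.6], theoretical=0.55)
print(stats)
```

E_mad mixes bias and scatter, whereas A_err isolates bias and S_err says how precisely that bias is resolved by the number of replicates.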

The standardized methods B₁–B₃ score

Bootstrapping

For Table 3, the full analysis of 2J + 2 model runs was performed M = 50,000 times with different random samples, each set producing N = 200,000 samples of model output. For all practical purposes, the mean of M sets (each producing N estimates) provides the same precision as one set producing M × N estimates. However, the former approach has two advantages: first, one can calculate not only the mean value for each index but also the standard error of the mean. Second, breaking the
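One way to picture the replicated design is sketched below in Python. We read "2J + 2 model runs" as the layout of Saltelli (2002): two independent base runs A and B, plus, for each input j, a run AB_j (A with column j taken from B) and a run BA_j (B with column j taken from A); this reading is an assumption. Pairs that share only X_j each give a correlation estimate of S_j, and pairs that share every input except X_j each give an estimate of 1 − S_Tj, so every index is estimated twice. Averaging the two, the G function coefficients, and the small M and N are illustrative choices, not the paper's settings.

```python
import numpy as np

def g_function(x, a):
    # G function on [0, 1]^J (test model; coefficients a are hypothetical)
    return np.prod((np.abs(4.0 * x - 2.0) + a) / (1.0 + a), axis=1)

def corr(u, v):
    return np.corrcoef(u, v)[0, 1]

def one_set(model, n, j, rng):
    """One design of 2J + 2 runs; the two estimates of each index are averaged."""
    a_mat = rng.random((n, j))
    b_mat = rng.random((n, j))
    y_a, y_b = model(a_mat), model(b_mat)
    s = np.empty(j)
    st = np.empty(j)
    for i in range(j):
        ab = a_mat.copy()
        ab[:, i] = b_mat[:, i]  # A with column i from B
        ba = b_mat.copy()
        ba[:, i] = a_mat[:, i]  # B with column i from A
        y_ab, y_ba = model(ab), model(ba)
        # (A, BA_i) and (B, AB_i) share only X_i: two estimates of S_i
        s[i] = 0.5 * (corr(y_a, y_ba) + corr(y_b, y_ab))
        # (A, AB_i) and (B, BA_i) share all inputs except X_i: two estimates of 1 - S_Ti
        st[i] = 1.0 - 0.5 * (corr(y_a, y_ab) + corr(y_b, y_ba))
    return s, st

def replicate(model, n, j, m, seed=0):
    """Repeat the whole design m times; return (mean, standard error) pairs."""
    rng = np.random.default_rng(seed)
    results = [one_set(model, n, j, rng) for _ in range(m)]
    s_all = np.array([r[0] for r in results])
    st_all = np.array([r[1] for r in results])

    def mean_se(x):
        return x.mean(axis=0), x.std(axis=0, ddof=1) / np.sqrt(m)

    return mean_se(s_all), mean_se(st_all)

a = np.array([0.0, 0.5, 3.0, 9.0, 99.0, 99.0])
(s_mean, s_se), (st_mean, st_se) = replicate(lambda x: g_function(x, a), n=2_000, j=6, m=50)
print(np.round(s_mean, 3), np.round(s_se, 4))
print(np.round(st_mean, 3), np.round(st_se, 4))
```

The standard error comes for free from the spread across the M sets, which is the first advantage of the replicated approach described above.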

Conclusion

Sobol's method of sensitivity analysis is well suited to high-dimensional stochastic computer models, and has been successfully implemented in SHEDS (the Stochastic Human Exposure and Dose Simulation model). The methods presented herein provide a simple correlation-based numerical approach to calculating the estimates and to reducing errors associated with spurious correlation. The use of 2J + 2 model runs to obtain double estimates provides good estimates of all the first- and second-order main and total effect indices. The method is easy to implement,

Disclaimer

The U.S. Environmental Protection Agency through its Office of Research and Development partially funded the research described here under contract number EP-D-05-065 to Alion Science and Technology, Inc. It has been subjected to Agency review and approved for publication.

References (33)

P. Annoni et al.
Partial order investigation of multiple indicator systems using variance-based sensitivity analysis
Environ. Modell. Softw.
(2011)
R. Confalonieri et al.
Sensitivity analysis of the rice model WARM in Europe: exploring the effects of different locations, climates and methods of analysis on model sensitivity to crop parameters
Environ. Modell. Softw.
(2010)
R.I. Cukier et al.
Nonlinear sensitivity analysis of multiparameter model systems
J. Comput. Phys.
(1978)
V. Estrada et al.
Global sensitivity analysis in the development of first principle-based eutrophication models
Environ. Modell. Softw.
(2010)
T. Homma et al.
Importance measures in global sensitivity analysis of model output
Reliab. Eng. Sys. Saf.
(1996)
M.J.W. Jansen
Analysis of variance designs for model output
Comput. Phys. Comm.
(1999)
J. Nossent et al.
Sobol’ sensitivity analysis of a complex environmental model
Environ. Modell. Softw.
(2011)
A. Saltelli et al.
Some new techniques in sensitivity analysis of model output
Comput. Stat. Data Anal.
(1993)
A. Saltelli et al.
About the use of rank transformation in sensitivity analysis of model output
Reliab. Eng. Sys. Saf.
(1995)
A. Saltelli
Making best use of model evaluations to compute sensitivity indices
Comput. Phys. Comm.
(2002)
A. Saltelli et al.
Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index
Comput. Phys. Comm.
(2010)
A. Saltelli et al.
How to avoid a perfunctory sensitivity analysis
Environ. Modell. Softw.
(2010)
I.M. Sobol'
Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates
Math. Comput. Simul.
(2001)
I.M. Sobol' et al.
On the use of variance reducing multipliers in Monte Carlo computations of a global sensitivity index
Comput. Phys. Comm.
(1999)
I.M. Sobol' et al.
Estimating the approximation error when fixing unessential factors in global sensitivity analysis
Reliab. Eng. Sys. Saf.
(2007)
S. Tarantola et al.
Random balance designs for the estimation of first order global sensitivity indices
Reliab. Eng. Sys. Saf.
(2006)

Published by Elsevier Ltd.
