Elsevier

Environmental Modelling & Software

Volume 37, November 2012, Pages 157-166

Estimating Sobol sensitivity indices using correlations

https://doi.org/10.1016/j.envsoft.2012.03.014

Abstract

Sensitivity analysis is a crucial tool in the development and evaluation of complex mathematical models. Sobol's method is a variance-based global sensitivity analysis technique that has been applied to computational models to assess the relative importance of input parameters on the output. This paper introduces new notation that describes the Sobol indices in terms of the Pearson correlation of outputs from pairs of runs, and introduces correction terms to remove some of the spurious correlation. A variety of estimation techniques are compared for accuracy and precision using the G function as a test case.

Highlights

► We present Sobol's method in terms of correlation coefficients.
► We derive new error correction terms for the method that remove spurious correlation.
► Multiple estimation methods are compared for precision, accuracy, and efficiency.

Introduction

Sobol's method is a global sensitivity analysis (SA) technique that determines the contribution of each input (or group of inputs) to the variance of the output. The usual Sobol sensitivity indices include the main and total effects for each input, but the method can also provide specific interaction terms, if desired. Sobol's method was originally presented in Russian by Sobol' (1990); the article was reprinted in English as Sobol' (1993). The method is notable because it works well without simplifying approximations, even for models with very large numbers of random variables. It is superior to traditional sensitivity methods (such as local methods that examine parameters one at a time) when the assumption of linearity is invalid (Saltelli and Annoni, 2010), and it has been shown to be robust (Yang, 2011). Sobol's method has been applied successfully to complex environmental models to identify critical input parameters and major sources of uncertainty (e.g., Nossent et al., 2011; Vezzaro and Mikkelsen, 2012; Confalonieri et al., 2010; Estrada and Diaz, 2010). In addition, the method has been used as a basis for multiple criteria analyses (Annoni et al., 2011).

In this paper, new formulations of Sobol indices in terms of Pearson correlation coefficients are presented. These formulations suggest the inclusion of “correction terms” that remove some spurious correlation. Such correction terms are presented, and multiple estimation methods are compared for precision, accuracy, and efficiency, using the G function as a test case. The G function has the advantage that the theoretical values for all the sensitivity indices are known, so the accuracy of various estimation techniques can be evaluated.
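A standard identity makes the link between Sobol indices and Pearson correlation concrete. The notation is ours and may differ from the paper's. Let X' be an independent copy of the inputs. Let Y^{(i)} be a run that shares only X_i with Y, and let Y_{(i)} be a run that shares every input except X_i. Given X_i, the runs Y and Y^{(i)} are independent and identically distributed, so E[Y Y^{(i)}] = E[E(Y | X_i)^2]. Both runs also have the same variance Var(Y):

```latex
Y = f(X_i, X_{\sim i}), \qquad
Y^{(i)} = f(X_i, X'_{\sim i}), \qquad
Y_{(i)} = f(X'_i, X_{\sim i})

\operatorname{Cov}\bigl(Y, Y^{(i)}\bigr)
  = E\bigl[E(Y \mid X_i)^2\bigr] - E(Y)^2
  = \operatorname{Var}\bigl[E(Y \mid X_i)\bigr]
\quad\Longrightarrow\quad
\rho\bigl(Y, Y^{(i)}\bigr) = \frac{\operatorname{Var}[E(Y \mid X_i)]}{\operatorname{Var}(Y)} = S_i

\rho\bigl(Y, Y_{(i)}\bigr)
  = \frac{\operatorname{Var}[E(Y \mid X_{\sim i})]}{\operatorname{Var}(Y)}
  = 1 - \frac{E[\operatorname{Var}(Y \mid X_{\sim i})]}{\operatorname{Var}(Y)}
  = 1 - S_{Ti}
```

With finite samples, the sample correlation also picks up chance correlation between the unshared inputs. This is the kind of spurious correlation that correction terms aim to remove.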

Section snippets

Variance decomposition, main effects, total effects, and interaction terms

Many SA methods are based on an analysis of the variance of the model output (Chan et al., 2000); the theoretical basis of several of these methods is variance decomposition. Such techniques include Fourier Amplitude Sensitivity Test (FAST; Cukier et al., 1973, Cukier et al., 1978; Saltelli et al., 1999), High Dimensional Model Representation (HDMR; Rabitz and Alis, 1999), random balance designs (Tarantola et al., 2006a), and traditional ANOVA methods. In variance decomposition, the model
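For reference, the functional ANOVA (Sobol') decomposition that underlies these variance-based methods can be written as follows. The statement assumes J independent inputs and a square-integrable model f whose component functions have zero mean and are mutually orthogonal. The symbols are our own and are not taken from the paper's numbered equations:

```latex
f(X_1,\dots,X_J) = f_0 + \sum_{i} f_i(X_i) + \sum_{i<j} f_{ij}(X_i, X_j) + \cdots + f_{12\cdots J}(X_1,\dots,X_J)

V = \operatorname{Var}(Y) = \sum_{i} V_i + \sum_{i<j} V_{ij} + \cdots + V_{12\cdots J},
\qquad V_i = \operatorname{Var}\bigl[E(Y \mid X_i)\bigr]

S_i = \frac{V_i}{V}, \qquad
S_{ij} = \frac{V_{ij}}{V}, \qquad
S_{Ti} = \sum_{\text{all terms containing } i} S_{\cdots i \cdots}
       = \frac{E[\operatorname{Var}(Y \mid X_{\sim i})]}{V}
```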

Methods

Computing the Sobol indices numerically requires evaluating the Pearson correlation coefficients between the output vectors from pairs of model runs. Many variations exist on Equation (3), which uses the “raw” model output, and on Equation (5), which uses standardized output. In this section we present equations for a total of 12 methods, some previously published and some new. These equations are evaluated using the “G function” (Davis and Rabinowitz, 1984).
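The G function is a useful test case because its indices are known in closed form. The sketch below assumes the standard form g(x) = ∏_j (|4x_j - 2| + a_j)/(1 + a_j) with independent U(0, 1) inputs, and it returns the usual analytic main and total effects. The coefficient vector `a` is an illustrative assumption, since this excerpt does not give the values used in the paper. The function names are hypothetical.

```python
import numpy as np


def g_function(x, a):
    """Sobol' G function: prod_j (|4 x_j - 2| + a_j) / (1 + a_j).

    x: array of shape (n_samples, J) with entries in [0, 1]; a: length-J coefficients.
    """
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    return np.prod((np.abs(4.0 * x - 2.0) + a) / (1.0 + a), axis=1)


def g_function_indices(a):
    """Analytic main (S_i) and total (S_Ti) effects of the G function with U(0, 1) inputs."""
    a = np.asarray(a, dtype=float)
    v = 1.0 / (3.0 * (1.0 + a) ** 2)   # V_i: variance of the i-th factor (each factor has mean 1)
    prod = np.prod(1.0 + v)
    total_var = prod - 1.0             # V = prod_i (1 + V_i) - 1
    main = v / total_var               # S_i = V_i / V
    # S_Ti = V_i * prod_{j != i} (1 + V_j) / V
    total = v * (prod / (1.0 + v)) / total_var
    return main, total


# Illustrative coefficients for J = 6 inputs (an assumption, not the paper's values).
a = np.array([0.0, 1.0, 4.5, 9.0, 99.0, 99.0])
main, total = g_function_indices(a)
print(np.round(main, 4), np.round(total, 4))
```

Small a_j make input j important, and large a_j make it nearly inert. Because the indices are exact, the bias (accuracy) and the spread (precision) of any estimator can be measured directly against them.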

Results

Summary statistics from all the test runs are presented in Table 2. The prefix ‘avg’ indicates that the reported statistics are averaged across the 12 first-order indices (6 main and 6 total effects). Emad is the mean absolute deviation of the 50,000 estimates from the theoretical value, Estd is the standard deviation of the estimates, Serr is the standard error of the mean, and Aerr is the absolute deviation of the overall mean from the theoretical value.
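The four statistics and their ‘avg’ versions can be computed as in the minimal sketch below. It assumes that Estd is the sample standard deviation (ddof = 1), which the text does not specify. The function names are hypothetical.

```python
import numpy as np


def summary_stats(estimates, true_value):
    """Accuracy and precision summaries for repeated estimates of a single index."""
    est = np.asarray(estimates, dtype=float)
    emad = np.mean(np.abs(est - true_value))  # mean absolute deviation from the theoretical value
    estd = np.std(est, ddof=1)                # standard deviation of the estimates (assumed ddof=1)
    serr = estd / np.sqrt(est.size)           # standard error of the mean
    aerr = abs(est.mean() - true_value)       # absolute deviation of the overall mean
    return {"Emad": emad, "Estd": estd, "Serr": serr, "Aerr": aerr}


def averaged_stats(estimates_by_index, true_values):
    """'avg' statistics: each summary averaged over several indices (e.g. 6 main + 6 total)."""
    per_index = [summary_stats(e, t) for e, t in zip(estimates_by_index, true_values)]
    return {k: float(np.mean([s[k] for s in per_index])) for k in per_index[0]}


print(summary_stats([1.0, 2.0, 3.0], 2.0))
```

Emad mixes bias and scatter, whereas Aerr isolates bias once the scatter has been averaged out. Comparing Aerr with Serr therefore indicates whether any remaining bias is statistically distinguishable.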

The standardized methods B1–B3 score

Bootstrapping

For Table 3, the full analysis of 2J + 2 model runs was performed M = 50,000 times with different random samples, each set producing N = 200,000 samples of model output. For all practical purposes, the mean of M sets (each producing N estimates) provides the same precision as one set producing M × N estimates. However, the former approach has two advantages. First, one can calculate not only the mean value for each index, but also the standard error of the mean. Second, breaking the
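The first advantage (a standard error that comes directly from the spread between sets) can be seen in a toy setting. The sketch below uses a hypothetical estimator interface: a function that draws its own sample and returns one scalar. It also shows that M sets of N draws give roughly the same precision as one pooled set of M × N draws.

```python
import numpy as np


def repeated_set_summary(estimator, m_sets, rng):
    """Apply `estimator` to M independent random sets.

    Returns the mean of the M estimates and the standard error of that mean.
    `estimator(rng)` is a hypothetical interface that draws its own sample
    and returns one scalar estimate.
    """
    values = np.array([estimator(rng) for _ in range(m_sets)], dtype=float)
    mean = values.mean()
    sem = values.std(ddof=1) / np.sqrt(m_sets)  # between-set spread -> uncertainty of the mean
    return mean, sem


# Toy usage: each "set" estimates E[U] for U ~ U(0, 1) from N = 1000 draws.
rng = np.random.default_rng(42)
mean, sem = repeated_set_summary(lambda r: r.random(1000).mean(), m_sets=200, rng=rng)
# Theoretical standard error of one pooled set of M * N draws, for comparison.
pooled_se = np.sqrt(1.0 / 12.0) / np.sqrt(200 * 1000)
print(mean, sem, pooled_se)
```

In the paper's setting, each call would perform one full 2J + 2 analysis and return an index estimate instead of a sample mean.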

Conclusion

Sobol's method of sensitivity analysis is well suited to high-dimensional stochastic computer models, and has been successfully implemented in SHEDS. The methods presented herein provide a simple correlation-based numerical approach to calculating the estimates and reducing errors associated with spurious correlation. The use of 2J + 2 model runs to obtain double estimates provides good estimates of all the first and second-order main and total effect indices. The method is easy to implement,
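One way that 2J + 2 sets of runs can yield two correlation-based estimates of every first-order main and total effect is shown in the minimal sketch below. It assumes independent U(0, 1) inputs and a run pairing of our own choosing, which is not necessarily the paper's. The sketch does not reproduce the paper's 12 estimators, its correction terms, or its second-order indices. `model` is a hypothetical vectorized callable that maps an (N, J) input array to N outputs.

```python
import numpy as np


def _pearson(u, v):
    """Pearson correlation of two output vectors."""
    return np.corrcoef(u, v)[0, 1]


def correlation_sobol(model, n, j, rng):
    """Double estimates of main (S_i) and total (S_Ti) effects from 2J + 2 run sets.

    A and B are independent base samples; for each input i:
      C_i = B with column i taken from A (shares only x_i with A, all but x_i with B)
      D_i = A with column i taken from B (shares only x_i with B, all but x_i with A)
    """
    a = rng.random((n, j))
    b = rng.random((n, j))
    y_a, y_b = model(a), model(b)
    main = np.empty(j)
    total = np.empty(j)
    for i in range(j):
        c = b.copy()
        c[:, i] = a[:, i]
        d = a.copy()
        d[:, i] = b[:, i]
        y_c, y_d = model(c), model(d)
        # Pairs sharing only x_i: their correlation estimates S_i (two pairings, averaged).
        main[i] = 0.5 * (_pearson(y_a, y_c) + _pearson(y_b, y_d))
        # Pairs sharing every input except x_i: their correlation estimates 1 - S_Ti.
        total[i] = 1.0 - 0.5 * (_pearson(y_a, y_d) + _pearson(y_b, y_c))
    return main, total


# Toy check: y = x_0 + 2 x_1 with x_2 inert, so S = S_T = (0.2, 0.8, 0).
rng = np.random.default_rng(0)
main, total = correlation_sobol(lambda x: x[:, 0] + 2.0 * x[:, 1], n=100_000, j=3, rng=rng)
print(np.round(main, 3), np.round(total, 3))
```

For the inert input, the main-effect estimate is a chance correlation of order 1/√N rather than exactly zero. This is the kind of spurious correlation that the paper's correction terms are intended to reduce.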

Disclaimer

The U.S. Environmental Protection Agency through its Office of Research and Development partially funded the research described here under contract number EP-D-05-065 to Alion Science and Technology, Inc. It has been subjected to Agency review and approved for publication.


