Comparing methods of randomizing Sobol′ sequences for improving uncertainty of metrics in variance-based global sensitivity estimation

https://doi.org/10.1016/j.ress.2021.107499

Highlights

  • Highlights issues with using the Sobol sequence for estimating Sobol sensitivity indices.

  • Proposes the Column Shift method for generating replicates.

  • Shows the need to consider the standard error of replicates when estimating confidence in sensitivity analysis.

Abstract

This paper introduces an alternative way of randomizing Sobol′ sequences, called the Column Shift method, for reconstructing replicates to improve estimation of the uncertainty in sensitivity indices. The Column Shift method provides reliable results when applied to variance-based sensitivity analysis of the V-function, with much higher accuracy than commonly used randomization methods in most circumstances. It also addresses the error spikes caused by determinism within the Sobol′ sequence. The Column Shift method is compared with other popular randomization methods for the Sobol′ sequence, and it is shown to be the most consistent of those tested. In addition, the inclusion of standard error in the mean of sensitivity indices in an analysis of replicates provides a good indication of underestimation of errors in simulation results. The relationship between the number of samples and replicates is also discussed.

Introduction

Good modelling practice is crucial in the modelling process as it affects the quality and relevance of a model’s outcome. Scoping, problem framing and model formulation, analysis and assessment of options, and communication of model findings are basic steps of the modelling process [1]. Indeed, quality assurance in the modelling process not only helps in determining a model’s accuracy, credibility and transparency, but it also helps end-users with recommendations for future model development and can support decision-making on the problem in question. Building models for addressing complex issues such as environmental problems needs to consider the resources available including the budget for the modelling exercise, the precise quantities of interest in the model, the limitations of the model, and the uncertainties in the results. Many papers provide good reviews of modelling practice, such as [1], [2], [3].

Sensitivity analysis is an important step in good modelling practice: it studies how the uncertainty of model outputs can be attributed to the influences of various input factors (parameters and exogenous input variables), as well as to the impacts of interactions between inputs [4]. To serve this purpose well, a reliable sampling method that provides a good convergence rate and coverage of parameter space is needed, and the Sobol sequence is a popular and frequently used Quasi-Monte Carlo sequence for such studies.

In this paper, we examine some basic issues in using the Sobol sequence and its current randomization methods for sensitivity analysis, to provide insight into its practical use. Tarantola et al. [5] applied variance-based sensitivity analysis to the so-called V-function to compare the efficiency of the randomized Sobol Quasi-Monte Carlo design and Latin Supercube sampling. Sun et al. [6] gave an initial comparison of the efficiency of random, Latin hypercube and Sobol sampling for variance-based sensitivity analysis with two different total-effect estimators (Sobol 2007 [7] and Jansen 1999 [8]). These studies confirmed that the Sobol sequence produces smaller errors in most circumstances, but they shared a common issue: the deterministic structure of the Sobol sequence can lead to occasional large errors in the sensitivity indices of input factors.

Inspired by these studies, we discuss the existing randomization methods for the Sobol sequence as attempts to resolve the error spike issue, and recommend an alternative way of randomizing the Sobol sequence to construct replicates for sensitivity analysis. Large-amplitude errors greatly degrade the performance of sensitivity analysis, as inconsistent values can lead modellers or end-users to mistakenly identify insensitive inputs as sensitive, or vice versa. The alternative approach requires a modest increase in computational cost to produce the replicates, but is consistent and reliable in reducing the variance of errors (i.e. the scatter in the index values derived from sampling for inputs that have identical analytical values).

The outline of this paper is as follows: Section 2 provides a brief introduction to the variance-based sensitivity analysis method; Section 3 presents an overview of the test function and error indicators used; Section 4 covers existing randomization methods for the Sobol sequence and the issues they raise for our experiments; Section 5 compares results between the original Sobol sequence and randomized Sobol sequences in different case scenarios, and motivates the use of replicate model runs; Section 6 contains the conclusions from the results and suggests future ways forward.

Section snippets

Variance-based sensitivity analysis

There is much literature regarding how one should estimate variance-based indices [7], [8], [9], [10], [11]. In our paper, we will use the estimators described by Tarantola et al. [5]. Here, we will briefly summarize the idea of first-order and total-effect sensitivity indices based on variance decomposition methods. We define a function $f(X)$ as $f(X) = f(X_1, X_2, \dots, X_n)$, where the domain is an $n$-dimensional unit cube $\Omega = \{X \mid 0 \le X_i \le 1,\ i = 1, \dots, n\}$. Then we can write the decomposition of function $f$ as: $f = f_0 + \sum_{i=1}^{n} f_i + \cdots$
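For reference, the first-order and total-effect indices summarized in this section are conventionally defined from the variance of conditional expectations of the output $Y = f(X)$. The block below states these standard definitions only; the symbols $S_i$, $S_{T_i}$ and $X_{\sim i}$ (all inputs except $X_i$) are notation assumed here and may differ from the notation used in the full paper.

```latex
S_i = \frac{V_{X_i}\!\left(E_{X_{\sim i}}[\,Y \mid X_i\,]\right)}{V(Y)},
\qquad
S_{T_i} = 1 - \frac{V_{X_{\sim i}}\!\left(E_{X_i}[\,Y \mid X_{\sim i}\,]\right)}{V(Y)}
        = \frac{E_{X_{\sim i}}\!\left(V_{X_i}[\,Y \mid X_{\sim i}\,]\right)}{V(Y)}
```

The first-order index measures the share of output variance explained by $X_i$ alone, while the total-effect index also includes every interaction in which $X_i$ takes part.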

Test function and error indicators

We invoke the benchmark V-function used previously by Tarantola et al. [5] for sensitivity analysis as our test function: $Y = V(X_1, X_2, \dots, X_k, a_1, \dots, a_k) = \prod_{i=1}^{k} v_i(X_i, a_i)$, where $v_i(X_i, a_i) = \dfrac{|4X_i - 2| + a_i}{1 + a_i}$. The sampling points are $X_i$, $i = 1, \dots, k$, and these are obtained through the sampling designs indicated in the next section. In addition, we use coefficients $a_i$, $i = 1, \dots, k$, to control whether the corresponding input is a dominant input or not.
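As a minimal sketch, the test function can be evaluated as below, assuming the product form of the V-function and independent inputs uniform on [0, 1]. Under those assumptions each factor has mean 1 and variance 1/(3(1 + a_i)^2), which gives closed-form first-order indices to compare against sampled estimates. The function names `v_function` and `analytic_first_order` are hypothetical and are not taken from the paper.

```python
import math


def v_function(x, a):
    """Evaluate the V-function at one sample point.

    x: sequence of k input values in [0, 1]
    a: sequence of k non-negative coefficients (smaller a_i = more influential X_i)
    """
    return math.prod(
        (abs(4.0 * xi - 2.0) + ai) / (1.0 + ai) for xi, ai in zip(x, a)
    )


def analytic_first_order(a):
    """Closed-form first-order indices, assuming independent U(0, 1) inputs."""
    # Variance contributed by each factor v_i on its own.
    partial = [(1.0 / 3.0) / (1.0 + ai) ** 2 for ai in a]
    # Total variance of a product of independent factors with mean 1.
    total_var = math.prod(1.0 + v for v in partial) - 1.0
    return [v / total_var for v in partial]


# Example: one influential input (a = 0) and one weak input (a = 6.52).
indices = analytic_first_order([0.0, 6.52])
```

With `a = [0.0, 6.52]` the first index is much larger than the second, which illustrates how the coefficients separate influential from non-influential inputs.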

The V-function is an attempt to combine non-smooth and smooth models, as there

Randomizing the Sobol sequence

There have been many attempts to randomize Quasi-Monte Carlo sequences. Niederreiter et al. [15] have pointed out several issues with Quasi-Monte Carlo sequences, and have summarized past attempts at randomization [16], [17]. Cranley et al. [18] introduced the idea of a random shift modulo 1 for the standard lattice rule, and Tuffin [19] then adapted the random shift modulo 1 to low-discrepancy sequences. Another popular randomization method is the Scramble method, first proposed by Owen [20],
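The random shift modulo 1 mentioned above can be sketched as follows: one uniform random vector is drawn per replicate and added to every point, wrapping around the unit cube. This is an illustration of the generic random shift only, not of the Column Shift method, whose details are not given in this excerpt. The function names are hypothetical, and the four 2-D points are a small stand-in for a low-discrepancy point set.

```python
import random


def random_shift_mod1(points, rng):
    """Apply one random shift (modulo 1) to a whole point set."""
    dim = len(points[0])
    # A single shift vector is shared by all points in the replicate.
    shift = [rng.random() for _ in range(dim)]
    return [[(x + s) % 1.0 for x, s in zip(p, shift)] for p in points]


def make_replicates(points, n_replicates, seed=0):
    """Build independent randomized replicates of the same base point set."""
    rng = random.Random(seed)
    return [random_shift_mod1(points, rng) for _ in range(n_replicates)]


# Stand-in base points in [0, 1)^2 (assumed for illustration).
base_points = [[0.0, 0.0], [0.5, 0.5], [0.75, 0.25], [0.25, 0.75]]
replicates = make_replicates(base_points, n_replicates=3, seed=1)
```

Because every point in a replicate receives the same shift, the relative positions of the points (modulo 1) are preserved, while each replicate as a whole lands at a different random offset.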

Simulation results

Here we have used the same set of coefficient values as in Kucherenko et al. [4]. Type A is the case where the inputs can have different analytical sensitivity index values (two groups corresponding to $a_i = 0$ or $6.52$). Types B and C have only one analytical sensitivity index value (either $a_i = 0$ or $6.52$), so the relative absolute error does not constitute a significant difference for these cases. In this section, we concentrate on the properties of the Column Shift method.
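The abstract notes that the standard error in the mean of sensitivity indices across replicates indicates when errors are being underestimated. A minimal sketch of that summary statistic is given below, assuming each replicate yields one estimate of the same index. The function name `replicate_summary` and the example values are hypothetical, not results from the paper.

```python
import math
import statistics


def replicate_summary(index_estimates):
    """Return (mean, standard error of the mean) over replicate estimates."""
    n = len(index_estimates)
    mean = statistics.fmean(index_estimates)
    # Sample standard deviation (n - 1 in the denominator) divided by sqrt(n).
    sem = statistics.stdev(index_estimates) / math.sqrt(n)
    return mean, sem


# Made-up estimates of one sensitivity index from three replicates.
mean, sem = replicate_summary([0.4, 0.5, 0.6])
```

A large standard error relative to the mean suggests that a single run of the sequence would give an unreliable index value, even if that one run happened to look well converged.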

Conclusion

For variance-based sensitivity analysis, we have compared the relative absolute errors from sampling with the Sobol sequence and randomized Sobol sequence methods for the test case of the benchmark V-function. A similar test was reported for the random shifted Sobol sequence and Latin Supercube in Tarantola et al. [5]. Here we have modified the original test to enhance the amount of information gained from the test results. Relative measures are used here as opposed to absolute measures, and

CRediT authorship contribution statement

Xifu Sun: Conceptualization, Methodology, Software, Formal analysis, Investigation, Writing - original draft, Visualization. Barry Croke: Formal analysis, Writing - review & editing, Supervision. Stephen Roberts: Writing - review & editing, Supervision. Anthony Jakeman: Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Xifu Sun’s research was funded by a scholarship provided by the Mathematical Sciences Institute, Australia, and the Hilda John Bequest of the Australian National University.

References (37)

  • Matoušek J. On the L2-discrepancy for anchored boxes. J Complexity (1998)

  • Feinberg J. et al. Chaospy: An open source tool for designing methods of uncertainty quantification. J Comput Sci (2015)

  • Qian G. et al. Sensitivity analysis methods in the biomedical sciences. Math Biosci (2020)

  • Black D. et al. Guidelines for water management modelling: Towards best-practice model application (2011)

  • Sun X. et al. A comparison of global sensitivity techniques and sampling method

  • Sobol’ I.M. Global sensitivity analysis indices for the investigation of nonlinear mathematical models. Mat Model (2007)

  • Sobol’ I.M. Sensitivity estimates for nonlinear mathematical models. Math Model Comput Exp (1993)

  • Caflisch R.E. et al. Valuation of mortgage backed securities using Brownian bridges to reduce effective dimension, vol. 24 (1997)