Comparing methods of randomizing Sobol′ sequences for improving uncertainty of metrics in variance-based global sensitivity estimation

doi:10.1016/j.ress.2021.107499

Reliability Engineering & System Safety

Volume 210, June 2021, 107499

https://doi.org/10.1016/j.ress.2021.107499

Highlights

• Highlights issues with using the Sobol′ sequence for estimating Sobol′ sensitivity indices.
• Proposes the Column Shift method for generating replicates.
• Shows the need to consider the standard error of replicates when estimating confidence in sensitivity analysis.

Abstract

This paper introduces an alternative way of randomizing Sobol′ sequences, called the Column Shift method, for reconstructing replicates to improve estimation of the uncertainty in sensitivity indices. The Column Shift method provides reliable results when applied to variance-based sensitivity analysis of the V-function, with much higher accuracy than commonly used randomization methods in most circumstances. It also addresses the error spikes caused by determinism within the Sobol′ sequence. The Column Shift method is compared with other popular randomization methods for the Sobol′ sequence, and it is shown to be the most consistent of those tested. In addition, the inclusion of standard error in the mean of sensitivity indices in an analysis of replicates provides a good indication of underestimation of errors in simulation results. The relationship between the number of samples and replicates is also discussed.

Introduction

Good modelling practice is crucial in the modelling process, as it affects the quality and relevance of a model’s outcome. The basic steps of the modelling process are scoping, problem framing and model formulation, analysis and assessment of options, and communication of model findings [1]. Quality assurance in the modelling process not only helps in determining a model’s accuracy, credibility and transparency, but also provides end-users with recommendations for future model development and can support decision-making on the problem in question. Building models to address complex issues such as environmental problems requires consideration of the available resources (including the budget for the modelling exercise), the precise quantities of interest in the model, the limitations of the model, and the uncertainties in the results. Good reviews of modelling practice are given in [1], [2], [3].

Sensitivity analysis is an important step in good modelling practice, as it studies how the uncertainty of model outputs can be attributed to the influences of various input factors (parameters and exogenous input variables), as well as to the impacts of interactions between inputs [4]. To serve this purpose well, a reliable sampling method that provides a good convergence rate and good coverage of the parameter space is needed. The Sobol′ sequence is a popular and frequently used Quasi-Monte Carlo sequence for such studies.

In this paper, we examine some basic issues in using the Sobol′ sequence and its current randomization methods for sensitivity analysis, to provide some insights into its practical use. Tarantola et al. [5] used variance-based sensitivity analysis applied to the so-called V-function to compare the efficiency of randomized Sobol′ Quasi-Monte Carlo design and Latin Supercube sampling methods. Sun et al. [6] gave an initial comparison of the efficiency of the random, Latin hypercube and Sobol′ sampling methods applied to the variance-based sensitivity analysis method with two different total-effect estimators (Sobol′ 2007 [7] and Jansen 1999 [8]). It was confirmed that the Sobol′ sequence produces fewer errors in most circumstances. A common issue in all of the above studies, however, is that the determinism of the Sobol′ sequence structure may lead to occasional large errors in the sensitivity indices of input factors.

Inspired by these studies, we discuss existing randomization methods for the Sobol′ sequence as attempts to resolve the error spike issue. This leads to the alternative way of randomizing the Sobol′ sequence to construct replicates for sensitivity analysis that is recommended in this paper. Large amplitude errors greatly impact the performance of sensitivity analysis, as inconsistent values can lead modellers or end-users to mistakenly identify insensitive inputs as sensitive, or vice versa. The alternative approach requires a modest increase in computational cost to produce the replicates, but is consistent and reliable in reducing the variance of errors (i.e. the scatter in the index values derived from sampling for inputs that have identical analytical values).

The outline of this paper is as follows: Section 2 provides a brief introduction to the variance-based sensitivity analysis method; Section 3 presents an overview of the test function and error indicators used; Section 4 covers existing randomization methods for the Sobol′ sequence and the issues they raise for our experiments; Section 5 compares results between the original Sobol′ sequence and randomized Sobol′ sequences in different case scenarios, and motivates the use of replicated model runs; Section 6 contains the conclusions from the results and suggests future ways forward.

Section snippets

Variance-based sensitivity analysis

There is extensive literature on how to estimate variance-based indices [7], [8], [9], [10], [11]. In this paper, we use the estimators described by Tarantola et al. [5]. Here, we briefly summarize the idea of first-order and total-effect sensitivity indices based on variance decomposition methods. We define a function $f (X)$ as $f (X) = f (X_{1}, X_{2}, \dots, X_{n}),$ where the domain is an n-dimensional unit cube $\Omega = \{X \mid 0 \leq X_{i} \leq 1, i = 1, \dots, n\} .$ Then we can write the decomposition of function $f$ as: $f = f_{0} + \sum_{i = 1}^{n} f_{i} + \sum$
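For reference, the snippet above is cut off mid-equation. The block below writes out the standard form of this variance decomposition and the usual definitions of the first-order index $S_{i}$ and total-effect index $S_{T_i}$, on the assumption that the paper follows the conventional Sobol′ formulation. The notation $Y = f(X)$ and $X_{\sim i}$ (all inputs except $X_{i}$) is introduced here for illustration and does not appear in the excerpt.

```latex
f(X) = f_{0} + \sum_{i = 1}^{n} f_{i}(X_{i})
     + \sum_{1 \leq i < j \leq n} f_{ij}(X_{i}, X_{j})
     + \dots + f_{1 2 \dots n}(X_{1}, X_{2}, \dots, X_{n}),

S_{i} = \frac{V_{X_{i}}\left( E_{X_{\sim i}}(Y \mid X_{i}) \right)}{V(Y)},
\qquad
S_{T_i} = \frac{E_{X_{\sim i}}\left( V_{X_{i}}(Y \mid X_{\sim i}) \right)}{V(Y)} .
```

When the inputs are independent, the variances of the individual terms sum to $V(Y)$, so $S_{i}$ measures the share of output variance due to $X_{i}$ alone, while $S_{T_i}$ also includes every interaction involving $X_{i}$.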

Test function and error indicators

We use the benchmark V-function, used previously by Tarantola et al. [5] for sensitivity analysis, as our test function: $Y = V (X_{1}, X_{2}, \dots, X_{k}, a_{1}, \dots, a_{k}) = \prod_{i = 1}^{k} v_{i} (X_{i}, a_{i})$ where $v_{i} (X_{i}, a_{i}) = \frac{| 4 X_{i} - 2 | + a_{i}}{1 + a_{i}} .$ The sampling points are $X_{i}, i = 1, \dots, k$, and these are obtained through the sampling designs indicated in the next section. In addition, we use coefficients $a_{i}, i = 1, \dots, k$ to control whether the corresponding input is dominant or not.
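The V-function defined above can be sketched in a few lines of Python. The function names `v_function` and `analytical_indices` are hypothetical and not taken from the paper. The analytical indices rely on two facts that follow directly from the formula: each factor $v_{i}$ has mean 1 and variance $1 / (3 (1 + a_{i})^{2})$ when $X_{i}$ is uniform on $[0, 1]$.

```python
import math


def v_function(x, a):
    """Evaluate Y = prod_i (|4*x_i - 2| + a_i) / (1 + a_i) at one sample point."""
    if len(x) != len(a):
        raise ValueError("x and a must have the same length")
    return math.prod((abs(4.0 * xi - 2.0) + ai) / (1.0 + ai) for xi, ai in zip(x, a))


def analytical_indices(a):
    """Return (first_order, total_effect) index lists for coefficients a.

    Assumes independent inputs, each uniform on [0, 1].
    """
    # Variance of each one-dimensional factor v_i.
    partial = [1.0 / (3.0 * (1.0 + ai) ** 2) for ai in a]
    # Each factor has mean 1, so Var(Y) = prod(1 + V_i) - 1.
    total_var = math.prod(1.0 + p for p in partial) - 1.0
    first_order = [p / total_var for p in partial]
    total_effect = []
    for i, p in enumerate(partial):
        # Product over all other factors of E[v_j^2] = 1 + V_j.
        others = math.prod(1.0 + q for j, q in enumerate(partial) if j != i)
        total_effect.append(p * others / total_var)
    return first_order, total_effect
```

For example, `analytical_indices([0.0, 0.0, 6.52, 6.52])` gives two inputs with large indices ($a_{i} = 0$) and two with small ones ($a_{i} = 6.52$). This particular grouping is only an illustration, not one of the paper's test cases.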

The V-function is an attempt to combine non-smooth and smooth models, as there

Randomizing the Sobol′ sequence

There have been many attempts to randomize Quasi-Monte Carlo sequences. Niederreiter et al. [15] have pointed out several issues with Quasi-Monte Carlo sequences, and have summarized past attempts on the randomization [16], [17]. Cranley et al. [18] initiated the idea of a random shift modulo 1 for the standard Lattice rule, then Tuffin [19] adapted the random shift modulo 1 to low-discrepancy sequences. Another popular randomization method is the Scramble method, first proposed by Owen [20],
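The random shift modulo 1 mentioned above can be illustrated with a minimal sketch: a single uniform random vector is added to every point of a low-discrepancy set, and each coordinate is wrapped back into $[0, 1)$. To keep the example free of dependencies, a two-dimensional Halton set stands in for the Sobol′ sequence. This substitution, and all function names, are assumptions made for illustration only. The sketch shows only the generic shift, not the paper's Column Shift method.

```python
import random


def radical_inverse(index, base):
    """Van der Corput radical inverse of a non-negative integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_points(n, bases=(2, 3)):
    """Stand-in low-discrepancy points (Halton, NOT Sobol'), one row per sample."""
    return [[radical_inverse(i, b) for b in bases] for i in range(n)]


def random_shift_mod1(points, rng):
    """Add one uniform shift per dimension to every point, wrapping into [0, 1)."""
    dim = len(points[0])
    shift = [rng.random() for _ in range(dim)]
    return [[(x + s) % 1.0 for x, s in zip(row, shift)] for row in points]


def shifted_replicates(points, n_replicates, seed=0):
    """Build independent randomized replicates of the same base point set."""
    rng = random.Random(seed)
    return [random_shift_mod1(points, rng) for _ in range(n_replicates)]
```

Each replicate keeps the relative spacing of the base points but places them differently in the unit cube, so a sensitivity index can be estimated once per replicate and the spread of the estimates examined.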

Simulation results

Here we have used the same set of coefficient values as in Kucherenko et al. [4]. Type A is the case where the inputs can have different analytical sensitivity index values (two groups corresponding to $a_{i} = 0$ or 6.52). Types B and C have only one analytical sensitivity index value (either $a_{i} = 0$ or 6.52), so the relative absolute error does not constitute a significant difference for these cases. In this section, we concentrate on the properties of the Column Shift method.
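As a rough illustration of how replicate results might be summarized, the sketch below computes a relative absolute error against an analytical index value, and the mean and standard error of the mean across replicates. The paper's exact error indicators are not shown in this excerpt, so these definitions and the function names are assumptions rather than the authors' formulas.

```python
import math
import statistics


def relative_absolute_error(estimate, analytical):
    """|estimate - analytical| / |analytical| (assumed definition, see lead-in)."""
    if analytical == 0:
        raise ValueError("analytical value must be non-zero")
    return abs(estimate - analytical) / abs(analytical)


def replicate_summary(estimates):
    """Return (mean, standard error of the mean) over replicate estimates."""
    n_replicates = len(estimates)
    if n_replicates < 2:
        raise ValueError("at least two replicates are needed for a standard error")
    mean = statistics.fmean(estimates)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(estimates) / math.sqrt(n_replicates)
    return mean, sem
```

One way to use these together is to compare the error of the replicate mean with its standard error. If the error is much larger than the standard error, the replicates may be understating the true uncertainty.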

Conclusion

For variance-based sensitivity analysis, we have compared the relative absolute errors from sampling with the Sobol′ sequence and randomized Sobol′ sequence methods for the test case of the benchmark V-function. A similar test was reported for the random shifted Sobol′ sequence and Latin Supercube in Tarantola et al. [5]. Here we have modified the original test to enhance the amount of information gained from the test results. Relative measures are used here as opposed to absolute measures, and

CRediT authorship contribution statement

Xifu Sun: Conceptualization, Methodology, Software, Formal analysis, Investigation, Writing - original draft, Visualization. Barry Croke: Formal analysis, Writing - review & editing, Supervision. Stephen Roberts: Writing - review & editing, Supervision. Anthony Jakeman: Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Xifu Sun’s research was funded by a scholarship provided by the Mathematical Sciences Institute, Australia, and the Hilda John Bequest of the Australian National University.

References (37)

Jakeman A.J. et al. Ten iterative steps in development and evaluation of environmental models. Environ Model Softw (2006)
Badham J. et al. Effective modeling for Integrated Water Resource Management: a guide to contextual practices by phases and steps and future opportunities. Environ Model Softw (2019)
Kucherenko S. et al. The identification of model effective dimensions using global sensitivity analysis. Reliab Eng Syst Saf (2011)
Tarantola S. et al. A comparison of two sampling methods for global sensitivity analysis. Comput Phys Commun (2012)
Jansen M.J. Analysis of variance designs for model output. Comput Phys Commun (1999)
Homma T. et al. Importance measures in global sensitivity analysis of nonlinear models. Reliab Eng Syst Saf (1996)
Saltelli A. et al. Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Comput Phys Commun (2010)
Saltelli A. Making best use of model evaluations to compute sensitivity indices. Comput Phys Commun (2002)
Sobol′ I.M. et al. Estimating the approximation error when fixing unessential factors in global sensitivity analysis. Reliab Eng Syst Saf (2007)
Niederreiter H. Some current issues in quasi-Monte Carlo methods. J Complexity (2003)
Matoušek J. On the L$_{2}$-discrepancy for anchored boxes. J Complexity (1998)
Feinberg J. et al. Chaospy: An open source tool for designing methods of uncertainty quantification. J Comput Sci (2015)
Qian G. et al. Sensitivity analysis methods in the biomedical sciences. Math Biosci (2020)
Black D. et al. Guidelines for water management modelling: Towards best-practice model application (2011)
Sun X. et al. A comparison of global sensitivity techniques and sampling method
Sobol′ I.M. Global sensitivity analysis indices for the investigation of nonlinear mathematical models. Mat Model (2007)
Sobol′ I.M. Sensitivity estimates for nonlinear mathematical models. Math Model Comput Exp (1993)
Caflisch R.E. et al. Valuation of mortgage backed securities using Brownian bridges to reduce effective dimension, vol. 24 (1997)
