Combining screening and metamodel-based methods: An efficient sequential approach for the sensitivity analysis of model outputs

https://doi.org/10.1016/j.ress.2014.08.009

Highlights

  • Quasi-OTEE and Kriging-based SA are reviewed and combined as the sequential SA.

  • The sequential SA produces results similar to those of the analytical calculations and the variance-based SA.

  • In one of the tests, the sequential SA requires over 50 times fewer model evaluations than the variance-based SA.

  • An efficient SA for high-dimensional and computationally expensive models.

Abstract

Sensitivity analysis (SA) identifies the most influential parameters of a given model. Applying SA is often critical for reducing the complexity of the subsequent model calibration and use. Unfortunately, it is seldom applied, especially when the model takes the form of a computationally expensive black-box computer program. A possible solution is to apply SA to a metamodel (i.e., an approximation of the computationally expensive model) instead. Among other options, the use of Gaussian process metamodels (also known as Kriging metamodels) has recently been proposed for the SA of computationally expensive traffic simulation models. However, the main limitation of this approach is its dependence on the model dimensionality. When the model is high-dimensional, the estimation of the Kriging metamodel may still be problematic due to its high computational cost.

To overcome this problem, the present paper combines the Kriging-based approach with the quasi-optimized trajectory based elementary effects (quasi-OTEE) approach for the SA of high-dimensional models. The quasi-OTEE SA is used first to separate the influential from the non-influential parameters of a high-dimensional model; the Kriging-based SA is then used to calculate the variance-based sensitivity indices and to rank the most influential parameters more accurately. The application of the proposed sequential SA is illustrated with several numerical experiments. Results show that the method can properly identify the most influential parameters and their ranks, while requiring considerably fewer model evaluations than the variance-based SA (e.g., in one of the tests the sequential SA requires over 50 times fewer model evaluations than the variance-based SA).

Introduction

Simulation models are now widely used in various scientific disciplines for purposes such as system design, evaluation and optimization. The reliability of the simulation results depends on the quality of the calibration. Calibration is therefore essential, yet it is usually complicated when the model contains hundreds or even thousands of parameters (i.e., is high-dimensional), and/or when running the model is computationally expensive.

Given constraints on computation time and other resources, one feasible way to reduce the complexity of calibrating a high-dimensional and computationally expensive model is to calibrate only the most influential input parameters, i.e., the parameters whose variations are expected to have significant impacts on the model output. In this way, the model outputs can be efficiently adjusted towards the correct values by fine-tuning the influential parameters. The proper approach to identify the influential and non-influential parameters is sensitivity analysis (SA).

SA explores the relationship between model outputs and input parameters [1]. A proper SA could provide qualitative and/or quantitative information regarding the effects of different model parameters (and their variations) on the model outputs. Such information can be used to eliminate the least relevant parameters in the subsequent calibration, and help practitioners to better understand both the model and its parameters, especially when the model is high-dimensional or behaves like a “black-box”.

Due to its importance, SA has been extensively developed in the last decades [1]. Some of the widely known SA methods are briefly described below.

The derivative-based method uses the one-at-a-time (OAT) design, i.e., varying one parameter at a time while keeping all other parameters fixed at a nominal value. The sensitivity measures of the varying parameter are estimated by computing the corresponding partial derivatives of the model response. This method requires only a few model evaluations for estimating the derivatives. However, it is not able to detect the interaction effects among parameters [2], and the derivatives are only informative at the fixed nominal points in the input space [3]. Some studies such as [2], [3], [4] and [5] have proposed approaches in which multi-dimensional averaging of the derivatives is used to explore the interaction effects. In this way, global sensitivity measures can be obtained.
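The locality of the OAT derivative estimate can be illustrated with a minimal sketch. The function name `oat_derivatives`, the forward-difference step and the toy model are assumptions made for illustration only; they do not come from the cited studies.

```python
def oat_derivatives(model, nominal, rel_step=1e-6):
    """Forward-difference partial derivatives at one nominal point.

    Costs k + 1 model evaluations for k parameters.
    """
    base = model(nominal)
    grads = []
    for i, x_i in enumerate(nominal):
        h = rel_step * max(abs(x_i), 1.0)   # step scaled to the parameter
        perturbed = list(nominal)
        perturbed[i] = x_i + h              # vary one parameter at a time
        grads.append((model(perturbed) - base) / h)
    return grads


def toy_model(x):
    # Hypothetical model with an interaction between x[0] and x[2].
    return x[0] + 2.0 * x[1] + x[0] * x[2]


grads_origin = oat_derivatives(toy_model, [0.0, 0.0, 0.0])
grads_ones = oat_derivatives(toy_model, [1.0, 1.0, 1.0])
```

At the origin the third parameter appears to have no effect, while at the second nominal point its derivative is close to 1. The measure therefore depends on where it is evaluated, which is the limitation noted in [3].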

The screening method is typically employed to identify the non-influential parameters of a model. Examples of this kind of method can be found in [2], [6], [7], and [8]. The screening method usually requires relatively few model runs, which makes it attractive for complex models. It can also be used to reduce the number of parameters to be considered before applying a more demanding method such as the variance-based method. According to Kucherenko et al. [3], one drawback of the screening method is that it does not provide straightforward information regarding the total effects (for details see the review in Section 2.1), and it is less accurate in ranking the parameters than the variance-based method.

The variance-based method decomposes the total variance of the model outputs into the conditional variance of each individual parameter, and uses this measure to represent the importance of the parameter (for details see the review in Section 2.2). The development of the variance-based method can be found in [9], [10], [11], [12], [13]. The variance-based method is one of the best methods available today for computing the sensitivity indices purely from model evaluations [1]. However, a good estimate of these quantitative sensitivity indices usually requires a large number of model evaluations. Although an improved sampling method was developed in [14] to enhance its efficiency, the high computational demand still makes this method less practical for large computational models [2], [3].
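To make the cost of the variance-based indices concrete, the following is a minimal Monte Carlo sketch. It assumes independent inputs uniform on [0, 1], two sample matrices A and B, a first-order estimator of the commonly used Saltelli type (here with mean-centred outputs to reduce noise) and a Jansen-type total-effect estimator. The function name, sample size and toy model are illustrative assumptions, not the exact procedure of the cited works.

```python
import random


def sobol_indices(model, k, n=4096, seed=0):
    """Monte Carlo first-order and total indices; costs n * (k + 2) runs."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(k)] for _ in range(n)]
    b = [[rng.random() for _ in range(k)] for _ in range(n)]
    f_a = [model(row) for row in a]
    f_b = [model(row) for row in b]
    both = f_a + f_b
    mean = sum(both) / (2 * n)
    var = sum((y - mean) ** 2 for y in both) / (2 * n - 1)
    first, total = [], []
    for i in range(k):
        # AB_i: rows of A with column i taken from B.
        f_ab = [model(ra[:i] + [rb[i]] + ra[i + 1:]) for ra, rb in zip(a, b)]
        first.append(
            sum((yb - mean) * (yab - ya)
                for ya, yb, yab in zip(f_a, f_b, f_ab)) / n / var)
        total.append(
            sum((ya - yab) ** 2 for ya, yab in zip(f_a, f_ab)) / (2 * n) / var)
    return first, total, n * (k + 2)


def additive_model(x):
    # Hypothetical additive model; analytic indices are 0.2, 0.8 and 0.
    return x[0] + 2.0 * x[1] + 0.0 * x[2]


first_order, total_effect, n_runs = sobol_indices(additive_model, k=3)
```

Even for this three-parameter toy model the sketch uses 20,480 model runs, which illustrates why the approach becomes impractical when each run is expensive.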

A metamodel is an abstraction of the original model. When the original model behaves like a black-box, and/or when it has a very high cost to run, the metamodel can be used to approximate the original model (more details are given in Section 2.2). Since the metamodel itself is usually computationally cheap, the variance-based sensitivity indices can be efficiently estimated based on the metamodel rather than the original model. Examples of applying metamodels to estimate the total sensitivity indices can be found in [15], [16]. Most efforts are spent on developing the metamodel (e.g., mapping all possible interactions [17]), and calibrating the metamodel. As these efforts are generally dependent on the number of parameters contained in the model [17], the computational cost can still be huge when the original model contains many parameters. In addition, when the model itself is high-dimensional and the interactions among the parameters are not negligible, it is also difficult to achieve a perfect estimation of the metamodel.

The use of any specific SA method is highly related to the model to be analyzed, and the goal of the analysis [18]. Therefore, there is no universal SA method to fit all possible needs. As for complex models, it is important that the specific SA method should consider both accuracy and efficiency. However, it seems that none of the above methods can fully satisfy this requirement if they are used alone: the derivative-based and screening-based methods lack accuracy in estimating the total effects or the interaction effects, while the variance-based and metamodel-based methods are usually computationally expensive when used on complex models. To achieve correct and feasible SA for complex models, specifically high-dimensional and computationally expensive models, in this paper we propose a novel method that combines two recently developed global SA approaches, namely, the quasi-optimized trajectory based elementary effects (quasi-OTEE) approach, and the Kriging-based approach.

The quasi-OTEE approach belongs to the category of screening methods. It was introduced in [19] and [20] based on the elementary effects (EE) method [6], but with much higher efficiency. The two validation experiments and the case study provided in [19] and [20] demonstrated that, with a small number of model evaluations, this tool can properly identify the non-influential parameters of a computationally expensive model to which other quantitative SA techniques cannot feasibly be applied at the outset. For example, in [20] it was shown that the quasi-OTEE approach yielded results similar to those obtained with the OTEE method in [7], but required only a small fraction of its computation time.
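For orientation, the underlying EE method [6] can be sketched as follows. This is a plain Morris-style screening with randomly generated trajectories. It does not implement the optimized trajectory selection of OTEE or quasi-OTEE, and the function name, the number of trajectories `r`, the number of levels `p` and the toy model are assumptions for illustration.

```python
import random
import statistics


def morris_screening(model, k, r=20, p=4, seed=0):
    """Elementary effects from r random trajectories; costs r * (k + 1) runs."""
    rng = random.Random(seed)
    delta = p / (2.0 * (p - 1))               # standard Morris step size
    levels = [j / (p - 1) for j in range(p)]  # grid on [0, 1]
    effects = [[] for _ in range(k)]
    for _ in range(r):
        x = [rng.choice(levels) for _ in range(k)]
        y = model(x)
        order = list(range(k))
        rng.shuffle(order)                    # random order of the k moves
        for i in order:
            # Step up if that stays inside [0, 1], otherwise step down.
            step = delta if x[i] + delta <= 1.0 + 1e-12 else -delta
            x_new = list(x)
            x_new[i] = x[i] + step
            y_new = model(x_new)
            effects[i].append((y_new - y) / step)
            x, y = x_new, y_new
    mu_star = [statistics.fmean([abs(e) for e in eff]) for eff in effects]
    sigma = [statistics.stdev(eff) for eff in effects]
    return mu_star, sigma


def toy_model(x):
    # Hypothetical model: x[0] dominant, x[1] and x[2] interact, x[3] inert.
    return 5.0 * x[0] + x[1] * x[2]


mu_star, sigma = morris_screening(toy_model, k=4)
```

A large mean absolute effect `mu_star` flags an influential parameter, while a non-zero `sigma` hints at nonlinearity or interactions. With these settings the sketch needs 100 model runs.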

The Kriging-based approach belongs to the family of metamodel-based methods. It adopts Sobol indices [1] calculated on a Kriging approximation of the simulation model. This method was presented in [21], where a robust Kriging emulator was obtained through the recursive use of the DACE tool [22]. The effectiveness of the method was also demonstrated in [21]. The authors showed that the variance-based sensitivity indices estimated from the Kriging emulator were nearly identical to those derived by the complete variance-based approach described in [1]. However, the Kriging-based SA required only 512 model evaluations, while the variance-based SA took almost 40,000 model evaluations.
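The idea of replacing the simulation model with a cheap Kriging predictor can be sketched in a few lines. The sketch below is a deliberately simplified interpolator with a Gaussian correlation function, a constant mean and a fixed, hand-picked correlation parameter `theta`. It is not the DACE tool and it does no hyperparameter estimation. All names and values are assumptions for illustration.

```python
import math


def gauss_corr(x, z, theta):
    """Gaussian correlation between two points."""
    return math.exp(-theta * sum((xi - zi) ** 2 for xi, zi in zip(x, z)))


def solve(matrix, rhs):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_kriging(xs, ys, theta=10.0, nugget=1e-10):
    """Return a predictor that interpolates the training data."""
    mean = sum(ys) / len(ys)
    gram = [[gauss_corr(a, b, theta) + (nugget if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    weights = solve(gram, [y - mean for y in ys])

    def predict(x):
        return mean + sum(w * gauss_corr(x, xi, theta)
                          for w, xi in zip(weights, xs))
    return predict


def expensive_model(x):
    # Hypothetical stand-in for a costly simulation.
    return x[0] + x[1] ** 2


grid = [i / 4 for i in range(5)]
train_x = [[u, v] for u in grid for v in grid]   # 25 training runs
train_y = [expensive_model(x) for x in train_x]
emulator = fit_kriging(train_x, train_y)
max_train_error = max(abs(emulator(x) - y) for x, y in zip(train_x, train_y))
```

Once `emulator` is available, the many evaluations needed for variance-based indices can be run on it instead of on the original model, so only the training runs count towards the cost of the expensive model.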

These two approaches were used successfully, but separately, in previous studies of complex simulation models [19], [21]. In the comparison study [23], it was found that the quasi-OTEE SA is better at identifying influential and non-influential parameters, while the Kriging-based SA is more accurate in ranking parameters according to their sensitivity indices. To fully exploit their respective strengths, it is reasonable and practical to apply these two approaches sequentially: the quasi-OTEE approach is used first for screening out non-influential parameters, and the Kriging-based approach is applied in the second stage for calculating the variance-based sensitivity indices and ranking the most influential parameters.
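The hand-over between the two stages can be sketched as follows: the screening measures are thresholded, and the parameters judged non-influential are frozen so that the second stage works on a lower-dimensional model. The cutoff rule (a fraction of the largest screening measure), the nominal value of 0.5 and the function names are assumptions for illustration. The excerpt does not state the selection criterion used in the paper.

```python
def select_influential(mu_star, fraction=0.1):
    """Indices whose screening measure reaches a fraction of the maximum."""
    cutoff = fraction * max(mu_star)
    return [i for i, m in enumerate(mu_star) if m >= cutoff]


def reduce_model(model, k, keep, nominal=0.5):
    """Wrap a k-parameter model so only the kept parameters remain free."""
    def reduced(z):
        x = [nominal] * k                 # screened-out parameters are frozen
        for idx, value in zip(keep, z):
            x[idx] = value
        return model(x)
    return reduced


def full_model(x):
    # Hypothetical four-parameter model.
    return 5.0 * x[0] + x[1] * x[2]


# Hypothetical screening output from the first stage.
screening_measure = [5.0, 0.4, 0.6, 0.0]
kept = select_influential(screening_measure)
stage_two_model = reduce_model(full_model, k=4, keep=kept)
value = stage_two_model([1.0, 0.2])
```

The metamodel in the second stage then only needs to be trained over the kept parameters, which is where the saving in model evaluations is expected to come from.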

In this paper we perform several numerical experiments on different test functions to evaluate the performance of the proposed approach. The test functions employed in the numerical experiments are commonly accepted benchmark functions for testing SA methods. The number of parameters in the SA ranges between 12 and 20 depending on the test function, which is generally sufficient to define a model as high-dimensional. By contrast, the test functions themselves are not computationally expensive. Since the computation time is not necessarily related to the complexity of the model, we assess the efficiency of the SA method in terms of the number of required model evaluations rather than the total computation time. For a given model, the total computation time grows roughly in proportion to the number of model evaluations.

In the numerical experiments, the sequential SA method is applied to screen and rank the most influential parameters in the chosen test functions, and the results are compared with the true results obtained either from analytical calculations or from a standard variance-based SA. It is found that, for the most influential parameters, the sequential SA method yields estimates of the variance-based sensitivity indices that are very close to the theoretical values. In addition, the proposed approach is shown to be much more efficient than a standard variance-based approach.
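As a rough guide to counting model evaluations, the standard run counts of the two families of designs can be written down. The symbols are assumptions introduced here: k is the number of parameters, r the number of screening trajectories, N the base sample size of a two-matrix variance-based design, k' ≤ k the number of parameters retained after screening, and n(k') the number of training runs for the metamodel. The exact designs used in the paper may differ.

```latex
\begin{align}
  N_{\mathrm{EE}}  &= r\,(k+1)
    && \text{(trajectory-based elementary effects)} \\
  N_{\mathrm{VB}}  &= N\,(k+2)
    && \text{(first-order and total indices from two sample matrices)} \\
  N_{\mathrm{seq}} &= r\,(k+1) + n(k')
    && \text{(screening, then metamodel training on } k' \text{ parameters)}
\end{align}
```

Because N is typically in the thousands while r is typically in the tens, the sequential count can be much smaller than the variance-based count whenever n(k') stays moderate.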

The paper is organized as follows. A brief review of the quasi-OTEE and Kriging-based approach is presented in Section 2. Following the review, the details about the numerical experiments for the SA are introduced in Section 3, and the results from the experiments are discussed in Section 4. Conclusions are given in Section 5.

Section snippets

Review of the two SA methods

A brief review of the above-mentioned quasi-OTEE approach [20] and the Kriging-based approach [21] is provided in this section. For more details about the two approaches the interested reader may refer to [23], where a similar, but more in-depth review is given. Here, we summarize again the main features of each approach for the reader's convenience.

Test functions

In this study we include several numerical experiments to demonstrate and test the proposed sequential SA approach. To this end, we have chosen 4 different test functions that are commonly used as benchmarks in evaluating the SA approaches (e.g., [7], [3], [2], [17], [37]). Moreover, all of them are, loosely speaking, high-dimensional functions with more than 10 parameters. Below is a brief introduction of the test functions used in this study.

Results of the quasi-OTEE SA

The results of the quasi-OTEE SA for all tests are shown in Fig. 1a–e. The details are given below.

Conclusions

In this paper we proposed a new method that sequentially applies the quasi-OTEE and the Kriging-based approaches for the SA of high-dimensional and computationally expensive models. The influential parameters are first screened by the quasi-OTEE approach. Then, based on the screening results, the Kriging-based approach is applied to rank the most influential parameters.

If compared with a standalone screening method that can only provide qualitative information about

Acknowledgment

Research contained within this paper benefited from participation in the EU COST Action TU0903 – Methods and tools for supporting the Use caLibration and validaTIon of Traffic simUlation moDEls (MULTITUDE).

References (40)

  • A. Saltelli et al.

    Variance based sensitivity analysis of model output: design and estimator for the total sensitivity index

    Comput Phys Commun

    (2010)
  • M.V. Ruano et al.

    An improved sampling strategy based on trajectory design for application of the Morris method to systems with many input factors

    Environ Model Softw

    (2012)
  • I.M. Sobol'

    Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates

    Math Comput Simul

    (2001)
  • A. Marrel et al.

    An efficient methodology for modeling complex computer codes with Gaussian processes

    Comput Stat Data Anal

    (2008)
  • S. Kucherenko et al.

    The identification of model effective dimensions using global sensitivity analysis

    Reliab Eng Syst Saf

    (2011)
  • I.M. Sobol'

    Uniformly distributed sequences with additional uniformity properties

    USSR Comput Math Math Phys

    (1976)
  • A. Saltelli et al.

    Global sensitivity analysis – the primer

    (2008)
  • M.D. Morris

    Factorial sampling plans for preliminary computational experiments

    Technometrics

    (1991)
  • I.M. Sobol'

    Sensitivity analysis for nonlinear mathematical models

    Math Models Comput Exp

    (1993)
  • A. Saltelli et al.

    A quantitative model-independent method for global sensitivity analysis of model output

    Technometrics

    (1999)