Adaptive sequential strategy for risk estimation of engineering systems using Gaussian process regression active learning

doi:10.1016/j.engappai.2018.06.007

Engineering Applications of Artificial Intelligence

Volume 74, September 2018, Pages 146-165

https://doi.org/10.1016/j.engappai.2018.06.007

Abstract

Maximizing the accuracy of the estimated risk and minimizing the number of calls to an expensive-to-evaluate deterministic model are two major challenges engineers face. The Monte Carlo method is the usual choice for risk estimation. However, since each deterministic run of a complex engineering system may require a significant amount of time, the Monte Carlo method can be very time-consuming and impractical. Surrogate models have been introduced to reduce its computational expense.

This paper presents an adaptive sequential strategy, based on the Monte Carlo method and Gaussian process regression active learning, for estimating the risk of engineering systems with minimum computational cost and acceptable accuracy.

The proposed adaptive sequential strategy for building designs of experiments is illustrated using a simple one-dimensional explanatory example. Then, the efficiency and accuracy of the presented method are compared with other available methodologies using several benchmark examples from the literature. Finally, the applicability of the presented method to nonlinear and high-dimensional real-world problems is studied.

Introduction

Quantitative evaluation of the risk associated with an engineering system is an important part of the risk assessment of that system. Risk estimation helps engineers understand the magnitude of the risk and make sound decisions when separating acceptable risk from unacceptable risk. Acceptable risk refers to the level of risk that the final user can tolerate given constraints such as extra cost.

Since uncertainty is part of engineering design, the risk of failure cannot be completely eliminated. The major sources of uncertainty in engineering design are noise, error, and bias in the sample data, and error in the model or in the approximation techniques used to solve it. Due to the ubiquitous nature of uncertainty, estimating the safety of a system under abnormal operating conditions or in a failure environment is part of the realistic modeling of that system (Oberkampf et al., 2002). In practice, such realistic modeling requires complex and time-consuming mathematical models. Therefore, efficient risk estimation models are needed to estimate the probability of failure of a system. The wide application of such methodologies in a variety of asset-intensive industries, such as Manufacturing, Oil and Gas, Utilities, Chemical, and Life Sciences, motivates the authors to propose their risk estimation model.

The direct Monte Carlo method (Rubinstein, 2008) is the most robust risk estimation model since it does not depend on the dimension and complexity of the model. However, it is computationally expensive for systems with a low probability of failure. At the expense of robustness, the efficiency of the direct Monte Carlo method can be increased using variance reduction techniques (Au, 2016). Many advanced Monte Carlo methods have been proposed to increase its efficiency, such as Subset Simulation (Papaioannou et al., 2015), Directional Simulation [28], [21], Spherical Subset Simulation (Katafygiotis and Cheung, 2007), the Line Sampling method (de Angelis et al., 2015), and Asymptotic Sampling (Bucher, 2009). Recently, an alternative approach based on the Gaussian process model [16], [31] has been proposed. The present study takes the latter approach. However, the method proposed in this paper differs significantly from other Gaussian process-based models in the literature. For example, Echard et al. used the First Order Reliability Method (FORM) and a variance reduction technique known as importance sampling to find the most probable failure point. Then, they used the Gaussian process to predict the outcome of the expensive-to-evaluate system. The performance function in FORM is approximated by a first-order Taylor expansion, which can be a source of error for nonlinear systems. On the other hand, Picheny et al. used a Gaussian process regression method known as the universal kriging model. It is known that the underlying variogram for such a method cannot be calculated, even with a known drift function, for irregularly gridded data (Wackernagel, 2013, p. 305). The method proposed in this paper uses an active learning strategy coupled with a covariance-based Gaussian process model to find the failure region, and it is capable of handling nonlinear performance functions in a multidimensional feature space.

Recently, several studies on coupling Gaussian process models with sampling-based methods have been conducted as well. For instance, Huang, Allen, Notz, and Miller combined a Gaussian process regression based surrogate model with multiple fidelity data to increase the efficiency of the optimization problem (Huang et al., 2006). To estimate a small failure probability, one commonly used approach is to couple the Polynomial Response Surface Method (PRSM) with FORM [18], [19], or a Bayesian framework with FORM [17], [8], [2]. Such methodologies provide biased estimates of the probability of failure since they rely on the FORM estimate of the most probable failure point. Another commonly used alternative to FORM for estimating the probability of failure of complex systems is the Kriging meta-modeling technique (Drignei, 2017). For example, using a learning function based on the probability that the metamodel classification satisfies a constraint, Echard, Gayton, and Lemaire proposed a reliability method combining Gaussian process regression and the Monte Carlo method (Echard et al., 2011). Bect et al. proposed a Bayesian decision theory framework to derive an optimal sequential strategy for estimating the probability of failure (Bect et al., 2012), and Dubourg, Sudret, and Deheeger proposed coupling importance sampling with a Gaussian process regression based surrogate model to approximate a quasi-optimal importance sampling density [14], [13]. Monte Carlo simulation can also be coupled with active learning algorithms such as neural networks (Sener and Savarese, 2017) and support vector machines (Tong and Koller, 2001) to estimate the probability of failure of engineering systems efficiently and accurately. However, neural network active learning suffers from a lack of interpretability, and support vector machine active learning tends to yield better results for binary classification problems than for regression problems.
Besides, although these methods have been shown to give good estimates of the expected value of the hypothesis, none of them directly provides the variance of the prediction. Interpretability and the ability to estimate the variance of the model are two key factors in selecting a proper model for risk estimation. Gaussian process regression is a highly interpretable machine learning algorithm that provides both the expected value and the variance of the model. These observations motivate the authors to propose Gaussian process regression active learning for risk estimation of engineering systems.

Background information

In measure theory, the sample space $Ω$ is a finite or infinite set of all possible outcomes of an experiment, and any subset of the sample space is an event. A $σ$-algebra $F$ on a set $Ω$ is a collection of subsets of $Ω$ that contains $Ω$ , is closed under complementation, and is closed under countable unions. The pair $(Ω, F)$ is called a measurable space.

A measure on $(Ω, F)$ is a map $P : F \to [0, + \infty]$ such that $P$ is countably additive over pairwise disjoint events. If $P (Ω) = 1$ , then $P$ is called a probability measure, and the triple $(Ω,)$

Gaussian process regression

Let $Z = Z (x)$ denote a random field with $x \in D \subseteq R^{d}$ in a $d$-dimensional metric space, and let $Z (x)$ be a random variable for each $x \in D$ . Moreover, let $Z ≔ {(Z (x_{1}), Z (x_{2}), \dots, Z (x_{m}))}^{T} \in R^{m}$ be a random vector containing the observed responses at the $m \in N$ neighboring sample points. The quantity of interest is the predicted value of the random variable $Z (x)$ at the unobserved point $x_{0} \in D$ . This value should be estimated under the model assumption that the random variable $Z (x)$ is second order intrinsically stationary with known covariance $C (h)$
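
For orientation, one common form of such a predictor can be written down under an extra assumption that the excerpt does not state: a known constant mean $\mu$ (the simple kriging case). The symbols $\mu$ , $\Sigma$ , $c_{0}$ , and $\mathbf{1}$ are introduced here for illustration only, and the authors' own predictor may differ.

```latex
\hat{Z}(x_{0}) = \mu + c_{0}^{T} \, \Sigma^{-1} \, (Z - \mu \mathbf{1}),
\qquad
\sigma^{2}(x_{0}) = C(0) - c_{0}^{T} \, \Sigma^{-1} \, c_{0},
\qquad
\Sigma_{ij} = C(x_{i} - x_{j}), \quad (c_{0})_{i} = C(x_{i} - x_{0}).
```

Here $\hat{Z}(x_{0})$ is the predicted value and $\sigma^{2}(x_{0})$ is the prediction variance, which vanishes at an observed point and approaches $C(0)$ far from all observations when the covariance decays with distance.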

Example 1: Simple one-dimensional explanatory example

The objective is to find an algorithm to numerically estimate the probability $P (X < 0)$ , where $X : Ω = R \to R$ , $X (x_{1}) = 1 - \sqrt[3]{x_{1}} + \sin (x_{1})$ , and $x_{1} \sim N (0, 1)$ is a standard normal random variable.

The Monte Carlo method estimates the probability $P (X < 0) \approx I_{n} = 0.003247$ with a sample size of $n = 10^{6}$ . Thus, using Eq. (5), there is a 3.5% error in the estimator $I_{n}$ .
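
The direct Monte Carlo estimate for this example can be sketched in a few lines of standard-library Python. Two things here are assumptions: the cube root is taken as the real cube root for negative $x_{1}$ , and, since Eq. (5) is not reproduced in this excerpt, the error measure is taken to be the usual 95% relative half-width $1.96 \sqrt{(1 - p) / (n p)}$ . The function names are illustrative.

```python
import math
import random


def performance(x1):
    """Example 1 response X(x1); the system fails where this is negative."""
    # Assumption: real cube root, so negative inputs give negative roots.
    cube_root = math.copysign(abs(x1) ** (1.0 / 3.0), x1)
    return 1.0 - cube_root + math.sin(x1)


def monte_carlo_failure_probability(n, seed=0):
    """Fraction of n standard normal samples with a negative response."""
    rng = random.Random(seed)
    failures = sum(
        1 for _ in range(n) if performance(rng.gauss(0.0, 1.0)) < 0.0
    )
    return failures / n


def relative_half_width(p, n, z=1.96):
    """Assumed error measure: relative half-width of the 95% interval."""
    return z * math.sqrt((1.0 - p) / (n * p))


n = 10**6
p_hat = monte_carlo_failure_probability(n)
print(p_hat, relative_half_width(p_hat, n))
```

With the quoted values $I_{n} = 0.003247$ and $n = 10^{6}$ , this assumed error measure evaluates to roughly 3.4%, which is close to the 3.5% stated above. The estimate itself varies with the random seed.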

The proposed methodology separates all the failed and unfailed sample points in just eight observations. The graphical representation of this limit state function is shown

Conclusion

In this paper, an adaptive sequential strategy based on the Monte Carlo method and Gaussian process regression is presented to build an active learning algorithm for risk estimation of engineering systems. First, Gaussian process regression is presented. Then, an active learning strategy is proposed. This sequential strategy consists of the selection of initial training points, the selection of new training points, a probability estimator function, and a stopping criterion.
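
The four ingredients of the sequential strategy can be illustrated on Example 1 with a minimal NumPy sketch. It is not the authors' algorithm: the excerpt does not give their selection rule or stopping criterion, so the sketch substitutes the U learning function of Echard et al. (2011), $U = |\mu| / \sigma$ with stopping when $\min U \geq 2$ . The zero-mean prior, the squared-exponential covariance, its parameters, the nugget, and all function names are likewise assumptions.

```python
import numpy as np


def performance(x):
    """Example 1 response; the system fails where this is negative."""
    return 1.0 - np.cbrt(x) + np.sin(x)


def cov(h, sill=1.0, length=1.0):
    """Assumed squared-exponential covariance C(h)."""
    return sill * np.exp(-0.5 * (h / length) ** 2)


def gp_predict(x_train, z_train, x_new, nugget=1e-8):
    """Zero-mean GP posterior mean and standard deviation for 1-D inputs."""
    K = cov(np.abs(x_train[:, None] - x_train[None, :]))
    K[np.diag_indices_from(K)] += nugget  # small jitter for stability
    k0 = cov(np.abs(x_train[:, None] - x_new[None, :]))
    w = np.linalg.solve(K, k0)  # kriging weights, one column per new point
    mean = w.T @ z_train
    var = cov(0.0) - np.sum(k0 * w, axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))


def active_learning_estimate(n_pop=10_000, n_init=5, max_added=40, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.standard_normal(n_pop)  # Monte Carlo population
    # 1) Initial training points (assumed: evenly spaced design).
    x_train = np.linspace(-4.0, 4.0, n_init)
    z_train = performance(x_train)
    used = np.zeros(n_pop, dtype=bool)
    for _ in range(max_added):
        mean, std = gp_predict(x_train, z_train, pop)
        # 2) New training point: smallest U, i.e. least certain sign.
        u = np.abs(mean) / std
        u[used] = np.inf  # never pick the same candidate twice
        best = int(np.argmin(u))
        # 4) Stopping criterion (assumed): every sign is confident enough.
        if u[best] >= 2.0:
            break
        used[best] = True
        x_train = np.append(x_train, pop[best])
        z_train = np.append(z_train, performance(pop[best]))
    # 3) Probability estimator: failed fraction predicted by the surrogate.
    mean, _ = gp_predict(x_train, z_train, pop)
    return float(np.mean(mean < 0.0)), x_train.size


p_hat, n_calls = active_learning_estimate()
print(p_hat, n_calls)
```

The expensive model is called only for the training points, while the surrogate classifies the whole population. Because the learning function and covariance are stand-ins, the number of model calls this sketch needs will generally differ from the eight observations reported for the authors' method in Example 1.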

Applying the