Elsevier

Neurocomputing

Volume 369, 5 December 2019, Pages 11-28

Robust inferential sensor development based on variational Bayesian Student’s-t mixture regression

https://doi.org/10.1016/j.neucom.2019.08.039

Abstract

Owing to the requirements of various product grades or operating conditions, most industrial processes work with multiple modes. Gaussian mixture regression (GMR) is one of the most widely adopted methods for developing inferential sensors for these processes. However, outliers are common because data points may be incorrectly observed, recorded, or imported, and they are very hard to recognize and remove completely. These outliers severely degrade the predictive performance of GMR-based inferential sensors. To resolve this problem, we propose a robust inferential sensing method based on variational Bayesian Student’s-t mixture regression (VBSMR). In the VBSMR, we first explicitly consider the dependency of quality variables on process variables, where Bayesian regularization is used to find the regression coefficients and Student’s-t distributions are introduced to handle outliers. Subsequently, a computationally efficient parameter learning procedure for the VBSMR is developed using the variational Bayesian expectation maximization (VBEM) technique, in which the optimal number of mixing components is determined automatically. Experiments on both numerical and practical industrial examples demonstrate the effectiveness and flexibility of the developed inferential sensor.

Introduction

In process industries, a great many measuring instruments are installed to gather data for real-time monitoring and control [1]. Nevertheless, many key quality variables, such as the melt index of polypropylene, the content of butane, the endpoint of crude oil, and the concentration of oxygen, to name just a few, are very difficult to measure in real time with these traditional sensors [2]. Such quality variables are usually measured either with an expensive online analyzer (for example, a mass spectrometer), which incurs high investment costs, or by laboratory testing, which introduces large time delays [3]. On the other hand, a great number of process variables, such as temperatures, flow rates, and pressures, can be acquired quite easily and cheaply. These easy-to-measure variables (called secondary variables) are closely related to the difficult-to-measure quality variables (called primary variables). Therefore, mathematical models can be constructed to capture the dependency of the primary variables on the secondary variables and then be applied to infer the primary variables. Such a mathematical model is referred to as an ‘inferential sensor’, and has the merits of being low in cost, easy to maintain, and free of measurement delay. Consequently, inferential sensors have been widely investigated and successfully applied to industrial processes for monitoring and control over the past several decades [1], [4], [5]. Note that the idea of the inferential sensor is also popular in non-industrial fields. For example, the orthogonal forward regression (OFR)-based proxy measurement model proposed by Guo et al. established the mathematical relationship between the vertical ground reaction forces (vGRF) and wearable accelerometer signals. The experimental results, evaluated on a dataset collected from individuals’ walking states, demonstrate that the developed dynamic model can successfully improve the estimation accuracy [6]. In fact, many applications, including the above proxy measurement and quality variable prediction, share one task, namely regression.

Generally speaking, inferential sensors can be categorized into two types: first-principle-based sensors [7] and data-driven sensors [8]. Because modern industrial processes are becoming increasingly complicated, first-principle-based sensors often cannot obtain explicit expressions for the process dynamics. Thanks to distributed control systems and large-capacity database techniques, abundant process data that reveal the real status of industrial operations can be gathered from on-site instruments [8], [9]. Therefore, data-driven inferential sensors have attracted increasing attention in recent years, and many algorithms have been designed and used to build inferential sensor models. Partial least squares (PLS) [10] and principal component regression (PCR) [11] are commonly employed to describe linear relationships between the primary and secondary variables. To tackle process nonlinearities, support vector machines (SVMs) [12] and artificial neural networks (ANNs) [13] have also been systematically studied for developing nonlinear inferential sensors. In addition, comprehensive reviews of the methods and applications of inferential sensors in process industries are available in [4], [14].

Because of multiple product grades or operating conditions, most industrial processes work with multiple modes, which means that a single inferential sensor model may fail to achieve satisfactory performance [15], [16], [17]. These processes are usually characterized by strong nonlinearity and non-Gaussianity. To model such multimode processes for inferential sensor development, finite mixture models (FMM) have been studied systematically and used widely. The most widely adopted member of the FMM family is perhaps the Gaussian mixture model (GMM), which linearly combines a group of Gaussian distributions to approximate complex non-Gaussian distributions. The GMM and its variants have shown growing potential in both process monitoring and quality forecasting [15], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28]. For instance, Yu et al. and Liu et al. have applied the GMM successfully to multimode process monitoring and multiphase batch process monitoring [18], [19], respectively. For the purpose of estimating quality variables, GMM-based methods can be classified into two types. The first type includes inferential sensors based on a two-step strategy: mode identification followed by regression model construction. First, the GMM is used to cluster the data into several modes in the input space; subsequently, a regression model such as kernel PLS [20] or Gaussian process regression (GPR) [21] is built in each mode. In addition, in [22], Fan et al. introduced the GMM into a just-in-time learning framework and achieved more reliable prediction performance. However, considering only the input space in the first step ignores the important information in the output space. The second type mainly includes Gaussian mixture regression (GMR) and its many variants. In [23], Yuan et al. treated the input and output spaces together, rather than separately, to estimate the joint probability density function (p.d.f.). The functional dependency of the target variables on the secondary variables can then be derived directly from their joint p.d.f. This procedure has proven superior to the two-step procedure mentioned above, but its higher predictive accuracy depends on plenty of labeled samples. Labeled samples in inferential sensor applications can be rare because labeling is expensive and slow, which makes the performance of GMR-based inferential sensors disappointing. In contrast, many unlabeled samples can be collected easily. To tackle this problem, Shao et al. developed the semisupervised GMR (S2GMR) and the semisupervised Dirichlet process GMR (S2DPGMR), respectively, in which the useful information of unlabeled samples is taken into account [15], [26]. To improve the computational efficiency for large-scale data modeling, a scalable semisupervised GMM was designed in [27], and a significant improvement in computational efficiency was demonstrated. Furthermore, using a Bayesian treatment, the variational Bayesian GMR (VBGMR) was developed to determine the best number of mixing components automatically [28]. These studies have greatly enriched the family of GMR-related methods.
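The step of deriving the input-output dependency from a joint p.d.f. can be illustrated with a minimal sketch of GMR prediction. It assumes a joint GMM over the stacked vector [x, y] has already been fitted; the function names `gaussian_pdf` and `gmr_predict` and the argument layout are hypothetical and are not taken from [23]. Each component contributes its Gaussian conditional mean, weighted by how well it explains the input x.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a multivariate Gaussian evaluated at a single point x."""
    d = mean.shape[0]
    diff = x - mean
    maha = diff @ np.linalg.solve(cov, diff)
    log_det = np.linalg.slogdet(cov)[1]
    return float(np.exp(-0.5 * (d * np.log(2 * np.pi) + log_det + maha)))

def gmr_predict(x, weights, means, covs, d):
    """Conditional mean E[y | x] under a joint GMM over z = [x, y].

    The first d coordinates of every component are inputs, the rest outputs.
    (Hypothetical helper for illustration only.)
    """
    # Responsibility of each component given the input part only.
    resp = np.array([
        w * gaussian_pdf(x, m[:d], c[:d, :d])
        for w, m, c in zip(weights, means, covs)
    ])
    resp /= resp.sum()
    # Per-component conditional mean: mu_y + S_yx S_xx^{-1} (x - mu_x).
    preds = np.array([
        m[d:] + c[d:, :d] @ np.linalg.solve(c[:d, :d], x - m[:d])
        for m, c in zip(means, covs)
    ])
    return resp @ preds

# Two modes with different local input-output relationships.
weights = np.array([0.5, 0.5])
means = [np.array([0.0, 0.0]), np.array([5.0, 10.0])]
covs = [np.array([[1.0, 0.8], [0.8, 1.0]]),
        np.array([[1.0, -0.5], [-0.5, 1.0]])]
print(gmr_predict(np.array([0.0]), weights, means, covs, d=1))
print(gmr_predict(np.array([5.0]), weights, means, covs, d=1))
```

Because the responsibilities use Gaussian densities, a query point far from every component drives all of them toward zero, which is one way to see why Gaussian-based mixtures are sensitive to outlying samples.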

Although inferential sensors based on GMR or its variants have delivered satisfactory performance, some researchers have recently demonstrated that the performance of the GMM deteriorates significantly in the presence of outliers. This is because learning the parameters of the GMM is very sensitive to outliers, which leads either to significant distortion of the estimated p.d.f. over the variables of interest or to excessive components being used to explain the information associated with outliers [29], [30], [31]. Outliers are measurements that diverge noticeably from the statistical ranges of the gathered data [32]. In industrial datasets, outliers are common because measurement data may be incorrectly observed, recorded, or imported [33], leading to skewed parameter estimation and plant-model mismatch in statistical analysis. Basically, there are two types of outliers, namely conspicuous outliers and indistinctive outliers. Conspicuous outliers are readily detected and removed, for example data points containing values beyond their physical limits. In contrast, indistinctive outliers are difficult to identify and address, especially in multimode processes, where the training algorithm may mistake the outliers for an ‘error mode’. Worse still, normal samples can mistakenly be classified as indistinctive outliers and simply discarded. This is very likely to cause loss of useful information and to distort the original data distribution when the available samples are insufficient [34].

To be robust against outliers, researchers have put forward the Student’s-t mixture model (SMM) to overcome the shortcoming of the GMM, as the SMM offers stronger robustness to outliers through its heavier tails [35]. The heavier tails of the SMM stem from an important parameter ν (referred to as the ‘degrees of freedom’) in the Student’s-t distribution. Recently, the SMM has been shown to achieve much better performance than the GMM in various applications such as automatic gesture recognition [36], medical image segmentation [37], and human fall detection [38]. Bayesian strategies for probability density estimation and clustering with mixture distributions enable automatic model selection (i.e., the determination of the optimal component number under the FMM framework); thus, the variational Bayesian SMM (VBSMM) has been widely studied and achieves relatively satisfactory results [39], [40], [41]. However, these SMM-based models are unsupervised: they use only the input information and neglect the supervision information. The inferential sensor model, on the other hand, is supervised, and the supervision information (i.e., the samples of primary variables) is very important. Therefore, the outlier robustness of the unsupervised SMM or VBSMM cannot be directly used to develop robust inferential sensors.

Therefore, the motivation of this paper is to develop a supervised SMM-based inferential sensing approach (which we refer to as ‘variational Bayesian Student’s-t mixture regression’, VBSMR), such that the merit of the FMM in dealing with multimode characteristics and the advantage of the Student’s-t distribution in addressing outliers are both retained. Specifically, in the VBSMR, the functional dependency of the primary variables on the secondary variables is taken into consideration. In addition, a variational Bayesian expectation maximization (VBEM)-based parameter learning algorithm for training the VBSMR is developed. Note that under the framework of variational Bayesian inference, all variables, including latent or hidden variables and model parameters, are treated as random variables and are assigned corresponding prior distributions; for example, the Dirichlet distribution is chosen to govern the mixing coefficients. There are some substantial advantages in developing inferential sensors based on the VBSMR. First, the singularities that arise in inverting covariance matrices can be avoided by the Bayesian treatment. Second, the over-fitting problem can be effectively mitigated by integrating out model parameters. Third, the Bayesian treatment is capable of automatically determining the best component number without resorting to techniques such as cross-validation [28], [42].
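How a Dirichlet prior on the mixing coefficients leads to automatic component selection can be sketched as follows. The sketch assumes the standard variational treatment of mixtures, in which the variational posterior over the mixing coefficients is Dirichlet with parameters equal to the prior concentration plus the soft counts; the paper's own update equations are not reproduced in this excerpt, and the name `expected_mixing_coefficients` and the symmetric concentration `alpha0` are assumptions made here for illustration.

```python
import numpy as np

def expected_mixing_coefficients(resp, alpha0):
    """Posterior mean of the mixing coefficients under a symmetric Dirichlet prior.

    resp   : (N, K) responsibilities, each row summing to 1.
    alpha0 : prior concentration shared by all K components (assumed symmetric).
    """
    n_k = resp.sum(axis=0)      # soft count of samples per component
    alpha = alpha0 + n_k        # Dirichlet posterior parameters
    return alpha / alpha.sum()

# 100 samples, 3 candidate components, the third explains no data.
resp = np.zeros((100, 3))
resp[:60, 0] = 1.0
resp[60:, 1] = 1.0
print(expected_mixing_coefficients(resp, alpha0=1e-3))
```

With a small prior concentration, a component that takes responsibility for almost no samples ends up with an expected weight close to zero, so it can be pruned without a separate cross-validation loop.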

The remainder of this paper is structured as follows. Section 2 briefly introduces the Student’s-t distribution and shows how it differs from the Gaussian distribution. Section 3 explains the VBSMR in detail, including how to learn the model parameters and how to develop the VBSMR-based inferential sensor. In Section 4, two cases, a synthetic example and an actual industrial process, are used to validate the effectiveness and flexibility of the VBSMR. Finally, conclusions and directions for future work are given.


Student’s-t distribution

The p.d.f. of the Student’s-t distribution is given by

$$\mathrm{St}(x \mid \mu, \Lambda, \nu) = \frac{\Gamma(\nu/2 + d/2)\,|\Lambda|^{1/2}}{\Gamma(\nu/2)\,(\nu\pi)^{d/2}} \left(1 + \frac{(x-\mu)^{T}\Lambda(x-\mu)}{\nu}\right)^{-(\nu+d)/2}$$

where $\mu$ represents the mean vector, $\Lambda$ represents the precision matrix (which to some extent is like the inverse covariance for the Gaussian distribution), $\nu$ is called the degrees of freedom, $d$ represents the dimensionality of the variable vector $x$, and $\Gamma(t)=\int_{0}^{\infty} z^{t-1}e^{-z}\,dz$ denotes the Gamma function.
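The density above can be evaluated numerically with a short sketch. The function name `student_t_logpdf` is hypothetical; the code works in the log domain for numerical stability and uses only NumPy and the standard library.

```python
import math
import numpy as np

def student_t_logpdf(x, mu, Lam, nu):
    """Log of St(x | mu, Lam, nu) with precision matrix Lam and nu degrees of freedom."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    Lam = np.atleast_2d(np.asarray(Lam, dtype=float))
    d = mu.shape[0]
    diff = x - mu
    maha = float(diff @ Lam @ diff)          # (x - mu)^T Lam (x - mu)
    log_det = np.linalg.slogdet(Lam)[1]      # log |Lam|
    return (math.lgamma((nu + d) / 2) - math.lgamma(nu / 2)
            + 0.5 * log_det - 0.5 * d * math.log(nu * math.pi)
            - 0.5 * (nu + d) * math.log1p(maha / nu))

def gaussian_logpdf_1d(x):
    """Log density of a standard scalar Gaussian, used only for comparison."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * x * x

# Log-density of a point six units from the mean under each model.
print(student_t_logpdf(6.0, 0.0, 1.0, nu=3.0))
print(gaussian_logpdf_1d(6.0))
```

The Student’s-t log-density at the distant point is far larger than the Gaussian one, which is the heavy-tail behaviour that makes the distribution tolerant of outliers; as $\nu$ grows, the two densities coincide.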

By taking a scalar variable x as an example, the p.d.f. of the Student’s-t

Variational Bayesian Student’s-t mixture regression

Let $X=\{x_1,\dots,x_N\}^{T}\in\mathbb{R}^{N\times d}$ and $Y=\{y_1,\dots,y_N\}^{T}\in\mathbb{R}^{N\times 1}$ be the input and output data matrices, respectively, where $x_n$ and $y_n$ are the $n$th samples of the input variables and the output variable, respectively, $d$ is the dimensionality of the input space, and $N$ is the size of the dataset. The basic idea of the SMM is that a complex non-Gaussian distribution can be approximated by the combination of a finite number of Student’s-t distributions. Therefore, the p.d.f. that is assumed to generate the data points $\{x_n\}$
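The idea of combining a finite number of Student’s-t distributions can be sketched as a weighted sum of component densities. This is only an illustration of a generic SMM density over the inputs, not the paper's full regression model; the names `smm_logpdf`, `weights`, `mus`, `Lams`, and `nus` are assumptions introduced here, with the weights taken to be positive and to sum to one.

```python
import math
import numpy as np

def student_t_logpdf(x, mu, Lam, nu):
    """Log of St(x | mu, Lam, nu) with precision matrix Lam."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    Lam = np.atleast_2d(np.asarray(Lam, dtype=float))
    d = mu.shape[0]
    diff = x - mu
    maha = float(diff @ Lam @ diff)
    log_det = np.linalg.slogdet(Lam)[1]
    return (math.lgamma((nu + d) / 2) - math.lgamma(nu / 2)
            + 0.5 * log_det - 0.5 * d * math.log(nu * math.pi)
            - 0.5 * (nu + d) * math.log1p(maha / nu))

def smm_logpdf(x, weights, mus, Lams, nus):
    """Log density of a Student's-t mixture: log sum_k w_k St(x | mu_k, Lam_k, nu_k)."""
    log_terms = np.array([
        math.log(w) + student_t_logpdf(x, mu, Lam, nu)
        for w, mu, Lam, nu in zip(weights, mus, Lams, nus)
    ])
    top = log_terms.max()                     # log-sum-exp for stability
    return float(top + np.log(np.exp(log_terms - top).sum()))

# Two operating modes in a one-dimensional input space.
weights = [0.4, 0.6]
mus = [[0.0], [8.0]]
Lams = [[[1.0]], [[4.0]]]
nus = [3.0, 5.0]
for point in ([0.0], [8.0], [30.0]):
    print(point, smm_logpdf(point, weights, mus, Lams, nus))
```

Each component keeps its own degrees of freedom, so the tail heaviness can differ from mode to mode.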

Case studies

In this section, two cases, a numerical example and an actual methanation furnace unit of the ammonia synthesis process, are provided to evaluate the predictive performance of the proposed approach. In addition, the performance of PLS, GMR, and VBGMR is provided as a benchmark.

For quantitative evaluation of the predictive accuracies of these four approaches, the root mean square error (RMSE) is adopted, given by

$$\mathrm{RMSE} = \sqrt{\frac{1}{N_{tst}}\sum_{i=1}^{N_{tst}} \left(y_i - \hat{y}_i\right)^{2}}$$

where $y_i$ and $\hat{y}_i$ denote the real and predicted
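The RMSE criterion can be computed with a minimal sketch such as the one below. The function name `rmse` and the sample values are illustrative only and are not results from the paper.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between reference values and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up test-set values, for illustration only.
y_lab = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.7]
print(rmse(y_lab, y_hat))
```

Because the errors are squared before averaging, a few badly predicted test samples dominate the score, so RMSE is a demanding criterion when comparing sensors on data that contain outliers.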

Conclusions

In this paper, to develop industrial inferential sensors that estimate quality variables in real time, the variational Bayesian Student’s-t mixture regression (VBSMR) has been put forward to overcome the main shortcoming of traditional Gaussian mixture regression (GMR), namely its sensitivity to outliers. Our theoretical contributions are twofold: (1) the proposal of the VBSMR, which enables robust regression under the FMM framework; (2) the development of an efficient learning algorithm for

Declaration of Competing Interest

The authors declare no conflict of interest.

Acknowledgment

This work was supported by the National Natural Science Foundation of China (Grant no. 61703367) and the China Postdoctoral Science Foundation (Grant nos. 2017M621929 and 2019T120516).

Jingbo Wang received the B.Eng. degree from Liangxin College, China Jiliang University, Hangzhou, China, in 2017. He is currently a master’s degree candidate with the State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, China. His current research interests include industrial process soft sensor modeling and Bayesian methods with application to classification and regression tasks.

References (44)

  • C. Mei et al., Dynamic soft sensor development based gaussian mixture regression for fermentation processes, Chin. J. Chem. Eng. (2017)
  • W. Shao et al., Quality variable prediction for chemical processes based on semisupervised Dirichlet process mixture of Gaussians, Chem. Eng. Sci. (2019)
  • S. Chatzis et al., Robust fuzzy clustering using mixtures of Student’s-t distributions, Pattern Recognit. Lett. (2008)
  • M. Svensén et al., Robust Bayesian mixture modelling, Neurocomputing (2005)
  • S. Khatibisepehr et al., Design of inferential sensors in the process industry: a review of Bayesian methods, J. Process Control (2013)
  • X. Wei et al., The infinite Student’s t-factor mixture analyzer for robust clustering and classification, Pattern Recognit. (2012)
  • Z. Ge et al., Data mining and analytics in the process industry: the role of machine learning, IEEE Access (2017)
  • L. Fortuna et al., Soft Sensors for Monitoring and Control of Industrial Processes (2007)
  • Y. Guo et al., A new proxy measurement algorithm with application to the estimation of vertical ground reaction forces using wearable sensors, Sensors (2017)
  • Z. Ge, Process data analytics via probabilistic latent variable models: a tutorial review, Ind. Eng. Chem. Res. (2018)
  • Z. Ge et al., Semisupervised Bayesian method for soft sensor modeling with unlabeled data samples, AIChE J. (2011)
  • H. Kaneko et al., Database monitoring index for adaptive soft sensors and the application to industrial process, AIChE J. (2014)
Weiming Shao received his B.Eng. and Ph.D. degrees both from the College of Information and Control Engineering, China University of Petroleum, Qingdao, China, in 2009 and 2016, respectively. He was a Visiting Research Associate with the Department of Electrical Engineering in the Petroleum Institute, Abu Dhabi, UAE, from Nov. 2014 to Nov. 2015. He is currently a Postdoctoral Research Fellow with the State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, China. His research interests include machine learning and statistical learning methods and their applications to semisupervised, robust and adaptive soft sensor development.

Zhihuan Song received the B.Eng. and M.Eng. degrees in industrial automation from Hefei University of Technology, Anhui, China, in 1983 and 1986, respectively, and the Ph.D. degree in industrial automation from Zhejiang University, Hangzhou, China, in 1997. Since 1997, he has been in the Department of Control Science and Engineering, Zhejiang University, where he was first a Postdoctoral Research Fellow, then an Associate Professor, and is currently a Professor. He has published more than 200 papers in journals and conference proceedings. His research interests include the modeling and fault diagnosis of industrial processes, analytics and applications of industrial big data, and advanced process control technologies.
