Variational Bayesian inference for interval regression with an asymmetric Laplace distribution
Introduction
Nowadays, it is well known that a massive wealth of data is generated around the clock, and the information contained in these data is often invaluable. Data-driven modeling methods are of great importance in many areas of real life, such as industrial applications, medical diagnosis, electronic commerce, and financial analysis. However, emerging opportunities come with inevitable challenges. Real data are often contaminated by disturbances or noise, or affected by other uncertainties, such as uncertainty in the system structure and artificial intervention. Choosing an appropriate model, based on prior knowledge or on the hypotheses we can make about these uncertainties, is therefore very important.
If the statistical properties of the noise, such as its distribution or finite statistical moments, are given, regression models based on a probability hypothesis can provide not only a point estimate but also the corresponding confidence intervals or estimated variance [11], [29], [30]. In a worse situation, where we cannot determine the probability distribution of the disturbance or system uncertainty but can collect a little prior subjective knowledge, fuzzy models may help a great deal [17], [16], [34], [28], [19]. In the most unfortunate situation, we have no prior information about the uncertainty of the research object except for the upper and lower bounds of its indistinct description, obtained by some statistical technique or by empirical observation and estimation; here, interval data analysis methods are good choices. For example, the batch blanket and foam line locations in a large float glass furnace [35] cannot be described accurately by real numbers because they are two nearly parallel vague banded regions. In semiconductor manufacturing, several wafers make up a Lot, which is treated as a group in the Chemical Mechanical Polishing (CMP) process; the thickness of the wafers in a Lot is estimated by measuring one representative randomly chosen from that Lot, a method that introduces much uncertainty. The polishing time for a Lot must be set manually according to multiple factors, including the wafer type, the proposed Material Removal Rate (MRR), which characterizes the machine conditions and is given by a standard experiment every 12 h, the pre- and post-polish thickness of the wafers in the Lot waiting for processing, and the expected Material Removal Depth. Establishing a prediction model for polishing time from the above inputs is important for operational optimization. However, it is not wise to apply a regression model with real-valued output to predict polishing time, because the measurement method for wafer thickness introduces much uncertainty.
In the experiment section, we therefore treat it as an interval regression problem. Sometimes, segregating a huge dataset into subgroups can reduce the computational cost of modeling; each subgroup can then be described as interval data by its minimum and maximum. The online-news dataset used in the final section is a good example.
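Describing each subgroup by its minimum and maximum can be sketched as below. The grouping rule used here (sorting on the first feature and equal-size splits) is an illustrative assumption, not the paper's exact preprocessing of the online-news data:

```python
import numpy as np

def to_intervals(X, y, n_groups):
    """Aggregate a large dataset into interval data: sort samples by a
    feature, split them into groups, and summarize each group's target
    by its minimum and maximum (the interval's lower and upper bounds)."""
    order = np.argsort(X[:, 0])
    X, y = X[order], y[order]
    groups_X = np.array_split(X, n_groups)
    groups_y = np.array_split(y, n_groups)
    centers = np.array([g.mean(axis=0) for g in groups_X])  # group input summary
    lower = np.array([g.min() for g in groups_y])
    upper = np.array([g.max() for g in groups_y])
    return centers, lower, upper

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X[:, 0] + 0.1 * rng.normal(size=1000)
centers, lo, hi = to_intervals(X, y, 50)
```

Each of the 1000 original samples now contributes to one of 50 interval observations, shrinking the modeling problem by a factor of 20.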
Admittedly, many methods can generate interval output, but they require different assumptions. For example, fuzzy regression models [17], [20], [16], [34], [28] need fuzzy pre-processing of the training data. The famous TSVR [27] was designed for real-valued regression problems with asymmetric noise disturbance, even though it can give two non-parallel upper and lower bounds. The models in this paper can be built with less prior knowledge than the above methods require, and they can serve as an identifier for a fuzzy model, as the possibility and necessity model does [20].
Interval regression is an important branch of symbolic data analysis [7]. The first linear model for interval data regression dates back to Tanaka and Ishibuchi [32], who represented the interval data as lower and upper bounds, or as center and radius. Their model was solved by linear programming to generate two types of regression model (the possibility model and the necessity model). Billard and Diday [6] built a linear regression model named the center model (CM) because it used only the interval centers to generate the linear coefficients. Later, Billard and Diday [8] proposed a MinMax method, which used two linear models to fit the upper and lower bounds. To fuse the information contained in the central tendency and the interval radius effectively, Lima Neto and De Carvalho [22] constructed two separate linear models for the interval center and radius. All the above interval regression models are linear, so their statistical properties are easy to analyze; moreover, they show the importance of fusing the information contained in the upper and lower bounds.
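A minimal sketch of the CM and MinMax ideas on synthetic interval data, fitted with ordinary least squares (the data-generating choices below are hypothetical, purely for illustration, not taken from [6] or [8]):

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

# synthetic interval data: y in [center - radius, center + radius]
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
center = 2.0 + X @ np.array([1.5, -0.5]) + 0.05 * rng.normal(size=200)
radius = 0.3 + 0.1 * np.abs(X[:, 0])
y_lo, y_hi = center - radius, center + radius

# MinMax idea: separate linear models for the lower and upper bounds
c_lo, c_hi = fit_linear(X, y_lo), fit_linear(X, y_hi)
pred_lo, pred_hi = predict(c_lo, X), predict(c_hi, X)

# center model (CM) idea: one model fitted to the interval midpoints
c_cm = fit_linear(X, (y_lo + y_hi) / 2)
```

Because both bound models are linear least-squares fits, their difference is the least-squares fit of the interval width, which is why the predicted bounds rarely cross on well-behaved data.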
However, most real data contain nonlinear information, which is why nonlinear regression models are more reasonable. Generalized linear models have been adopted for interval regression problems [26], and the resulting model was named BSMR. It is nonlinear if nonlinear link functions, such as the log or inverse link functions, are adopted. However, the nonlinear link function in [26] was adapted to the characteristics of the output noise rather than to approximating the underlying nonlinear map. Building on these frameworks, Lima Neto and Dos Anjos [23] treated the upper and lower bounds as bivariate random vectors and used copula functions to capture the output noise correlation. This model (named CIRM) can be trained with a non-iterative algorithm at little computational cost. However, it is difficult to choose proper initial parameters, and the training process is prone to collapse. The above methods focus only on linear or generalized linear models, which are at a disadvantage when approximating complex nonlinear maps and dealing with the correlation between the upper and lower bounds.
To provide a more practical method for nonlinear interval regression problems, kernel-based models have emerged over the past two decades. Fagundes et al. [12] constructed several kernel regression models for interval-input, interval-output problems. These methods focused on extending the ideas of [32], [6], [8] to nonlinear situations by a locally weighted strategy, which is extensively used in conventional kernel methods. Carrizosa et al. [10] used the Hausdorff distance to measure model forecasting precision and constructed an interval regression model under the SVR framework; it was trained by quadratic programming with 5m constraints (m is the size of the training dataset). Based on [33], Hong and Hwang [18] proposed an SVR model (named LqSVR) using a quadratic loss, which introduced the central tendency into the optimization problem. Lingras and Butz [25] used a modified ε-SVR, called RSVR, to approximate the upper and lower bounds of interval data. Both [18] and [25] can find the conservative and the aggressive model simultaneously. An et al. [4] extended the method of [33] to the interval-input, interval-output regression problem, and Lingras and Butz [24] improved [4] with the conservative and aggressive idea. In [8], the conservative model was called the necessity model, and the aggressive model was called the possibility model. The above kernel methods are simple, direct extensions of classical kernel methods; they focus on the approximation accuracy of the upper and lower bounds while ignoring statistical analysis. Since analyzing system uncertainties is often of great importance in real applications, studying a nonlinear probability model for interval regression problems is necessary, and conservative and aggressive models were only a small step in this direction. The conservative model attempts to keep the predicted upper bound below the upper endpoint of the real interval and the predicted lower bound above the real lower endpoint. In contrast, the aggressive model ensures that the predicted upper bound is greater than the real upper endpoint while the predicted lower bound is less than the real lower endpoint. Conservative and aggressive models can not only approximate the upper and lower bounds but also serve as a tool for identifying fuzzy or rough values.
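The conservative/aggressive distinction can be imitated with plain quantile (pinball-loss) regression on one bound: a quantile parameter near 1 yields an aggressive fit that sits above most observed endpoints, and a parameter near 0 yields a conservative fit below them. The subgradient solver and the data below are illustrative assumptions, not the RSVR or LqSVR formulations:

```python
import numpy as np

def pinball_fit(X, y, tau, lr=0.05, epochs=2000):
    """Linear quantile regression by subgradient descent on the pinball
    loss: tau near 1 lifts the fit above most targets, tau near 0 drops
    it below them."""
    A = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(A.shape[1])
    for _ in range(epochs):
        r = y - A @ w                        # residuals
        g = np.where(r > 0, -tau, 1 - tau)   # d(loss)/d(prediction)
        w -= lr * (A.T @ g) / len(y)
    return w

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(300, 1))
y_up = 1.0 + X[:, 0] + 0.2 * rng.normal(size=300)  # noisy true upper endpoints
A = np.column_stack([np.ones(len(X)), X])
w_aggr = pinball_fit(X, y_up, 0.95)  # aggressive: mostly above the endpoints
w_cons = pinball_fit(X, y_up, 0.05)  # conservative: mostly below them
```

Fitting the lower bound works symmetrically, with the roles of the two quantile levels swapped.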
In this paper, we intend to build a model for interval regression problems that is more flexible and practical than the conservative and aggressive models, by combining Bayesian inference with kernel methods. The Bayesian method handles various uncertainties and incorporates prior knowledge, while the kernel method is good at approximating complex nonlinear maps. The lower and upper bounds are represented as nonlinear kernel functions; moreover, the multi-task learning idea is used to integrate the information carried by the upper and lower bounds. The output noise at each endpoint of the interval data is assumed to follow an asymmetric Laplace distribution, which can be represented as a scale mixture of Gaussian distributions [21]. The model can give interval output with interval or real data as input, and it handles various uncertainties by treating the mixing weights, the variance (scalar or matrix) of a Gaussian distribution, and the coefficients of the regression functions as latent random variables. The approximate posterior distribution of these latent variables is found by variational inference. We abbreviate this model as VBIKR in the following sections.
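The scale-mixture representation mentioned above can be checked numerically. The sketch below uses the common parameterization in which an ALD(0, 1, p) variable equals θz + τ√z·u, with z exponential, u standard normal, θ = (1 − 2p)/(p(1 − p)), and τ² = 2/(p(1 − p)); this is a standard identity (in the style of Kozumi and Kobayashi), not the paper's exact derivation:

```python
import numpy as np

def ald_scale_mixture(p, size, rng):
    """Draw ALD(0, 1, p) samples via the Gaussian scale-mixture
    representation: eps = theta*z + tau*sqrt(z)*u with z ~ Exp(1),
    u ~ N(0, 1)."""
    theta = (1 - 2 * p) / (p * (1 - p))
    tau = np.sqrt(2 / (p * (1 - p)))
    z = rng.exponential(1.0, size)   # exponential mixing variable
    u = rng.normal(size=size)        # Gaussian component
    return theta * z + tau * np.sqrt(z) * u

rng = np.random.default_rng(3)
p = 0.9
eps = ald_scale_mixture(p, 200_000, rng)
```

Two sanity checks follow from the ALD's properties: the p-quantile of ALD(0, 1, p) sits at its location parameter 0, and its mean equals θ.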
The remainder of this paper is arranged as follows. In Section 2, the asymmetric Laplace distribution and related properties are reviewed. Then, VBIKR is proposed in Section 3. Numerical experiments and some real applications are given in Section 4; finally, the conclusion is presented in Section 5. The notation used in this paper is specified in Appendix A.1. Detailed derivations are given in Appendices A.2–A.6.
Section snippets
Asymmetric Laplace distribution and related properties
The univariate Laplace distribution is a continuous distribution with the probability density function
p(x | μ, σ) = (1/(2σ)) exp(-|x - μ|/σ),
where σ is the dispersion scale, which must be positive, and the location parameter μ must be real. The Laplace distribution is symmetric around the location μ and has a heavier tail than the Gaussian distribution. In regression analysis, treating the output noise as Laplace distributed improves the model's robustness against outliers.
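A quick numerical illustration of the heavier Laplace tail, comparing densities at matched variance (matching the variances is our choice, made so the comparison is fair):

```python
import numpy as np

def laplace_pdf(x, mu=0.0, sigma=1.0):
    """Density of the symmetric Laplace distribution."""
    return np.exp(-np.abs(x - mu) / sigma) / (2 * sigma)

def gauss_pdf(x, mu=0.0, s=1.0):
    """Density of the Gaussian distribution with std s."""
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# Laplace(sigma) has variance 2*sigma^2, so a Gaussian with the same
# variance has standard deviation sqrt(2)*sigma
sigma = 1.0
s = np.sqrt(2) * sigma
tail_laplace = laplace_pdf(5.0, 0.0, sigma)
tail_gauss = gauss_pdf(5.0, 0.0, s)
```

Five scale units out, the Laplace density is several times larger than the variance-matched Gaussian density, which is why large outliers are penalized far less under a Laplace noise model.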
The univariate asymmetric Laplace
Bayesian nonparametric interval regression
In this section, we build a regression model with real or interval input and interval output. Assume the training dataset can be represented as {(x_i^l, x_i^u, y_i)}_{i=1}^m, where x_i^l and x_i^u are the inputs for the lower and upper regression bounds. They need not have the same dimension or even lie in the same domain; they are just a pair of inputs with the same index and can be used to construct kernel matrices with certain kernel functions. However, each output y_i must be an interval.
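Constructing separate kernel matrices for the two bound inputs might look as follows; the RBF kernel and the input dimensions are illustrative assumptions, since the text allows any valid kernel and any pair of input domains:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clamp tiny negative round-off

# the lower- and upper-bound inputs may come from different spaces, so
# each gets its own Gram matrix (shapes here are hypothetical)
rng = np.random.default_rng(4)
X_low = rng.normal(size=(50, 3))  # inputs for the lower-bound function
X_up = rng.normal(size=(50, 5))   # inputs for the upper-bound function
K_low = rbf_kernel(X_low, X_low)
K_up = rbf_kernel(X_up, X_up)
```

Only the shared sample index links the two matrices; nothing requires X_low and X_up to have the same dimension.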
For
Numerical analysis for the properties of VBIKR
In this section, numerical experiments were used to verify that VBIKR can give the marginal quantile regression for lower and upper bounds and the estimated occurrence probability for certain events. The artificial data were generated by the following expressions:
It is not difficult to see the meanings of the parameters; they can be set deliberately for different purposes. In the first experiment,
Conclusions
We built a probability interval regression model based on an asymmetric Laplace distribution. It extends the Bayesian quantile regression method to the interval regression problem and combines a multi-task learning strategy to handle the upper and lower (or center and radius) marginal quantile regression functions simultaneously. By adjusting the quantile parameters, the resulting model is more flexible than the previous conservative or aggressive models. It can not only afford the point
References (35)
- et al., Interval kernel regression, Neurocomputing (2014)
- et al., Testing linear independence in linear models with interval-valued data, Comput. Stat. Data Anal. (2007)
- et al., Support vector fuzzy regression machines, Fuzzy Sets Syst. (2003)
- et al., Centre and Range method for fitting a linear regression model to symbolic interval data, Comput. Stat. Data Anal. (2008)
- et al., Conservative and aggressive rough SVR modeling, Theor. Comput. Sci. (2011)
- et al., Rough support vector regression, Eur. J. Oper. Res. (2010)
- TSVR: an efficient twin support vector machine for regression, Neural Netw. Off. J. Int. Neural Netw. Soc. (2010)
- minFunc:...
- S&P500:...
- et al., Variational Inference for Nonparametric Bayesian Quantile Regression (2015)
- Support vector regression with interval-input interval-output, Int. J. Comput. Intell. Syst.
- An introduction to MCMC for machine learning, Mach. Learn.
- Regression analysis for interval-valued data, in: Symbolic Data Analysis: Conceptual Statistics and Data Mining
- Symbolic Data Analysis: Conceptual Statistics and Data Mining
- Symbolic Regression Analysis
- Variational Inference: A Review for Statisticians, J. Am. Stat. Assoc.
- Kernel Support Vector Regression with Imprecise Output
Junhai Zhang was born in 1981. He received the M.S. degree from the High and New Technology Research Institute of Xi'an, Xi'an, China, in 2007. He is currently pursuing the Ph.D. degree with the Department of Automation, Tsinghua University, Beijing. His current research interests include machine learning, complex industrial process modeling and control.
Min Liu received the Ph.D. degree from Tsinghua University, Beijing, China, in 1999. He is currently a Professor with the Department of Automation, Tsinghua University, Associate Director of Automation Science and Technology Research Department of Tsinghua National Laboratory for Information Science and Technology, Director of Control and Optimization of Complex Industrial Process, Tsinghua University, Director of China National Committee for Terms in Automation Science and Technology, Director of Intelligent Optimization Committee of China Artificial Intelligence Association. His main research interests are in optimization scheduling of complex manufacturing process and intelligent operational optimization of complex manufacturing process or equipment. He led more than 20 important research projects including the project of the National 973 Program of China, the project of the National Science and Technology Major Project of China, the project of the National Science Fund for Distinguished Young Scholars of China, the project of the National 863 High-Tech Program of China, and so on. He has published more than 100 papers and a monograph supported by the National Defense Science and Technology Book Publishing Fund. He won the National Science and Technology Progress Award.
Mingyu Dong was born in 1978. He received the Ph.D. degree from Tsinghua University, Beijing, China, in 2006. His current research interests include modeling, intelligent scheduling and optimization of complex manufacturing systems.