Variational Bayesian inference for interval regression with an asymmetric Laplace distribution
Introduction
Nowadays, it is well known that a massive wealth of data is generated around the clock, and the information contained in these data is often invaluable. Data-driven modeling methods are of great importance in many areas of real life, such as industrial applications, medical diagnosis, electronic commerce, and financial analysis. However, emerging opportunities come with inevitable challenges. Real data are often contaminated by disturbances or noise, or affected by other uncertainties, such as uncertainty in the system structure and artificial intervention. Choosing an appropriate model, based on prior knowledge or on the hypotheses we can make about these uncertainties, is therefore very important.
If the statistical properties of the noise, such as its distribution or finite statistical moments, are given, regression models based on a probability hypothesis can provide not only a point estimate but also the corresponding confidence intervals or estimated variance [11], [29], [30]. In a worse situation, where we cannot determine the probability distribution of the disturbance or system uncertainty but can collect a little prior subjective knowledge, fuzzy models may help a great deal [17], [16], [34], [28], [19]. In the most unfortunate situation, we have no prior information about the uncertainty of the research object except for the upper and lower bounds of its indistinct description, obtained by some statistical technique or by empirical observation and estimation; here, interval data analysis methods are good choices. For example, the batch blanket and foam line locations in a large float glass furnace [35] cannot be described accurately by real numbers because they are two nearly parallel vague banded regions. In semiconductor manufacturing, several wafers make up a Lot, which is treated as a group in the Chemical Mechanical Polishing (CMP) process; the thickness of the wafers in a Lot is estimated by measuring one representative randomly chosen from that Lot, a method that introduces much uncertainty. The polishing time for a Lot must be set manually according to multiple factors, including the wafer type, the proposed Material Removal Rate (MRR), which characterizes the machine conditions and is given by a standard experiment every 12 h, the pre- and post-polish thickness of the wafers in the Lot waiting for processing, and the expected Material Removal Depth. Establishing a prediction model for polishing time from the above inputs is important for operational optimization. However, it is not wise to apply a regression model with real-valued output to predict polishing time, because the measurement method for wafer thickness introduces much uncertainty.
In the experiment section, we therefore treat it as an interval regression problem. Sometimes, segregating a huge dataset into subgroups can reduce the computational cost of modeling; each subgroup can then be described as interval data by its minimum and maximum. The online-news dataset used in the final section is a good example.
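Describing each subgroup by its minimum and maximum can be sketched as below. The grouping rule used here (sorting on the first feature and equal-size splits) is an illustrative assumption, not the paper's exact preprocessing of the online-news data:

```python
import numpy as np

def to_intervals(X, y, n_groups):
    """Aggregate a large dataset into interval data: sort samples by a
    feature, split them into groups, and summarize each group's target
    by its minimum and maximum (the interval's lower and upper bounds)."""
    order = np.argsort(X[:, 0])
    X, y = X[order], y[order]
    groups_X = np.array_split(X, n_groups)
    groups_y = np.array_split(y, n_groups)
    centers = np.array([g.mean(axis=0) for g in groups_X])  # group input summary
    lower = np.array([g.min() for g in groups_y])
    upper = np.array([g.max() for g in groups_y])
    return centers, lower, upper

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X[:, 0] + 0.1 * rng.normal(size=1000)
centers, lo, hi = to_intervals(X, y, 50)
```

Each of the 1000 original samples now contributes to one of 50 interval observations, shrinking the modeling problem by a factor of 20.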
Admittedly, many methods can generate interval output, but they require different assumptions. For example, fuzzy regression models [17], [20], [16], [34], [28] need fuzzy pre-processing of the training data. The famous TSVR [27] was designed for real-valued regression problems with asymmetric noise disturbance, even though it can give two non-parallel upper and lower bounds. The models in this paper can be built with less prior knowledge than the above methods require, and they can serve as an identifier for a fuzzy model, as the possibility and necessity model does [20].
Interval regression is an important branch of symbolic data analysis [7]. The first linear model for interval data regression dates back to Tanaka and Ishibuchi [32], who represented the interval data as lower and upper bounds, or as center and radius. Their model was solved by linear programming to generate two types of regression model (the possibility model and the necessity model). Billard and Diday [6] built a linear regression model named the center model (CM) because it used only the interval centers to generate the linear coefficients. Later, Billard and Diday [8] proposed a MinMax method, which used two linear models to fit the upper and lower bounds. To fuse the information contained in the central tendency and the interval radius effectively, Lima Neto and De Carvalho [22] constructed two separate linear models for the interval center and radius. All the above interval regression models are linear, so their statistical properties are easy to analyze; moreover, they show the importance of fusing the information contained in the upper and lower bounds.
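A minimal sketch of the CM and MinMax ideas on synthetic interval data, fitted with ordinary least squares (the data-generating choices below are hypothetical, purely for illustration, not taken from [6] or [8]):

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

# synthetic interval data: y in [center - radius, center + radius]
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
center = 2.0 + X @ np.array([1.5, -0.5]) + 0.05 * rng.normal(size=200)
radius = 0.3 + 0.1 * np.abs(X[:, 0])
y_lo, y_hi = center - radius, center + radius

# MinMax idea: separate linear models for the lower and upper bounds
c_lo, c_hi = fit_linear(X, y_lo), fit_linear(X, y_hi)
pred_lo, pred_hi = predict(c_lo, X), predict(c_hi, X)

# center model (CM) idea: one model fitted to the interval midpoints
c_cm = fit_linear(X, (y_lo + y_hi) / 2)
```

Because both bound models are linear least-squares fits, their difference is the least-squares fit of the interval width, which is why the predicted bounds rarely cross on well-behaved data.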
However, most real data contain nonlinear information, which is why nonlinear regression models are more reasonable. Generalized linear models have been adopted for interval regression problems [26], and the resulting model was named BSMR. It is nonlinear if nonlinear link functions, such as the log or inverse link functions, are adopted. However, the nonlinear link function in [26] was adapted to the characteristics of the output noise rather than to approximating the underlying nonlinear map. Building on these frameworks, Lima Neto and Dos Anjos [23] treated the upper and lower bounds as bivariate random vectors and used copula functions to capture the output noise correlation. This model (named CIRM) can be trained with a non-iterative algorithm at little computational cost. However, it is difficult to choose proper initial parameters, and the training process is prone to collapse. The above methods focus only on linear or generalized linear models, which are at a disadvantage when approximating complex nonlinear maps and dealing with the correlation between the upper and lower bounds.
To provide a more practical method for nonlinear interval regression problems, kernel-based models have emerged over the past two decades. Fagundes et al. [12] constructed several kernel regression models for interval-input, interval-output problems. These methods focused on extending the ideas of [32], [6], [8] to nonlinear situations by a locally weighted strategy, which is extensively used in conventional kernel methods. Carrizosa et al. [10] used the Hausdorff distance to measure model forecasting precision and constructed an interval regression model under the SVR framework; it was trained by quadratic programming with 5m constraints (m is the size of the training dataset). Based on [33], Hong and Hwang [18] proposed an SVR model (named LqSVR) using a quadratic loss, which introduced the central tendency into the optimization problem. Lingras and Butz [25] used a modified ε-SVR, called RSVR, to approximate the upper and lower bounds of interval data. Both [18] and [25] can find the conservative and the aggressive model simultaneously. An et al. [4] extended the method of [33] to the interval-input, interval-output regression problem, and Lingras and Butz [24] improved [4] with the conservative and aggressive idea. In [8], the conservative model was called the necessity model, and the aggressive model was called the possibility model. The above kernel methods are simple, direct extensions of classical kernel methods; they focus on the approximation accuracy of the upper and lower bounds while ignoring statistical analysis. Since analyzing system uncertainties is often of great importance in real applications, studying a nonlinear probability model for interval regression problems is necessary, and conservative and aggressive models were only a small step in this direction. The conservative model attempts to keep the predicted upper bound below the upper endpoint of the real interval and the predicted lower bound above the real lower endpoint. In contrast, the aggressive model ensures that the predicted upper bound is greater than the real upper endpoint while the predicted lower bound is less than the real lower endpoint. Conservative and aggressive models can not only approximate the upper and lower bounds but also serve as a tool for identifying fuzzy or rough values.
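The conservative/aggressive distinction can be imitated with plain quantile (pinball-loss) regression on one bound: a quantile parameter near 1 yields an aggressive fit that sits above most observed endpoints, and a parameter near 0 yields a conservative fit below them. The subgradient solver and the data below are illustrative assumptions, not the RSVR or LqSVR formulations:

```python
import numpy as np

def pinball_fit(X, y, tau, lr=0.05, epochs=2000):
    """Linear quantile regression by subgradient descent on the pinball
    loss: tau near 1 lifts the fit above most targets, tau near 0 drops
    it below them."""
    A = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(A.shape[1])
    for _ in range(epochs):
        r = y - A @ w                        # residuals
        g = np.where(r > 0, -tau, 1 - tau)   # d(loss)/d(prediction)
        w -= lr * (A.T @ g) / len(y)
    return w

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(300, 1))
y_up = 1.0 + X[:, 0] + 0.2 * rng.normal(size=300)  # noisy true upper endpoints
A = np.column_stack([np.ones(len(X)), X])
w_aggr = pinball_fit(X, y_up, 0.95)  # aggressive: mostly above the endpoints
w_cons = pinball_fit(X, y_up, 0.05)  # conservative: mostly below them
```

Fitting the lower bound works symmetrically, with the roles of the two quantile levels swapped.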
In this paper, we intend to build a model for interval regression problems that is more flexible and practical than the conservative and aggressive models, by combining Bayesian inference with kernel methods. The Bayesian method handles various uncertainties and incorporates prior knowledge, while the kernel method is good at approximating complex nonlinear maps. The lower and upper bounds are represented as nonlinear kernel functions; moreover, the multi-task learning idea is used to integrate the information carried by the upper and lower bounds. The output noise at each endpoint of the interval data is assumed to follow an asymmetric Laplace distribution, which can be represented as a scale mixture of Gaussian distributions [21]. The model can give interval output with interval or real data as input, and it handles various uncertainties by treating the mixing weights, the variance (scalar or matrix) of a Gaussian distribution, and the coefficients of the regression functions as latent random variables. The approximate posterior distribution of these latent variables is found by variational inference. We abbreviate this model as VBIKR in the following sections.
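The scale-mixture representation mentioned above can be checked numerically. The sketch below uses the common parameterization in which an ALD(0, 1, p) variable equals θz + τ√z·u, with z exponential, u standard normal, θ = (1 − 2p)/(p(1 − p)), and τ² = 2/(p(1 − p)); this is a standard identity (in the style of Kozumi and Kobayashi), not the paper's exact derivation:

```python
import numpy as np

def ald_scale_mixture(p, size, rng):
    """Draw ALD(0, 1, p) samples via the Gaussian scale-mixture
    representation: eps = theta*z + tau*sqrt(z)*u with z ~ Exp(1),
    u ~ N(0, 1)."""
    theta = (1 - 2 * p) / (p * (1 - p))
    tau = np.sqrt(2 / (p * (1 - p)))
    z = rng.exponential(1.0, size)   # exponential mixing variable
    u = rng.normal(size=size)        # Gaussian component
    return theta * z + tau * np.sqrt(z) * u

rng = np.random.default_rng(3)
p = 0.9
eps = ald_scale_mixture(p, 200_000, rng)
```

Two sanity checks follow from the ALD's properties: the p-quantile of ALD(0, 1, p) sits at its location parameter 0, and its mean equals θ.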
The remainder of this paper is arranged as follows. In Section 2, the asymmetric Laplace distribution and related properties are reviewed. Then, VBIKR is proposed in Section 3. Numerical experiments and some real applications are given in Section 4; finally, the conclusion is presented in Section 5. The notation used in this paper is specified in Appendix A.1. Detailed derivations are given in Appendices A.2–A.6.
Section snippets
Asymmetric Laplace distribution and related properties
The univariate Laplace distribution is a continuous distribution with the probability density function
p(x | μ, σ) = (1/(2σ)) exp(-|x - μ|/σ),
where σ is the dispersion scale, which must be positive, and the location parameter μ must be real. The Laplace distribution is symmetric around the location μ and has a heavier tail than the Gaussian distribution. In regression analysis, treating the output noise as Laplace distributed improves the model's robustness against outliers.
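A quick numerical illustration of the heavier Laplace tail, comparing densities at matched variance (matching the variances is our choice, made so the comparison is fair):

```python
import numpy as np

def laplace_pdf(x, mu=0.0, sigma=1.0):
    """Density of the symmetric Laplace distribution."""
    return np.exp(-np.abs(x - mu) / sigma) / (2 * sigma)

def gauss_pdf(x, mu=0.0, s=1.0):
    """Density of the Gaussian distribution with std s."""
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# Laplace(sigma) has variance 2*sigma^2, so a Gaussian with the same
# variance has standard deviation sqrt(2)*sigma
sigma = 1.0
s = np.sqrt(2) * sigma
tail_laplace = laplace_pdf(5.0, 0.0, sigma)
tail_gauss = gauss_pdf(5.0, 0.0, s)
```

Five scale units out, the Laplace density is several times larger than the variance-matched Gaussian density, which is why large outliers are penalized far less under a Laplace noise model.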
The univariate asymmetric Laplace
Bayesian nonparametric interval regression
In this section, we build a regression model with real or interval input and interval output. Assume the training dataset can be represented as {(x_i^l, x_i^u, y_i)}_{i=1}^m, where x_i^l and x_i^u are the inputs for the lower and upper regression bounds. They need not have the same dimension or even lie in the same domain; they are just a pair of inputs with the same index and can be used to construct kernel matrices with certain kernel functions. However, each output y_i must be an interval.
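Constructing separate kernel matrices for the two bound inputs might look as follows; the RBF kernel and the input dimensions are illustrative assumptions, since the text allows any valid kernel and any pair of input domains:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clamp tiny negative round-off

# the lower- and upper-bound inputs may come from different spaces, so
# each gets its own Gram matrix (shapes here are hypothetical)
rng = np.random.default_rng(4)
X_low = rng.normal(size=(50, 3))  # inputs for the lower-bound function
X_up = rng.normal(size=(50, 5))   # inputs for the upper-bound function
K_low = rbf_kernel(X_low, X_low)
K_up = rbf_kernel(X_up, X_up)
```

Only the shared sample index links the two matrices; nothing requires X_low and X_up to have the same dimension.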
For
Numerical analysis for the properties of VBIKR
In this section, numerical experiments were used to verify that VBIKR can give the marginal quantile regression for lower and upper bounds and the estimated occurrence probability for certain events. The artificial data were generated by the following expressions:
It is not difficult to see the meanings of the parameters; they can be set deliberately for different purposes. In the first experiment,
Conclusions
We built a probability interval regression model based on an asymmetric Laplace distribution. It extends the Bayesian quantile regression method to the interval regression problem and combines a multi-task learning strategy to handle the upper and lower (or center and radius) marginal quantile regression functions simultaneously. By adjusting the quantile parameters, the resulting model is more flexible than the previous conservative or aggressive models. It can not only afford the point
References (35)
- et al., Interval kernel regression, Neurocomputing (2014)
- et al., Testing linear independence in linear models with interval-valued data, Comput. Stat. Data Anal. (2007)
- et al., Support vector fuzzy regression machines, Fuzzy Sets Syst. (2003)
- et al., Centre and Range method for fitting a linear regression model to symbolic interval data, Comput. Stat. Data Anal. (2008)
- et al., Conservative and aggressive rough SVR modeling, Theor. Comput. Sci. (2011)
- et al., Rough support vector regression, Eur. J. Oper. Res. (2010)
- TSVR: an efficient twin support vector machine for regression, Neural Netw. Off. J. Int. Neural Netw. Soc. (2010)
- minFunc:...
- S&P500:...
- et al., Variational Inference for Nonparametric Bayesian Quantile Regression (2015)
- Support vector regression with interval-input interval-output, Int. J. Comput. Intell. Syst.
- An introduction to MCMC for machine learning, Mach. Learn.
- Regression analysis for interval-valued data, in: Symbolic Data Analysis: Conceptual Statistics and Data Mining
- Symbolic Data Analysis: Conceptual Statistics and Data Mining
- Symbolic Regression Analysis
- Variational Inference: A Review for Statisticians, J. Am. Stat. Assoc.
- Kernel Support Vector Regression with Imprecise Output
Junhai Zhang was born in 1981. He received the M.S. degree from the High and New Technology Research Institute of Xi'an, Xi'an, China, in 2007. He is currently pursuing the Ph.D. degree with the Department of Automation, Tsinghua University, Beijing. His current research interests include machine learning, complex industrial process modeling and control.
Min Liu received the Ph.D. degree from Tsinghua University, Beijing, China, in 1999. He is currently a Professor with the Department of Automation, Tsinghua University, Associate Director of Automation Science and Technology Research Department of Tsinghua National Laboratory for Information Science and Technology, Director of Control and Optimization of Complex Industrial Process, Tsinghua University, Director of China National Committee for Terms in Automation Science and Technology, Director of Intelligent Optimization Committee of China Artificial Intelligence Association. His main research interests are in optimization scheduling of complex manufacturing process and intelligent operational optimization of complex manufacturing process or equipment. He led more than 20 important research projects including the project of the National 973 Program of China, the project of the National Science and Technology Major Project of China, the project of the National Science Fund for Distinguished Young Scholars of China, the project of the National 863 High-Tech Program of China, and so on. He has published more than 100 papers and a monograph supported by the National Defense Science and Technology Book Publishing Fund. He won the National Science and Technology Progress Award.
Mingyu Dong was born in 1978. He received the Ph.D. degree from Tsinghua University, Beijing, China, in 2006. His current research interests include modeling, intelligent scheduling and optimization of complex manufacturing systems.