Abstract
The relevance vector machine (RVM) is a widely used statistical method for classification that provides probabilistic outputs and a sparse solution. However, the RVM can be very sensitive to outliers that lie far from the decision boundary separating the two classes. In this paper, we propose a robust RVM based on a weighting scheme that is insensitive to such outliers while retaining the advantages of the original RVM. Given a prior distribution on the weights, the weight values are determined probabilistically and computed automatically during training. Our theoretical result shows that the influence of outliers is bounded through the probabilistic weights. We also discuss a guideline for choosing the hyperparameters that govern the prior. Experimental results on synthetic and real data sets show that the proposed method performs consistently better than the RVM when the training data set is contaminated by outliers.
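To make the weighting scheme concrete, here is a minimal sketch (our illustration, not the authors' code) of the per-observation weight derived in the appendices: under a shape–rate \(Gamma(c,d)\) prior on a weight that exponentiates the likelihood, the posterior-mean weight is \(c/(d-\ln p)\), so well-fitted observations keep a weight near \(c/d\) while gross outliers are automatically down-weighted. The helper name observation_weight is hypothetical.

```python
import numpy as np

def observation_weight(log_lik, c=1.0, d=1.0):
    """Posterior-mean weight E(w) = c / (d - log_lik) for one observation.

    Sketch only: assumes a Gamma(c, d) prior (shape c, rate d) on the
    weight w and a weighted likelihood p(t | beta)^w, for which the
    variational posterior of w is Gamma(c, d - log_lik).
    Since log_lik = ln p(t | beta) <= 0, the weight lies in (0, c/d].
    """
    return c / (d - log_lik)

# A well-classified point (likelihood near 1) keeps weight ~ c/d = 1;
# a gross outlier (likelihood near 0) is automatically down-weighted.
print(observation_weight(np.log(0.99)))  # ~0.99
print(observation_weight(np.log(1e-6)))  # ~0.07
```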
Notes
It can be found at http://www.stats.ox.ac.uk/pub/PRNN/.
This breast cancer database was obtained from the University of Wisconsin Hospitals, Madison, from Dr. William H. Wolberg.
Acknowledgments
The authors thank the anonymous reviewers and editors for their helpful and constructive comments that greatly contributed to improving the paper.
Appendices
Appendix 1: Proof of Proposition 1
The weight value \({{\mathbb {E}}}(w)\) is computed as the mean of \(Gamma\left( {w|\tilde{c},\tilde{d}}\right) \), that is,
$$\begin{aligned} {{\mathbb {E}}}(w)=\frac{\tilde{c}}{\tilde{d}}=\frac{c}{d-\ln {{\mathbb {E}}}\left( {h\left( {{\varvec{\upbeta }},\xi } \right) }\right) }. \end{aligned}$$
Since the logistic loss function is equivalent to a negative log-likelihood, the weighted logistic loss function can be written as
$$\begin{aligned} {{\mathbb {E}}}(w)\left( {-\ln {{\mathbb {E}}}\left( {h\left( {{\varvec{\upbeta }},\xi } \right) }\right) }\right) =\frac{-c\ln {{\mathbb {E}}}\left( {h\left( {{\varvec{\upbeta }},\xi } \right) }\right) }{d-\ln {{\mathbb {E}}}\left( {h\left( {{\varvec{\upbeta }},\xi } \right) }\right) }\le c, \end{aligned}$$
where the inequality holds because \(-\ln {{\mathbb {E}}}\left( {h\left( {{\varvec{\upbeta }},\xi } \right) }\right) \ge 0\) and \(d>0\). Thus, the weighted logistic loss function is bounded by \(c\), no matter how large the loss of an individual observation becomes.
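As a quick numerical sanity check of this bound (our own sketch; here loss stands for \(-\ln {{\mathbb {E}}}\left( {h\left( {{\varvec{\upbeta }},\xi } \right) }\right) \ge 0\)):

```python
import numpy as np

# Proposition 1, numerically: the weighted logistic loss
# E(w) * loss = c * loss / (d + loss) approaches but never exceeds c,
# no matter how large the loss of a single observation becomes.
c, d = 2.0, 2.0
loss = np.array([0.1, 1.0, 10.0, 1e3, 1e9])  # hypothetical loss values
weighted = c * loss / (d + loss)
print(weighted)  # increases toward c = 2 but stays below it
assert np.all(weighted <= c)
```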
Appendix 2: Proof of Proposition 2
Recall that \(p\left( {t|{\varvec{\upbeta }}}\right) \ge h\left( {{\varvec{\upbeta }},\xi } \right) \). Taking the expectations on both sides with respect to \({\varvec{\upbeta }}\) yields the following result:
$$\begin{aligned} p\left( {t|{\hat{\varvec{{\beta }}}}}\right) \ge {{\mathbb {E}}}\left( {h\left( {{\varvec{\upbeta }},\xi } \right) }\right) , \end{aligned}$$
where \({\hat{\varvec{{\beta }}}}\) denotes the expectation of \({\varvec{\upbeta }}\). Since \(0\le p\left( {t|{\hat{\varvec{{\beta }}}}}\right) \le 1\), it is always true that \({{\mathbb {E}}}\left( {h\left( {{\varvec{\upbeta }},\xi } \right) }\right) \le 1\Leftrightarrow \ln {{\mathbb {E}}}\left( {h\left( {{\varvec{\upbeta }},\xi } \right) }\right) \le 0\).
\((\Rightarrow )\) If \(\ln {{\mathbb {E}}}\left( {h\left( {{\varvec{\upbeta }},\xi _i} \right) }\right) \) is 0, then the weight \({{\mathbb {E}}}(w)\) should be 1, since \(\ln {{\mathbb {E}}}\left( {h\left( {{\varvec{\upbeta }},\xi _i} \right) }\right) \) approaches 0 as \(p\left( {t|{\hat{\varvec{{\beta }}}}}\right) \) goes to 1. Therefore, if
$$\begin{aligned} {{\mathbb {E}}}(w)=\frac{c}{d-\ln {{\mathbb {E}}}\left( {h\left( {{\varvec{\upbeta }},\xi _i} \right) }\right) }=\frac{c}{d}=1, \end{aligned}$$
then \(c\) should be equal to \(d\).
\((\Leftarrow )\) If \(c=d\equiv r\), then
$$\begin{aligned} {{\mathbb {E}}}(w)=\frac{r}{r-\ln {{\mathbb {E}}}\left( {h\left( {{\varvec{\upbeta }},\xi } \right) }\right) }\le 1, \end{aligned}$$
since \(\ln {{\mathbb {E}}}\left( {h\left( {{\varvec{\upbeta }},\xi } \right) }\right) \) is always nonpositive.
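The following sketch (our illustration, not the paper's code) checks Proposition 2 numerically: with \(c=d\equiv r\), the weight never exceeds 1 and equals 1 exactly when \(\ln {{\mathbb {E}}}\left( {h\left( {{\varvec{\upbeta }},\xi } \right) }\right) =0\).

```python
import numpy as np

# With c = d = r, the weight E(w) = r / (r - ln E(h)) is at most 1,
# with equality only for a perfectly fitted observation (ln E(h) = 0).
r = 1.5
log_Eh = np.array([0.0, -0.5, -2.0, -20.0])  # ln E(h) <= 0
weights = r / (r - log_Eh)
print(weights)  # [1.0, 0.75, 0.4286, 0.0698]
assert np.all(weights <= 1.0)
```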
About this article
Cite this article
Hwang, S., Jeong, M.K. Robust relevance vector machine for classification with variational inference. Ann Oper Res 263, 21–43 (2018). https://doi.org/10.1007/s10479-015-1890-9