Support vector regression for polyhedral and missing data

S.I.: Data Mining and Decision Analytics

Abstract

We introduce “Polyhedral Support Vector Regression” (PSVR), a regression model for data represented by arbitrary convex polyhedral sets. PSVR is derived as a generalization of support vector regression, in which the data is represented by individual points along input variables \(X_1\), \(X_2\), \(\ldots \), \(X_p\) and output variable \(Y\), and extends a support vector classification model previously introduced for polyhedral data. PSVR is in essence a robust-optimization model, which defines prediction error as the largest deviation, calculated along \(Y\), between an interpolating hyperplane and all points within a convex polyhedron; the model relies on the affine Farkas’ lemma to make this definition computationally tractable within the formulation. As an application, we consider the problem of regression with missing data, where we use convex polyhedra to model the multivariate uncertainty involving the unobserved values in a data set. For this purpose, we discuss a novel technique that builds on multiple imputation and principal component analysis to estimate convex polyhedra from missing data, and on a geometric characterization of such polyhedra to define observation-specific hyper-parameters in the PSVR model. We show that an appropriate calibration of such hyper-parameters can have a significantly beneficial impact on the model’s performance. Experiments on both synthetic and real-world data illustrate how PSVR performs competitively with, or better than, other benchmark methods, especially on data sets with a high degree of missingness.
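
The polyhedral construction from multiply-imputed observations can be illustrated with a minimal sketch, not the paper's exact estimation procedure: given several imputed copies of one observation, a PCA-aligned bounding box of those copies yields a convex polyhedron \(\{\varvec{z} : \varvec{A}_i \varvec{z} \le \varvec{a}_i\}\). The function name, the use of NumPy's SVD, and the number of imputations below are illustrative assumptions.

```python
import numpy as np

def pca_bounding_polyhedron(imputed_copies):
    """Build {z : A z <= a} as the PCA-aligned bounding box of the
    multiply-imputed copies of one observation (rows = copies of the
    vector (x_1, ..., x_p, y)).  Illustrative sketch only."""
    mu = imputed_copies.mean(axis=0)
    centered = imputed_copies - mu
    # Principal directions of the imputed copies (rows of Vt are orthonormal axes).
    _, _, Vt = np.linalg.svd(centered, full_matrices=True)
    scores = centered @ Vt.T                      # coordinates along each principal axis
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    # Bounding box in PCA coordinates, lo <= Vt (z - mu) <= hi, rewritten as A z <= a.
    A = np.vstack([Vt, -Vt])
    a = np.concatenate([hi + Vt @ mu, -(lo + Vt @ mu)])
    return A, a

# Example: five imputations of one observation with p = 2 inputs and output y.
rng = np.random.default_rng(0)
copies = rng.normal(size=(5, 3))
A_i, a_i = pca_bounding_polyhedron(copies)
assert np.all(A_i @ copies.T <= a_i[:, None] + 1e-9)  # every imputed copy lies in P_i
```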

References

  • Abdi, H., & Williams, L. J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2(4), 433–459.

  • Ben-Tal, A., & Nemirovski, A. (2002). Robust optimization-methodology and applications. Mathematical Programming, 92(3), 453–480.

  • Bertsimas, D., Brown, D. B., & Caramanis, C. (2011). Theory and applications of robust optimization. SIAM Review, 53(3), 464–501.

  • Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge: Cambridge University Press.

  • Breiman, L., & Friedman, J. H. (1985). Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 80(391), 580–598.

  • Buuren, S. V., & Groothuis-Oudshoorn, K. (2010). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1–68.

  • Carrizosa, E., & Gordillo, J. (2008). Kernel support vector regression with imprecise output. Tech. Rep., Dept. MOSI, Vrije Univ. Brussel, Belgium.

  • Carrizosa, E., Gordillo, J., & Plastria, F. (2007). Support vector regression for imprecise data. Tech. Rep., Dept. MOSI, Vrije Univ. Brussel, Belgium.

  • Chang, C. C., & Lin, C. J. (2002). Training nu-support vector regression: Theory and algorithms. Neural Computation, 14(8), 1959–1978.

  • Chen-Chia, C., Shun-Feng, S., Jin-Tsong, J., & Chih-Ching, H. (2002). Robust support vector regression networks for function approximation with outliers. IEEE Transactions on Neural Networks, 13(6), 1322–1330.

  • Dimitrov, D., Knauer, C., Kriegel, K., & Rote, G. (2006). On the bounding boxes obtained by principal component analysis. In 22nd European Workshop on Computational Geometry (pp. 193–196).

  • Drucker, H., Burges, C. J. C., Kaufman, L., Smola, A. J., & Vapnik, V. N. (1997). Support vector regression machines. Advances in Neural Information Processing Systems, 9, 155–161.

  • Dua, D., & Graff, C. (2020). UCI machine learning repository. Irvine: School of Information and Computer Sciences, University of California. http://archive.ics.uci.edu/ml.

  • Fan, N., Sadeghi, E., & Pardalos, P. M. (2014). Robust support vector machines with polyhedral uncertainty of the input data. In P. Pardalos, M. Resende, C. Vogiatzis, & J. Walteros (Eds.), Learning and intelligent optimization. LION 2014. Lecture notes in computer science (Vol. 8426, pp. 291–305). Cham: Springer.

  • Golub, G. H., & Van Loan, C. F. (2012). Matrix computations. Baltimore, London: JHU Press.

  • Harrison, D., Jr., & Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5(1), 81–102.

  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning. Data mining, inference, and prediction. New York: Springer.

  • Hong, D. H., & Hwang, C. (2003). Support vector fuzzy regression machines. Fuzzy Sets and Systems, 138(2), 271–281.

  • Hong, D. H., & Hwang, C. (2004). Extended fuzzy regression models using regularization method. Information Sciences, 164(1–4), 31–46.

  • Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417–441.

  • Huang, G., Song, S., Wu, C., & You, K. (2012). Robust support vector regression for uncertain input and output data. IEEE Transactions on Neural Networks and Learning Systems, 23(11), 1690–1700.

  • Hwang, S., Kim, D., Jeong, M. K., & Yum, B.-J. (2015). Robust kernel-based regression with bounded influence for outliers. Journal of the Operational Research Society, 66(8), 1385–1398.

  • Jolliffe, I. T. (2002). Principal component analysis (2nd ed.). New York: Springer.

  • Kim, D., Lee, C., Hwang, S., & Jeong, M. K. (2016). A robust support vector regression with a linear-log concave loss function. Journal of the Operational Research Society, 67(5), 735–742.

  • Kim, J.-H. (2009). Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational Statistics and Data Analysis, 53(11), 3735–3745.

  • Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the International Joint Conference on Artificial Intelligence (Vol. 2, pp. 1137–1145).

  • Lee, K., Kim, N., & Jeong, M. K. (2014). The sparse signomial classification and regression model. Annals of Operations Research, 216(1), 257–286.

  • Lima, C. A. M., Coelho, A. L. V., & Von Zuben, F. J. (2002). Ensembles of support vector machines for regression problems. In Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN’02) (Vol. 3, pp. 2381–2386).

  • Little, R. (1988). Missing-data adjustments in large surveys. Journal of Business and Economic Statistics, 6(3), 287–296.

  • Little, R. J. A., & Rubin, D. B. (2014). Statistical analysis with missing data. New York: Wiley.

  • Mangasarian, O., Shavlik, J., & Wild, E. (2004). Knowledge-based kernel approximation. The Journal of Machine Learning Research, 5, 1127–1141.

  • Mangasarian, O., & Wild, E. (2007). Nonlinear knowledge in kernel approximation. IEEE Transactions on Neural Networks, 18(1), 300–306.

  • Martín-Guerrero, J. D., Camps-Valls, G., Soria-Olivas, E., Serrano-López, A. J., Pérez-Ruixo, J. J., & Jiménez-Torres, N. V. (2003). Dosage individualization of erythropoietin using a profile-dependent support vector regression. IEEE Transactions on Biomedical Engineering, 50(10), 1136–1142.

  • Myasnikova, E., Samsonova, A., Samsonova, M., & Reinitz, J. (2002). Support vector regression applied to the determination of the developmental age of a Drosophila embryo from its segmentation gene expression patterns. Bioinformatics, 18(s1), 87–95.

  • Panagopoulos, O. P., Xanthopoulos, P., Razzaghi, T., & Şeref, O. (2018). Relaxed support vector regression. Annals of Operations Research, 276(1–2), 191–210.

  • Park, J. I., Kim, N., Jeong, M. K., & Shin, K. S. (2013). Multiphase support vector regression for function approximation with break-points. Journal of the Operational Research Society, 64(5), 775–785.

  • Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2, 559–572.

  • Raghunathan, T. E., Lepkowski, J. M., & Van Hoewyk, J. (2001). A multivariate technique for multiply imputing missing values using a sequence of regression models. Survey Methodology, 27(1), 85–95.

  • Rätsch, G., Demiriz, A., & Bennett, K. P. (2002). Sparse regression ensembles in infinite and finite hypothesis spaces. Machine Learning, 48(1–3), 189–218.

  • Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

  • Rubin, D. B. (1996). Multiple imputation after 18+ years. Journal of the American Statistical Association, 91(434), 473–518.

  • Schenker, N., & Taylor, J. M. G. (1996). Partially parametric techniques for multiple imputation. Computational Statistics and Data Analysis, 22(4), 425–446.

  • Shivaswamy, P., Bhattacharyya, C., & Smola, A. (2006). Second order cone programming approaches for handling missing and uncertain data. The Journal of Machine Learning Research, 7, 1283–1314.

  • Smola, A. J. (1996). Regression estimation with support vector learning machines. Master’s thesis, Technische Universität München.

  • Smola, A. J., & Schölkopf, B. (2004). A tutorial on support vector regression. Statistics and Computing, 14(3), 199–222.

  • Trafalis, T. B., & Alwazzi, S. A. (2007). Support vector regression with noisy data: A second order cone programming approach. International Journal of General Systems, 36(2), 237–250.

  • Trafalis, T. B., & Gilbert, R. C. (2006). Robust classification and regression using support vector machines. European Journal of Operational Research, 173(3), 893–909.

  • Van Buuren, S. (2007). Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research, 16(3), 219–242.

  • Van Buuren, S., Boshuizen, H. C., & Knook, D. L. (1999). Multiple imputation of missing blood pressure covariates in survival analysis. Statistics in Medicine, 18(6), 681–694.

  • Van Buuren, S., Brand, J. P. L., Groothuis-Oudshoorn, C. G. M., & Rubin, D. B. (2006). Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76(12), 1049–1064.

  • Vapnik, V. N. (1995). The nature of statistical learning theory. New York: Springer.

  • Wang, Y. M., Schultz, R. T., Constable, R. T., & Staib, L. H. (2003). Nonlinear estimation and modeling of fMRI data using spatio-temporal support vector regression. In Biennial International Conference on Information Processing in Medical Imaging (pp. 647–659).

  • Wu, C.-H., Wei, C.-C., Su, D.-C., Chang, M.-H., & Ho, J.-M. (2004). Travel time prediction with support vector regression. IEEE Transactions on Intelligent Transportation Systems, 5(4), 276–281.

  • Yang, H., Chan, L., & King, I. (2002). Support vector machine regression for volatile stock market prediction. In International Conference on Intelligent Data Engineering and Automated Learning (pp. 391–396).

Author information

Corresponding author

Correspondence to Myong K. Jeong.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Theorem 1

For the two \(\max \) constraints in (6) to be satisfied, the following systems of linear inequalities must both be infeasible:

$$\begin{aligned} \left\{ \begin{array}{ll} y-\varvec{w}^T\varvec{x} - w_0 > \xi _i+\epsilon _i\\ \varvec{A}_i \begin{pmatrix} \varvec{x} \\ y \end{pmatrix} \le \varvec{a}_i \end{array} \right. \end{aligned}$$
(17)
$$\begin{aligned} \left\{ \begin{array}{ll} \varvec{w}^T\varvec{x} + w_0 - y > \xi _i+\epsilon _i\\ \varvec{A}_i \begin{pmatrix} \varvec{x} \\ y \end{pmatrix} \le \varvec{a}_i \end{array} \right. \end{aligned}$$
(18)
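
For reference, the affine Farkas’ lemma invoked below can be stated in the following standard form (see, e.g., Boyd and Vandenberghe 2004): for a matrix \(\varvec{A}\), vectors \(\varvec{a}\), \(\varvec{c}\), and a scalar \(d\),

$$\begin{aligned} \left\{ \begin{array}{ll} \varvec{c}^T\varvec{z} > d\\ \varvec{A}\varvec{z} \le \varvec{a} \end{array} \right. \text{ is infeasible } \iff \ \exists \, \varvec{u} \ge \varvec{0}: \left\{ \begin{array}{ll} \varvec{A}^T \varvec{u} = \varvec{c}\\ \varvec{a}^T \varvec{u} \le d \end{array} \right. \text{ or } \left\{ \begin{array}{ll} \varvec{A}^T \varvec{u} = \varvec{0}\\ \varvec{a}^T \varvec{u} < 0 \end{array} \right. \end{aligned}$$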

By the affine Farkas’ lemma (Boyd and Vandenberghe 2004), the alternative systems (19) and (20) must therefore both be feasible:

$$\begin{aligned} \begin{aligned}&\varvec{u}_i \ge \varvec{0} \quad \text{ and } \left\{ \begin{array}{ll} \varvec{A}_i^T \varvec{u}_i = \begin{pmatrix} -\varvec{w} \\ 1 \end{pmatrix} \\ \varvec{a}_i^T \varvec{u}_i - w_0 \le \xi _i+\epsilon _i \end{array} \right.&\text{ or }&\left\{ \begin{array}{ll} \varvec{A}_i^T \varvec{u}_i = \varvec{0}\\ \varvec{a}_i^T \varvec{u}_i < 0 \end{array} \right. \end{aligned} \end{aligned}$$
(19)
$$\begin{aligned} \begin{aligned}&\quad \varvec{v}_i \ge \varvec{0} \quad \ \text{ and } \left\{ \begin{array}{ll} \varvec{A}_i^T \varvec{v}_i = \begin{pmatrix} \varvec{w} \\ -1 \end{pmatrix} \\ \varvec{a}_i^T \varvec{v}_i + w_0 \le \xi _i+\epsilon _i \end{array} \right.&\text{ or }&\left\{ \begin{array}{ll} \varvec{A}_i^T \varvec{v}_i = \varvec{0}\\ \varvec{a}_i^T \varvec{v}_i < 0 \end{array} \right. \end{aligned} \end{aligned}$$
(20)

The feasibility of (19) and (20), together with the non-emptiness of \(P_i\), implies the following:

$$\begin{aligned} \left\{ \begin{array}{ll} \varvec{u}_i^T\left( \varvec{A}_i \begin{pmatrix} \varvec{x} \\ y \end{pmatrix} - \varvec{a}_i\right) \le 0\\ \varvec{v}_i^T\left( \varvec{A}_i \begin{pmatrix} \varvec{x} \\ y \end{pmatrix} - \varvec{a}_i\right) \le 0 \end{array} \right. \end{aligned}$$
(21)

Now, since both

$$\begin{aligned} \begin{aligned} \left\{ \begin{array}{ll} \varvec{A}_i^T \varvec{u}_i = \varvec{0}\\ \varvec{a}_i^T \varvec{u}_i < 0 \end{array} \right. \end{aligned} \end{aligned}$$
(22)

and

$$\begin{aligned} \left\{ \begin{array}{ll} \varvec{A}_i^T \varvec{v}_i = \varvec{0}\\ \varvec{a}_i^T \varvec{v}_i < 0 \end{array} \right. \end{aligned}$$
(23)

contradict (21), neither can hold true, which leads to formulation (7). \(\square \)
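
As a quick numerical illustration of the linear-programming duality behind this argument (not part of the paper), the worst-case deviation over a polyhedron and the Farkas/dual certificate used above can be compared with scipy.optimize.linprog. The toy polyhedron, the weights w and \(w_0\), and the variable names below are arbitrary illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)

# Toy polyhedron P_i = {z : A z <= a} in R^3, z = (x_1, x_2, y): the unit box
# plus one random cut (arbitrary, bounded, and non-empty since z = 0 is feasible).
A = np.vstack([np.eye(3), -np.eye(3), rng.normal(size=(1, 3))])
a = np.concatenate([np.ones(3), np.ones(3), [1.0]])

w, w0 = np.array([0.7, -1.3]), 0.2
c = np.concatenate([-w, [1.0]])   # so that c^T z = y - w^T x

# Primal: worst-case deviation  max_{z in P_i} (y - w^T x)  (linprog minimizes, hence -c).
primal = linprog(-c, A_ub=A, b_ub=a, bounds=[(None, None)] * 3, method="highs")

# Dual / Farkas certificate as in the proof:  min a^T u  s.t.  A^T u = c,  u >= 0.
dual = linprog(a, A_eq=A.T, b_eq=c, bounds=[(0, None)] * A.shape[0], method="highs")

# Strong LP duality: the two optimal values coincide, so requiring
# a_i^T u_i - w_0 <= xi_i + eps_i with A_i^T u_i = (-w, 1)^T and u_i >= 0 is
# equivalent to the corresponding max-constraint of (6), as used in formulation (7).
print(-primal.fun - w0, dual.fun - w0)
```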

About this article

Cite this article

Gazzola, G., Jeong, M.K. Support vector regression for polyhedral and missing data. Ann Oper Res 303, 483–506 (2021). https://doi.org/10.1007/s10479-020-03799-y
