Abstract
Datasets with outliers pose a serious challenge in regression analysis. In this paper, a new regression method called relaxed support vector regression (RSVR) is proposed for such datasets. RSVR is based on the concept of constraint relaxation, which leads to increased robustness on datasets with outliers. RSVR is formulated with both linear and quadratic loss functions. Numerical experiments on benchmark datasets and computational comparisons with other popular regression methods illustrate the behavior of the proposed method. RSVR achieves better overall performance than support vector regression (SVR) on measures such as RMSE and \(R^2_{adj}\), while being on par with other state-of-the-art regression methods such as robust regression (RR). Additionally, RSVR remains robust on higher-dimensional datasets, a limitation of RR, the robust counterpart of ordinary least squares regression. Moreover, RSVR can be used on datasets that contain varying levels of noise.
Appendices
Quadratic loss function formulation
1.1 Dual derivation
The Lagrangian function for Formulation (5) can be written as,
where \(\mathbf {\alpha ^+}, \mathbf {\alpha ^-}, \beta , \mathbf {\lambda }\) and \(\overline{\lambda }\) are the Lagrange multipliers. Since (5) is a convex problem, its Wolfe dual can be obtained from the following first-order stationarity conditions with respect to the primal variables \(\mathbf {w}, b, \mathbf {\xi }, \overline{\mathbf {\xi }}, v\) and \(\overline{v}\).
Substituting the equivalent expressions for \(\mathbf {w}\), b, \(\mathbf {\xi }\), \(\overline{\mathbf {\xi }}\), v and \(\overline{v}\) from Eqs. (11a)–(11f) back in expression (10), the Wolfe dual can be written as shown in (6).
1.2 Optimal hyperplane parameters
The solution to (6) is used to evaluate,
Let \( {S^+} = \Big \{ {\alpha _i}^+ | \, (0< {\alpha _i}^+ < \beta ) \Big \}\), \(I^+ = \Big \{ i | \, {\alpha _i}^+ \in S^+ \Big \}\).
Let \( {S^-} = \Big \{ {\alpha _i}^- | \, (0< {\alpha _i}^- < \beta ) \Big \}\), \(I^- = \Big \{ i | \, {\alpha _i}^- \in S^- \Big \}\).
The bias can be computed as,
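The exact RSVR bias expression is given by the equation above; as an analogy, standard \(\varepsilon \)-SVR recovers the bias by averaging the KKT-implied values over the unbounded ("free") support vectors, i.e., over the index sets \(I^+\) and \(I^-\). A minimal sketch of that standard recovery, assuming a linear kernel, dual solutions `alpha_plus`/`alpha_minus`, tube half-width `eps`, and box bound `upper` (\(\beta \) in the quadratic-loss formulation, C in the linear-loss one) — all names here are illustrative:

```python
import numpy as np

def recover_bias(X, y, alpha_plus, alpha_minus, eps, upper):
    """Average the bias over unbounded support vectors, as in standard
    epsilon-SVR. Assumes a linear kernel; `upper` is the box bound on
    the dual variables, `eps` the tube half-width."""
    w = X.T @ (alpha_plus - alpha_minus)  # primal weights (linear kernel)
    tol = 1e-8
    free_plus = (alpha_plus > tol) & (alpha_plus < upper - tol)    # set I^+
    free_minus = (alpha_minus > tol) & (alpha_minus < upper - tol) # set I^-
    # KKT: y_i - w.x_i - b = eps on I^+ and y_i - w.x_i - b = -eps on I^-.
    b_values = np.concatenate([
        y[free_plus] - X[free_plus] @ w - eps,
        y[free_minus] - X[free_minus] @ w + eps,
    ])
    return w, b_values.mean()
```

Averaging over all free multipliers, rather than picking a single one, is the numerically stable choice used in common SVM implementations.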
Lemma 1 derivation
The proof follows from the Karush–Kuhn–Tucker complementary slackness condition \(\beta \left( n \varUpsilon - \sum _{i=1}^n \left( v_i + \overline{v}_i \right) \right) = 0\). Since \(||\mathbf {w}||>0\), constraints (6c) and (6d) together with Eq. (11a) imply that \(\beta >0\). Hence constraint (5d) is binding, i.e., the total free slack budget \(n \varUpsilon \) is always consumed completely.
Linear loss function formulation
1.1 Dual derivation
The Lagrangian function for Formulation (8) can be written as,
where \(\mathbf {\alpha ^+}, \mathbf {\alpha ^-}, \beta , \gamma , \overline{\gamma }, \mathbf {\delta }, \mathbf {\overline{\delta }}, \mathbf {\lambda }\) and \(\overline{\lambda }\) are the Lagrange multipliers. Since (8) is a convex problem, its Wolfe dual can be obtained from the following first-order stationarity conditions with respect to the primal variables \(\mathbf {w}, b, \mathbf {\xi }, \overline{\mathbf {\xi }}, s, \overline{s}, v\) and \(\overline{v}\).
Substituting the equivalent expressions for \(\mathbf {w}, b,\mathbf {\xi },\overline{\mathbf {\xi }},v, \overline{v}\), s and \(\overline{s}\) from Eqs. (15a)–(15h) back in expression (14), the Wolfe dual can be written as shown in (9).
1.2 Optimal hyperplane parameters
The solution to (9) is used to evaluate,
Let \( {S^+} = \Big \{ {\alpha _i}^+ | \, (0< {\alpha _i}^+ < C) \Big \}\), \({I}^+ = \Big \{ i | \, {\alpha _i}^+ \in S^+ \Big \}\).
Let \( {S^-} = \Big \{ {\alpha _i}^- | \, (0< {\alpha _i}^- < C) \Big \}\), \( {I}^- = \Big \{ i | \, {\alpha _i}^- \in S^- \Big \}\).
The bias can be computed as,
Panagopoulos, O.P., Xanthopoulos, P., Razzaghi, T. et al. Relaxed support vector regression. Ann Oper Res 276, 191–210 (2019). https://doi.org/10.1007/s10479-018-2847-6