Relaxed support vector regression

  • S.I.: Computational Biomedicine

Abstract

Datasets with outliers pose a serious challenge in regression analysis. In this paper, a new regression method called relaxed support vector regression (RSVR) is proposed for such datasets. RSVR is based on the concept of constraint relaxation, which leads to increased robustness in datasets with outliers. RSVR is formulated using both linear and quadratic loss functions. Numerical experiments on benchmark datasets and computational comparisons with other popular regression methods illustrate the behavior of the proposed method. RSVR achieves better overall performance than support vector regression (SVR) on measures such as RMSE and \(R^2_{adj}\), while being on par with other state-of-the-art regression methods such as robust regression (RR). Additionally, RSVR provides robustness for higher-dimensional datasets, a limitation of RR, the robust equivalent of ordinary least squares regression. Moreover, RSVR can be used on datasets that contain varying levels of noise.



Author information

Corresponding author

Correspondence to Petros Xanthopoulos.

Appendices

Quadratic loss function formulation

1.1 Dual derivation

The Lagrangian function for Formulation (5) can be written as,

$$\begin{aligned}&\mathcal {L}(\mathbf {w}, \mathbf {\xi },\overline{\mathbf {\xi }}, b, \epsilon ,\alpha ^+,\alpha ^-, \beta , \lambda ,\overline{\lambda }, v,\overline{v}) \nonumber \\&\quad = \frac{1}{2} \langle \mathbf {w} , \mathbf {w} \rangle + \frac{C}{2} \sum _{i=1}^n \left( {{\xi _i}^2 +{\overline{\mathbf {\xi }}_i }^2} \right) - \sum _{i=1}^n {\alpha _i}^+ \left( \epsilon + \xi _i + v_i +\langle \mathbf {w} , \varPhi (\mathbf {x}_i) \rangle + b - y_{i} \right)&\nonumber \\&\qquad - \sum _{i=1}^n {\alpha _i}^- \left( \epsilon + \overline{\xi }_i +\overline{v_i } - \langle \mathbf {w} , \varPhi (\mathbf {x}_i) \rangle - b + y_{i} \right) - \beta \left( n \varUpsilon - \sum _{i=1}^n \left( v_i + \overline{v}_i \right) \right)&\nonumber \\&\qquad - \sum _{i=1}^n\lambda _i v_i - \sum _{i=1}^n \overline{\lambda }_i \overline{v}_i&\end{aligned}$$
(10a)

where \(\mathbf {\alpha ^+}, \mathbf {\alpha ^-}, \beta , \mathbf {\lambda }\) and \(\overline{\lambda }\) are the Lagrange multipliers. Since (5) is a convex problem, its Wolfe dual can be obtained from the following first-order stationarity conditions with respect to the primal variables \(\mathbf {w}, b, \mathbf {\xi }, \overline{\mathbf {\xi }}, v\) and \(\overline{v}\):

$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial \mathbf {w} }&= \mathbf {w} - \sum _{i=1}^n \left( {\alpha _i}^+ -{\alpha _i}^- \right) \varPhi (\mathbf {x}_i) = 0&\end{aligned}$$
(11a)
$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial b}&= \sum _{i=1}^n \left( {\alpha _i}^- - {\alpha _i}^+ \right) = 0&\end{aligned}$$
(11b)
$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial \xi _k}&= C {\xi }_k - {\alpha _k}^+ = 0,&\forall k=1,2, \ldots n&\end{aligned}$$
(11c)
$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial \overline{\xi }_k}&= C \overline{{\xi }}_k - {\alpha _k}^- = 0,&\forall k=1,2, \ldots n&\end{aligned}$$
(11d)
$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial v_k}&= - {\alpha _k}^+ + \beta - \lambda _k = 0,&\forall k=1,2, \ldots n \end{aligned}$$
(11e)
$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial \overline{v}_k}&= - {\alpha _k}^- + \beta - \overline{\mathbf {\lambda }}_k = 0,&\forall k=1,2, \ldots n \end{aligned}$$
(11f)

Substituting the expression for \(\mathbf {w}\) from (11a) and using conditions (11b)–(11f) to eliminate \(b, \mathbf {\xi }, \overline{\mathbf {\xi }}, v\) and \(\overline{v}\) from the Lagrangian (10a), the Wolfe dual can be written as shown in (6).
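For concreteness, carrying out this elimination yields a dual of the following form. This is a reconstruction from conditions (11a)–(11f) and the sign constraints on the multipliers; the authoritative statement is Formulation (6) in the main text.

$$\begin{aligned} \max _{\mathbf {\alpha }^+, \mathbf {\alpha }^-, \beta } \quad&-\frac{1}{2} \sum _{i=1}^n \sum _{j=1}^n \left( {\alpha _i}^+ -{\alpha _i}^- \right) \left( {\alpha _j}^+ -{\alpha _j}^- \right) K(\mathbf {x}_i,\mathbf {x}_j) - \frac{1}{2C} \sum _{i=1}^n \left( ({\alpha _i}^+)^2 + ({\alpha _i}^-)^2 \right) \\&- \epsilon \sum _{i=1}^n \left( {\alpha _i}^+ + {\alpha _i}^- \right) + \sum _{i=1}^n y_i \left( {\alpha _i}^+ -{\alpha _i}^- \right) - \beta \, n \varUpsilon \\ \text {s.t.} \quad&\sum _{i=1}^n \left( {\alpha _i}^+ -{\alpha _i}^- \right) = 0, \quad 0 \le {\alpha _i}^+ \le \beta , \quad 0 \le {\alpha _i}^- \le \beta , \quad i = 1, \ldots , n \end{aligned}$$

where the upper bounds on \({\alpha _i}^+\) and \({\alpha _i}^-\) follow from the non-negativity of \(\lambda _i\) and \(\overline{\lambda }_i\) in (11e)–(11f).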

1.2 Optimal hyperplane parameters

The solution to (6) is used to evaluate,

$$\begin{aligned} {\mathbf {w}}^* = \sum _{j=1}^n \left( {\alpha _j}^+ -{\alpha _j}^- \right) \varPhi (\mathbf {x}_j) \end{aligned}$$
(12a)

Let \( {S^+} = \Big \{ {\alpha _i}^+ | \, (0< {\alpha _i}^+ < \beta ) \Big \}\), \(I^+ = \Big \{ i | \, {\alpha _i}^+ \in S^+ \Big \}\).

Let \( {S^-} = \Big \{ {\alpha _i}^- | \, (0< {\alpha _i}^- < \beta ) \Big \}\), \(I^- = \Big \{ i | \, {\alpha _i}^- \in S^- \Big \}\).

The bias can be computed by averaging the estimates obtained from \(S^+\) and \(S^-\) (each summand below equals \(b^*\) at optimality),

$$\begin{aligned} {b}^*&= \frac{1}{2}\Bigg [ \frac{1}{|S^+|} \sum _{i \in I^+} \left( - \epsilon -\frac{{\alpha _i}^+ }{C} - \sum _{j=1}^n \left( {\alpha _j}^+ -{\alpha _j}^- \right) K(\mathbf {x}_i,\mathbf {x}_j) + y_{i} \right) \nonumber \\ {}&\quad + \frac{1}{|S^-|} \sum _{i \in I^-}\left( \epsilon + \frac{{\alpha _i}^- }{C} -\sum _{j=1}^n \left( {\alpha _j}^+ -{\alpha _j}^- \right) K(\mathbf {x}_i,\mathbf {x}_j) + y_{i} \right) \Bigg ] \end{aligned}$$
(13a)
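As an illustration, a minimal Python sketch of how (12a)–(13a) could be evaluated from a dual solution is given below. It is not part of the paper, and variable names such as alpha_plus, alpha_minus, K, eps, C and beta are assumptions. Since every per-support-vector estimate equals \(b^*\) at an exact optimum, the sketch simply averages all of them.

```python
import numpy as np

def bias_quadratic(alpha_plus, alpha_minus, K, y, eps, C, beta, tol=1e-8):
    """Average the per-support-vector bias estimates for the quadratic-loss case.

    alpha_plus, alpha_minus : (n,) dual multipliers from Formulation (6)
    K                       : (n, n) kernel matrix, K[i, j] = K(x_i, x_j)
    y                       : (n,) training targets; eps, C, beta : RSVR parameters
    """
    coef = alpha_plus - alpha_minus      # dual coefficients defining w* (cf. Eq. 12a)
    f_no_b = K @ coef                    # <w*, Phi(x_i)> for every training point

    # Indices with multipliers strictly inside (0, beta), i.e. the sets I^+ and I^-.
    I_plus = np.where((alpha_plus > tol) & (alpha_plus < beta - tol))[0]
    I_minus = np.where((alpha_minus > tol) & (alpha_minus < beta - tol))[0]

    # Each entry below equals b* at an exact optimum (cf. Eq. 13a).
    b_plus = y[I_plus] - eps - alpha_plus[I_plus] / C - f_no_b[I_plus]
    b_minus = y[I_minus] + eps + alpha_minus[I_minus] / C - f_no_b[I_minus]

    return np.mean(np.concatenate([b_plus, b_minus]))
```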

Lemma 1 derivation

The proof follows from the Karush–Kuhn–Tucker complementary slackness condition \(\beta \left( n \varUpsilon - \sum _{i=1}^n \left( v_i + \overline{v}_i \right) \right) = 0\). Since \(||\mathbf {w}||>0\), Eq. (11a) implies that at least one \({\alpha _i}^+\) or \({\alpha _i}^-\) is strictly positive; combined with constraints (6c) and (6d), this gives \(\beta >0\). Thus, constraint (5d) is binding, i.e., the total free slack amount \(n \varUpsilon \) is always consumed completely.

Linear loss function formulation

1.1 Dual derivation

The Lagrangian function for Formulation (8) can be written as,

$$\begin{aligned}&\mathcal {L}(\mathbf {w}, \mathbf {\xi },\overline{\mathbf {\xi }}, b, \epsilon ,\alpha ^+,\alpha ^-, \beta , {\mathbf {\gamma }},{\mathbf {\overline{\gamma }}},{\mathbf {\delta }},{\mathbf {\overline{\delta }}}, \lambda ,\overline{\lambda }, v,\overline{v}, s,\overline{s})&\nonumber \\&\quad =\frac{1}{2} \langle \mathbf {w} , \mathbf {w} \rangle + C \left( \sum _{i=1}^n \left( \xi _i +\overline{\mathbf {\xi }}_i \right) + s + \overline{s} \right) - \sum _{i=1}^n {\alpha _i}^+ \left( \epsilon + \xi _i + v_i +\langle \mathbf {w} , \varPhi (\mathbf {x}_i) \rangle + b - y_{i} \right)&\nonumber \\&\qquad - \sum _{i=1}^n {\alpha _i}^- \left( \epsilon + \overline{\xi }_i +\overline{v}_i - \langle \mathbf {w} , \varPhi (\mathbf {x}_i) \rangle - b + y_{i} \right) - \beta \left( n \varUpsilon - \sum _{i=1}^n \left( v_i +\overline{v}_i \right) \right) - \sum _{i=1}^n\lambda _i v_i&\nonumber \\&\qquad - \sum _{i=1}^n \overline{\lambda }_i \overline{v}_i - \sum _{i=1}^n {\gamma _i} \xi _i - \sum _{i=1}^n \overline{\mathbf {\gamma }}_i \overline{\mathbf {\xi }}_i - \sum _{i=1}^n {\delta _i} \left( s - \xi _i \right) - \sum _{i=1}^n \overline{\mathbf {\delta }}_i \left( \overline{s} - \overline{\mathbf {\xi }}_i \right) \end{aligned}$$
(14a)

where \(\mathbf {\alpha ^+}, \mathbf {\alpha ^-}, \beta , \gamma , \overline{\gamma }, \mathbf {\delta }, \mathbf {\overline{\delta }}, \mathbf {\lambda }\) and \(\overline{\lambda }\) are the Lagrange multipliers. Since (8) is a convex problem, its Wolfe dual can be obtained from the following first-order stationarity conditions with respect to the primal variables \(\mathbf {w}, b, \mathbf {\xi }, \overline{\mathbf {\xi }}, s, \overline{s}, v\) and \(\overline{v}\):

$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial \mathbf {w} }&= \mathbf {w} - \sum _{i=1}^n \left( {\alpha _i}^+ -{\alpha _i}^- \right) \varPhi (\mathbf {x}_i)= 0&\end{aligned}$$
(15a)
$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial b}&= \sum _{i=1}^n \left( {\alpha _i}^- - {\alpha _i}^+ \right) = 0&\end{aligned}$$
(15b)
$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial \xi _k}&= C - {\alpha _k}^+ - {\gamma _k} + {\delta _k} = 0,&\forall k=1,2, \ldots n&\end{aligned}$$
(15c)
$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial \overline{\xi }_k}&= C - {\alpha _k}^- - \overline{\gamma }_k + \overline{\delta }_k = 0,&\forall k=1,2, \ldots n&\end{aligned}$$
(15d)
$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial v_k}&= - {\alpha _k}^+ + \beta - \lambda _k = 0,&\forall k=1,2, \ldots n \end{aligned}$$
(15e)
$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial \overline{v}_k}&= - {\alpha _k}^- + \beta - \overline{\mathbf {\lambda }}_k = 0,&\forall k=1,2, \ldots n \end{aligned}$$
(15f)
$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial s}&= C - \sum _{i=1}^n {\delta }_i = 0 \end{aligned}$$
(15g)
$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial \overline{s}}&= C - \sum _{i=1}^n \overline{{\delta }}_i = 0 \end{aligned}$$
(15h)

Substituting the expression for \(\mathbf {w}\) from (15a) and using conditions (15b)–(15h) to eliminate \(b, \mathbf {\xi }, \overline{\mathbf {\xi }}, s, \overline{s}, v\) and \(\overline{v}\) from the Lagrangian (14a), the Wolfe dual can be written as shown in (9).
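Carrying out the analogous elimination here, the terms involving \(\mathbf {\xi }, \overline{\mathbf {\xi }}, s\) and \(\overline{s}\) vanish by (15c)–(15h), so the dual objective takes the same form as in the quadratic case but without the \(-\frac{1}{2C}\) term (again a reconstruction; the authoritative statement is Formulation (9)):

$$\begin{aligned} \max \quad&-\frac{1}{2} \sum _{i=1}^n \sum _{j=1}^n \left( {\alpha _i}^+ -{\alpha _i}^- \right) \left( {\alpha _j}^+ -{\alpha _j}^- \right) K(\mathbf {x}_i,\mathbf {x}_j) - \epsilon \sum _{i=1}^n \left( {\alpha _i}^+ + {\alpha _i}^- \right) \\&+ \sum _{i=1}^n y_i \left( {\alpha _i}^+ -{\alpha _i}^- \right) - \beta \, n \varUpsilon \end{aligned}$$

subject to \(\sum _{i=1}^n ({\alpha _i}^+ - {\alpha _i}^-) = 0\) and the bound constraints implied by the non-negativity of \(\gamma , \overline{\gamma }, \delta , \overline{\delta }, \lambda \) and \(\overline{\lambda }\).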

1.2 Optimal hyperplane parameters

The solution to (9) is used to evaluate,

$$\begin{aligned} {\mathbf {w}}^* = \sum _{j=1}^n \left( {\alpha _j}^+ -{\alpha _j}^- \right) \varPhi (\mathbf {x}_j) \end{aligned}$$
(16a)

Let \( {S^+} = \Big \{ {\alpha _i}^+ | \, (0< {\alpha _i}^+ < C) \Big \}\), \({I}^+ = \Big \{ i | \, {\alpha _i}^+ \in S^+ \Big \}\).

Let \( {S^-} = \Big \{ {\alpha _i}^- | \, (0< {\alpha _i}^- < C) \Big \}\), \( {I}^- = \Big \{ i | \, {\alpha _i}^- \in S^- \Big \}\).

The bias can be computed by averaging the estimates obtained from \(S^+\) and \(S^-\) (each summand below equals \(b^*\) at optimality),

$$\begin{aligned} {b}^*&= \frac{1}{2}\Bigg [ \frac{1}{|S^+|} \sum _{i \in {I}^+} \left( - \epsilon - \sum _{j=1}^n \left( {\alpha _j}^+ -{\alpha _j}^- \right) K(\mathbf {x}_i,\mathbf {x}_j) + y_{i} \right) \nonumber \\ {}&\quad + \frac{1}{|S^-|} \sum _{i \in {I}^-}\left( \epsilon -\sum _{j=1}^n \left( {\alpha _j}^+ -{\alpha _j}^- \right) K(\mathbf {x}_i,\mathbf {x}_j) + y_{i} \right) \Bigg ] \end{aligned}$$
(17a)
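Once \(b^*\) is available, predictions follow directly from (16a). A minimal, hypothetical Python sketch (with an assumed kernel callable; none of the names below come from the paper) is:

```python
import numpy as np

def rsvr_predict(X_train, X_new, alpha_plus, alpha_minus, b_star, kernel):
    """Evaluate f(x) = sum_j (alpha_j^+ - alpha_j^-) K(x_j, x) + b* (cf. Eq. 16a).

    kernel(a, b) must return the kernel value for two feature vectors, e.g.
    lambda a, b: np.exp(-gamma * np.sum((a - b) ** 2)) for an RBF kernel.
    """
    coef = alpha_plus - alpha_minus
    preds = []
    for x in X_new:
        k_vals = np.array([kernel(xj, x) for xj in X_train])  # K(x_j, x) for all j
        preds.append(coef @ k_vals + b_star)
    return np.array(preds)
```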


About this article


Cite this article

Panagopoulos, O.P., Xanthopoulos, P., Razzaghi, T. et al. Relaxed support vector regression. Ann Oper Res 276, 191–210 (2019). https://doi.org/10.1007/s10479-018-2847-6

