Abstract
As a generalization of ordinary least squares regression, expectile regression, which predicts conditional expectiles, is fitted by minimizing an asymmetric squared loss function on the training data. In the literature, the idea of the support vector machine was introduced to expectile regression to increase the flexibility of the model, resulting in support vector expectile regression (SVER). This paper reformulates the Lagrangian function of SVER as a differentiable convex function over the nonnegative orthant, which can be minimized by a simple iterative algorithm. The proposed algorithm is easy to implement and requires no particular optimization toolbox beyond basic matrix operations. Theoretical and experimental analyses show that the algorithm converges r-linearly to the unique minimum point. The proposed method was compared with alternative algorithms on simulated and real-world data, and we observe that it is much more computationally efficient while yielding similar prediction accuracy.
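For intuition, the following is a minimal sketch (in Python/NumPy) of the style of iteration described above: minimizing a convex quadratic over the nonnegative orthant using only basic matrix operations and the \((\cdot )_+\) projection. The quadratic \(W({\mathbf {u}})=\tfrac{1}{2}{\mathbf {u}}'{\mathbf {Q}}{\mathbf {u}}-{\mathbf {r}}'{\mathbf {u}}\), the step size, and the stopping rule are illustrative assumptions, not the exact KLERC update derived in the paper.

```python
import numpy as np

def nonneg_quadratic_fixed_point(Q, r, gamma=None, tol=1e-8, max_iter=10000):
    """Minimize W(u) = 0.5 u'Qu - r'u over u >= 0 by the projected
    fixed-point iteration u <- (u - gamma * (Qu - r))_+ .

    Illustrative sketch only: Q is assumed symmetric positive definite,
    and gamma defaults to 1/lambda_max(Q), a step size for which this
    projected iteration is known to converge.
    """
    if gamma is None:
        gamma = 1.0 / np.linalg.eigvalsh(Q).max()
    u = np.zeros(Q.shape[0])
    for _ in range(max_iter):
        u_new = np.maximum(u - gamma * (Q @ u - r), 0.0)  # the (.)_+ projection
        if np.linalg.norm(u_new - u) < tol:
            break
        u = u_new
    return u_new

# Usage on a small random problem; the fixed point satisfies the KKT conditions
# u >= 0, Qu - r >= 0, u'(Qu - r) = 0 (up to numerical tolerance).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Q, r = A @ A.T + np.eye(5), rng.standard_normal(5)
u = nonneg_quadratic_fixed_point(Q, r)
print("min(u) =", u.min())                 # >= 0 by construction
print("min(Qu - r) =", (Q @ u - r).min())  # approximately >= 0
print("u'(Qu - r) =", u @ (Q @ u - r))     # approximately 0
```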


Notes
Please see Appendix C for details.
The source code for KLERC is available upon request.
For notational convenience, in this proof, we assume all the vectors are row vectors. Clearly, the result also applies to column vectors.
References
Aganagić M (1984) Newton’s method for linear complementarity problems. Math Program 28(3):349–362
Armand P, Gilbert JC, Jan-Jégou S (2000) A feasible BFGS interior point algorithm for solving convex minimization problems. SIAM J Optim 11(1):199–222
Bhatia R (1997) Matrix analysis. Springer, New York
Choi K-L, Shim J, Seok K (2014) Support vector expectile regression using IRWLS procedure. J Korean Data Inf Sci Soc 25(4):931–939
Cottle RW (1983) On the uniqueness of solutions to linear complementarity problems. Math Program 27(2):191–213
Cottle RW, Pang J-S, Stone RE (1992) The linear complementarity problem. SIAM, Philadelphia
Croissant Y, Graves S (2016) Ecdat: data sets for econometrics. R package version 0.3-1. https://CRAN.R-project.org/package=Ecdat
Efron B (1991) Regression percentiles using asymmetric squared error loss. Stat Sin 1:93–125
Farooq M, Steinwart I (2017) An SVM-like approach for expectile regression. Comput Stat Data Anal 109:159–181
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232
Hunter DR, Lange K (2004) A tutorial on MM algorithms. Am Stat 58(1):30–37
Koenker R (2005) Quantile regression. Cambridge University Press, New York
Kremers H, Talman D (1994) A new pivoting algorithm for the linear complementarity problem allowing for an arbitrary starting point. Math Program 63(1):235–252
Mangasarian OL (1994) Nonlinear programming. SIAM, Philadelphia
Mangasarian OL, Musicant DR (1999) Successive overrelaxation for support vector machines. IEEE Trans Neural Netw 10:1032–1037
Mangasarian OL, Musicant DR (2001) Lagrangian support vector machines. J Mach Learn Res 1:161–177
Musicant DR, Feinberg A (2004) Active set support vector regression. IEEE Trans Neural Netw 15(2):268–275
Newey W, Powell JL (1987) Asymmetric least squares estimation and testing. Econometrica 55(4):819–847
Osuna E, Freund R, Girosi F (1997a) An improved training algorithm for support vector machines. In: Proceedings of the IEEE workshop neural networks for signal processing, pp 276–285
Osuna E, Freund R, Girosi F (1997b) Training support vector machines: an application to face detection. In: Proceedings of the IEEE conferences on computer vision and pattern recognition
Platt J (1998) Fast training of support vector machines using sequential minimal optimization. In: Schöelkopf B, Burges C, Smola A (eds) Advances in kernel methods—support vector learning. MIT Press, Cambridge
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007) Numerical recipes: the art of scientific computing, 3rd edn. Cambridge University Press, Cambridge
Schnabel SK, Eilers PHC (2009) Optimal expectile smoothing. Comput Stat Data Anal 53(12):4168–4177
Sobotka F, Kneib T (2012) Geoadditive expectile regression. Comput Stat Data Anal 56(4):755–767
Sobotka F, Schnabel S, Schulze WL (2014) expectreg: Expectile and quantile regression. R package version 0.39. https://CRAN.R-project.org/package=expectreg
Vapnik V (1998) Statistical learning theory. Wiley, New York
Waldmann E, Sobotka F, Kneib T (2017) Bayesian regularisation in geoadditive expectile regression. Stat Comput 27(6):1539–1553
Weingessel A (2013) quadprog: Functions to solve quadratic programming problems. R package version 1.5-5. https://CRAN.R-project.org/package=quadprog
Yang Y, Zou H (2015) Nonparametric multiple expectile regression via ER-boost. J Stat Comput Simul 85(7):1442–1458
Yang Y, Zhang T, Zou H (2015) KERE: expectile regression in reproducing kernel Hilbert space. R package version 1.0.0. https://CRAN.R-project.org/package=KERE
Yang Y, Zhang T, Zou H (2018) Flexible expectile regression in reproducing kernel Hilbert space. Technometrics 60(1):26–35
Acknowledgements
The author would like to thank the editor and anonymous reviewers for their constructive suggestions, which greatly helped improve the paper. This work was supported by a Faculty Research Grant (F07336-162001-022) and a Summer Faculty Fellowship from Missouri State University.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Appendices
Proof of Proposition 1
Assume \(\hat{\varvec{\xi }} = (\hat{\xi }_1, \hat{\xi }_2,\ldots , \hat{\xi }_n)'\) and without loss of generality, assume \(\hat{\xi }_1<0\). Let \(\tilde{\varvec{\xi }} = (0, \hat{\xi }_2,\ldots , \hat{\xi }_n)'\), that is, we replace the first component of \(\hat{\varvec{\xi }}\) (which is negative) by 0 and keep others unchanged. Because \(\hat{\xi }_1\) satisfies the constraint \(y_1 - {\hat{\mathbf{w}}}'\phi (\mathbf{x}_1) - {\hat{b}} \le \hat{\xi }_1\), we must have \(y_1 - {\hat{\mathbf{w}}}'\phi (\mathbf{x}_1) - {\hat{b}}\le 0\) since \(\hat{\xi }_1<0\). By assumption, the constraints are satisfied at \(\hat{\xi }_i\) for \(i=2,\ldots ,n\). Hence, the constraints are satisfied at all components of \(\tilde{\varvec{\xi }}\).
However, there is
since \(\hat{\xi }_1<0\) by assumption. Inequality (24) contradicts the assumption that \((\hat{\mathbf{w}}, \hat{b}, \hat{\varvec{\xi }}, \hat{\varvec{\xi }^*})\) is the minimum point. Thus, at the minimum point, we must have \(\hat{\xi }_1\ge 0\). In the same manner, it can be argued that all components of \(\hat{\varvec{\xi }}\) and \(\hat{\varvec{\xi }^*}\) are nonnegative.
Proof of Proposition 2
We will prove that \({\hat{\alpha }}_1{\hat{\alpha }}^*_1=0\), and others can be argued similarly. Assume \({\hat{\alpha }}_1{\hat{\alpha }}^*_1>0\), then \({\hat{\alpha }}_1>0\) and \({\hat{\alpha }}^*_1>0\). Let \(\beta = \min \{{\hat{\alpha }}_1, {\hat{\alpha }}^*_1\}>0\). Let \(\tilde{\varvec{\alpha }}=({{\tilde{\alpha }}}_1,{{\tilde{\alpha }}}_2,\ldots , {{\tilde{\alpha }}}_n)'\) and \(\tilde{\varvec{\alpha }}^*=({{\tilde{\alpha }}}^*_1,{{\tilde{\alpha }}}^*_2,\ldots , {{\tilde{\alpha }}}^*_n)'\) with \(0\le {\tilde{\alpha }}_1= {\hat{\alpha }}_1-\beta <{\hat{\alpha }}_1\), \(0\le {\tilde{\alpha }}^*_1= {\hat{\alpha }}^*_1-\beta <{\hat{\alpha }}^*_1\), \({\tilde{\alpha }}_i= {\hat{\alpha }}_i\ge 0\) and \({\tilde{\alpha }}^*_i= {\hat{\alpha }}^*_i\ge 0\) for \(i=2,3,\ldots ,n\). Hence, \(\tilde{\varvec{\alpha }}\) and \(\tilde{\varvec{\alpha }}^*\) satisfy the nonnegativity constraints in Problem (5), and \({\tilde{\alpha }}_i-{\tilde{\alpha }}^*_i= {\hat{\alpha }}_i-{\hat{\alpha }}^*_i\) for \(i=1,2,\ldots ,n\).
With these notations and relations, we have
where the inequality follows because \({\tilde{\alpha }}_1<{\hat{\alpha }}_1\) and \({\tilde{\alpha }}^*_1<{\hat{\alpha }}^*_1\). However, inequality (25) contradicts the assumption that \((\hat{\varvec{\alpha }},\hat{\varvec{\alpha }}^*)\) is the solution to Problem (5). Hence, we must have \({\hat{\alpha }}_1{\hat{\alpha }}^*_1=0\).
Kuhn–Tucker stationary point problem
Consider the nonlinear minimization problem with variable \(\mathbf{x}\) in the p-dimensional space:
where \(\theta (\mathbf{x})\) is a real-valued function, \(g(\mathbf{x})\) is a vector-valued function in m-dimensional space, and we assume that \(\theta (\mathbf{x})\) and \(g(\mathbf{x})\) are differentiable. At the minimum point \(\mathbf{x}\), there is a vector \({\mathbf {t}}\in {\mathbb {R}}^m\) such that the following conditions are satisfied
We notice that \(\nabla \theta (\mathbf{x})\) is a p-dimensional vector, and \(\nabla g(\mathbf{x})\) is an \(m\times p\) matrix. These conditions are called the Kuhn–Tucker stationary point problem (KTP) in Mangasarian (1994).
In this paper, the minimization problem is
Hence, the function \(W({\mathbf {u}})\) plays the role of \(\theta (\mathbf{x})\) in Eq. (26), and \(-{\mathbf {u}}\) plays the role of \(g(\mathbf{x})\). As such, according to the KTP condition, there is a vector \({\mathbf {v}}\in {\mathbb {R}}^{2n}\) satisfying:
and \(-{\mathbf {u}}\le {\mathbf {0}}_{2n}\) (i.e., \({\mathbf {u}}\ge {\mathbf {0}}_{2n}\)), \({\mathbf {v}}\ge {\mathbf {0}}_{2n}\), \({\mathbf {v}}' (-{\mathbf {u}})=0\) (i.e., \({\mathbf {v}}' {\mathbf {u}}=0\)). These are the same as in Eq. (9).
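Assuming \(\nabla W({\mathbf {u}})={\mathbf {v}}\), which is what the general KTP stationarity condition gives with \(g({\mathbf {u}})=-{\mathbf {u}}\), the conditions above can be collected into a single complementarity statement (a restatement for reference, not an additional result):
\[
{\mathbf {u}}\ge {\mathbf {0}}_{2n}, \qquad \nabla W({\mathbf {u}})\ge {\mathbf {0}}_{2n}, \qquad {\mathbf {u}}'\nabla W({\mathbf {u}})=0,
\]
that is, \({\mathbf {0}}_{2n}\le {\mathbf {u}}\perp \nabla W({\mathbf {u}})\ge {\mathbf {0}}_{2n}\), which is precisely the setting in which the orthogonality condition of the next appendix applies.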
Orthogonality condition for two nonnegative vectors
In this Appendix, we show that two nonnegative vectors \({\mathbf {a}}\) and \({\mathbf {b}}\) are perpendicular if and only if \({\mathbf {a}} = ({\mathbf {a}} - \gamma {\mathbf {b}})_+\) for any real \(\gamma >0\).
We first assume that two nonnegative real numbers a and b satisfy \(ab=0\). Since \(ab=0\), at least one of a and b must be 0. If \(a=0\) and \(b\ge 0\), then for any real \(\gamma >0\), \(a-\gamma b\le 0\), so that \((a-\gamma b)_+=0=a\); if \(a>0\), we must have \(b=0\), and then for any real \(\gamma >0\), \((a-\gamma b)_+ = (a)_+ = a\). In both cases, \(a=(a-\gamma b)_+\) for any real number \(\gamma >0\).
Conversely, assume that two nonnegative real numbers a and b satisfy \(a=(a-\gamma b)_+\) for any real number \(\gamma >0\). If a and b are both strictly positive, then \(a-\gamma b<a\) since \(\gamma >0\). Consequently, \((a-\gamma b)_+<a\), which contradicts our assumption that \(a = (a-\gamma b)_+\). Thus at least one of a and b must be 0, i.e., \(ab=0\).
Now assume that vectors \({\mathbf {a}}\) and \({\mathbf {b}}\) are in \({\mathbb {R}}^p\) with each component nonnegative, and assume \({\mathbf {a}}\perp {\mathbf {b}}\), that is, \(\sum _{i=1}^p a_ib_i=0\). Since both \(a_i\) and \(b_i\) are nonnegative, we must have \(a_ib_i=0\) for \(i=1,2,\ldots ,p\). By the preceding argument, this is equivalent to \(a_i = (a_i - \gamma b_i)_+\) for any \(\gamma >0\) and any \(i=1,2,\ldots ,p\). In vector form, \({\mathbf {a}} = ({\mathbf {a}} - \gamma {\mathbf {b}})_+\).
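The snippet below is a small numerical illustration of this equivalence (not part of the paper): it builds nonnegative vectors with complementary supports, confirms \({\mathbf {a}} = ({\mathbf {a}} - \gamma {\mathbf {b}})_+\) for several values of \(\gamma \), and shows that the identity fails once the vectors share a strictly positive component. The dimension and \(\gamma \) values are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 8

# Nonnegative vectors with complementary supports, so that a'b = 0.
mask = rng.random(p) < 0.5
a = np.where(mask, rng.random(p), 0.0)
b = np.where(mask, 0.0, rng.random(p))
print("a'b =", a @ b)

# Orthogonality holds, so a == (a - gamma*b)_+ for every gamma > 0.
for gamma in (0.1, 1.0, 10.0):
    print(gamma, np.allclose(np.maximum(a - gamma * b, 0.0), a))  # True

# If a and b share a strictly positive component, the identity breaks.
a2, b2 = a.copy(), b.copy()
a2[0], b2[0] = 1.0, 1.0
print(np.allclose(np.maximum(a2 - 0.5 * b2, 0.0), a2))  # False
```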
Some technical lemmas
This Appendix presents some lemmas that will be used in Sects. 3.2 and 3.3.
Lemma 1
Let \({\mathbf {a}}\) and \({\mathbf {b}}\) be two vectors in \({\mathbb {R}}^p\), then
Proof
For two real numbers a and b, there are four situations:
1. \(a\ge 0\) and \(b\ge 0\), then \(|a_+ - b_+|=|a-b|\);
2. \(a\ge 0\) and \(b\le 0\), then \(|a_+ - b_+|=|a-0|\le |a-b|\);
3. \(a\le 0\) and \(b\ge 0\), then \(|a_+ - b_+|=|0-b|\le |a-b|\);
4. \(a\le 0\) and \(b\le 0\), then \(|a_+ - b_+|=|0-0|\le |a-b|\).
In summary, in the one-dimensional case, \(|a_+ - b_+|^2\le |a-b|^2\).
Assume that Eq. (27) is true for p-dimensional vectors \({\mathbf {a}}_p\) and \({\mathbf {b}}_p\). Denote the \((p+1)\)-dimensional vectors \({\mathbf {a}}\) and \({\mathbf {b}}\) as \({\mathbf {a}} = ({\mathbf {a}}_p, a_{p+1})\) and \({\mathbf {b}} = ({\mathbf {b}}_p, b_{p+1})\) (see Footnote 4), where \(a_{p+1}\) and \(b_{p+1}\) are real numbers. Then,
where in Eq. (28) we used the induction hypothesis for the p-dimensional vectors, the one-dimensional result above, and the definition of the Euclidean norm.
By induction, Eq. (27) is proved. \(\square \)
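As a quick, optional sanity check of Eq. (27) (illustration only, not part of the proof), the inequality \(\Vert ({\mathbf {a}})_+-({\mathbf {b}})_+\Vert \le \Vert {\mathbf {a}}-{\mathbf {b}}\Vert \) can be confirmed on random vectors; the dimension and number of trials below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
violations = 0
for _ in range(10000):
    a, b = rng.standard_normal(6), rng.standard_normal(6)
    lhs = np.linalg.norm(np.maximum(a, 0.0) - np.maximum(b, 0.0))
    rhs = np.linalg.norm(a - b)
    violations += lhs > rhs + 1e-12
print("violations of Eq. (27):", violations)  # expected: 0
```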
Lemma 2
(Weyl’s inequality (Bhatia 1997, chap. 3)) Let \({\mathbf {A}}\) and \({\mathbf {B}}\) be \(m\times m\) Hermitian matrices. Let \(\lambda _1({\mathbf {A}})\ge \lambda _2({\mathbf {A}})\ge \cdots \ge \lambda _m({\mathbf {A}})\), \(\lambda _1({\mathbf {B}})\ge \lambda _2({\mathbf {B}})\ge \cdots \ge \lambda _m({\mathbf {B}})\), and \(\lambda _1({\mathbf {A}}+{\mathbf {B}})\ge \lambda _2({\mathbf {A}}+{\mathbf {B}})\ge \cdots \ge \lambda _m({\mathbf {A}}+{\mathbf {B}})\) be the eigenvalues of \({\mathbf {A}}\), \({\mathbf {B}}\), and \({\mathbf {A}}+{\mathbf {B}}\), respectively. For any \(i=1,2,\ldots , m\), there is
In particular, there are
and
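As a numerical illustration of Lemma 2 (not part of the paper), the standard two-sided form of Weyl's inequality for a fixed index, \(\lambda _i({\mathbf {A}})+\lambda _m({\mathbf {B}})\le \lambda _i({\mathbf {A}}+{\mathbf {B}})\le \lambda _i({\mathbf {A}})+\lambda _1({\mathbf {B}})\), can be checked on random real symmetric matrices; the matrix size and tolerance below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
m = 6
# Random real symmetric (hence Hermitian) test matrices.
X, Y = rng.standard_normal((m, m)), rng.standard_normal((m, m))
A, B = (X + X.T) / 2, (Y + Y.T) / 2

# eigvalsh returns ascending eigenvalues; flip to the descending order of Lemma 2.
lam = lambda M: np.linalg.eigvalsh(M)[::-1]
lA, lB, lAB = lam(A), lam(B), lam(A + B)

# For each i: lambda_i(A) + lambda_m(B) <= lambda_i(A+B) <= lambda_i(A) + lambda_1(B).
print(np.all(lA + lB[-1] <= lAB + 1e-10), np.all(lAB <= lA + lB[0] + 1e-10))
```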
Lemma 3
Let \({\mathbf {C}}\) be an \(m\times m\) positive semidefinite matrix, \({\mathbf {I}}\) be the \(m\times m\) identity matrix, and a, b, c, and d be real numbers with \(a\ne 0\) and \(d\ne 0\). There are
and
Proof
Direct calculation gives us
Using the previous result, we have
Finally,
\(\square \)
Lemma 4
(Press et al. 2007, Sect. 2.7) Suppose that an \(N\times N\) matrix \({\mathbf {A}}\) is partitioned into
where \({\mathbf {A}}_{11}\) and \({\mathbf {A}}_{22}\) are square matrices of size \(p\times p\) and \( s\times s\), respectively (\(p +s = N\)). If the inverse of \({\mathbf {A}}\) is partitioned in the same manner,
then \( {\overline{ {\mathbf {A}}}}_{11}\), \({\overline{ {\mathbf {A}}}}_{12}\), \({\overline{ {\mathbf {A}}}}_{21}\), \({\overline{ {\mathbf {A}}}}_{22}\), which have the same sizes as \({\mathbf {A}}_{11}\), \({\mathbf {A}}_{12}\), \({\mathbf {A}}_{21}\), \({\mathbf {A}}_{22}\), respectively, can be found by
and
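As a numerical illustration of Lemma 4 (not a transcription of its displays), the snippet below checks the standard partitioned-inverse formulas from Press et al. (2007, Sect. 2.7), written in terms of the Schur complement \({\mathbf {S}}={\mathbf {A}}_{11}-{\mathbf {A}}_{12}{\mathbf {A}}_{22}^{-1}{\mathbf {A}}_{21}\); the block sizes and the random test matrix are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
p, s = 3, 4
N = p + s
# A random, well-conditioned test matrix, partitioned into blocks.
A = rng.standard_normal((N, N)) + N * np.eye(N)
A11, A12 = A[:p, :p], A[:p, p:]
A21, A22 = A[p:, :p], A[p:, p:]

A22_inv = np.linalg.inv(A22)
S = A11 - A12 @ A22_inv @ A21          # Schur complement of A22 in A
Abar11 = np.linalg.inv(S)
Abar12 = -Abar11 @ A12 @ A22_inv
Abar21 = -A22_inv @ A21 @ Abar11
Abar22 = A22_inv + A22_inv @ A21 @ Abar11 @ A12 @ A22_inv

Abar = np.block([[Abar11, Abar12], [Abar21, Abar22]])
print(np.allclose(Abar, np.linalg.inv(A)))  # True: the block formulas match inv(A)
```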
Proof of Theorem 2
To calculate \({\overline{{\mathbf {P}}}}\), using Lemmas 4 and 3, we consider
Thus,
By Lemma 3, there are
and
Finally,
Rights and permissions
About this article
Cite this article
Zheng, S. KLERC: kernel Lagrangian expectile regression calculator. Comput Stat 36, 283–311 (2021). https://doi.org/10.1007/s00180-020-01003-0
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00180-020-01003-0