Abstract
As a generalization of ordinary least squares regression, expectile regression, which predicts conditional expectiles, is fitted by minimizing an asymmetric squared loss function on the training data. In the literature, the idea of the support vector machine was introduced to expectile regression to increase the flexibility of the model, resulting in support vector expectile regression (SVER). This paper reformulates the Lagrangian function of SVER as a differentiable convex function over the nonnegative orthant, which can be minimized by a simple iterative algorithm. The proposed algorithm is easy to implement and requires no particular optimization toolbox beyond basic matrix operations. Theoretical and experimental analyses show that the algorithm converges r-linearly to the unique minimum point. The proposed method was compared with alternative algorithms on simulated and real-world data, and we observe that it is much more computationally efficient while yielding similar prediction accuracy.
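For intuition, the following is a minimal sketch (in Python/NumPy) of the style of iteration described above: minimizing a convex quadratic over the nonnegative orthant using only basic matrix operations and the \((\cdot )_+\) projection. The quadratic \(W({\mathbf {u}})=\tfrac{1}{2}{\mathbf {u}}'{\mathbf {Q}}{\mathbf {u}}-{\mathbf {r}}'{\mathbf {u}}\), the step size, and the stopping rule are illustrative assumptions, not the exact KLERC update derived in the paper.

```python
import numpy as np

def nonneg_quadratic_fixed_point(Q, r, gamma=None, tol=1e-8, max_iter=10000):
    """Minimize W(u) = 0.5 u'Qu - r'u over u >= 0 by the projected
    fixed-point iteration u <- (u - gamma * (Qu - r))_+ .

    Illustrative sketch only: Q is assumed symmetric positive definite,
    and gamma defaults to 1/lambda_max(Q), a step size for which this
    projected iteration is known to converge.
    """
    if gamma is None:
        gamma = 1.0 / np.linalg.eigvalsh(Q).max()
    u = np.zeros(Q.shape[0])
    for _ in range(max_iter):
        u_new = np.maximum(u - gamma * (Q @ u - r), 0.0)  # the (.)_+ projection
        if np.linalg.norm(u_new - u) < tol:
            break
        u = u_new
    return u_new

# Usage on a small random problem; the fixed point satisfies the KKT conditions
# u >= 0, Qu - r >= 0, u'(Qu - r) = 0 (up to numerical tolerance).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Q, r = A @ A.T + np.eye(5), rng.standard_normal(5)
u = nonneg_quadratic_fixed_point(Q, r)
print("min(u) =", u.min())                 # >= 0 by construction
print("min(Qu - r) =", (Q @ u - r).min())  # approximately >= 0
print("u'(Qu - r) =", u @ (Q @ u - r))     # approximately 0
```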


Notes
Please see Appendix C for details.
The source code for KLERC is available upon request.
For notational convenience, in this proof, we assume all the vectors are row vectors. Clearly, the result also applies to column vectors.
References
Aganagić M (1984) Newton’s method for linear complementarity problems. Math Program 28(3):349–362
Armand P, Gilbert JC, Jan-Jégou S (2000) A feasible BFGS interior point algorithm for solving convex minimization problems. SIAM J Optim 11(1):199–222
Bhatia R (1997) Matrix analysis. Springer, New York
Choi K-L, Shim J, Seok K (2014) Support vector expectile regression using IRWLS procedure. J Korean Data Inf Sci Soc 25(4):931–939
Cottle RW (1983) On the uniqueness of solutions to linear complementarity problems. Math Program 27(2):191–213
Cottle RW, Pang J-S, Stone RE (1992) The linear complementarity problem. SIAM, Philadelphia
Croissant Y, Graves S (2016) Ecdat: data sets for econometrics. R package version 0.3-1. https://CRAN.R-project.org/package=Ecdat
Efron B (1991) Regression percentiles using asymmetric squared error loss. Stat Sin 1:93–125
Farooq M, Steinwart I (2017) An SVM-like approach for expectile regression. Comput Stat Data Anal 109:159–181
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232
Hunter DR, Lange K (2004) A tutorial on MM algorithms. Am Stat 58(1):30–37
Koenker R (2005) Quantile regression. Cambridge University Press, New York
Kremers H, Talman D (1994) A new pivoting algorithm for the linear complementarity problem allowing for an arbitrary starting point. Math Program 63(1):235–252
Mangasarian OL (1994) Nonlinear programming. SIAM, Philadelphia
Mangasarian OL, Musicant DR (1999) Successive overrelaxation for support vector machines. IEEE Trans Neural Netw 10:1032–1037
Mangasarian OL, Musicant DR (2001) Lagrangian support vector machines. J Mach Learn Res 1:161–177
Musicant DR, Feinberg A (2004) Active set support vector regression. IEEE Trans Neural Netw 15(2):268–275
Newey W, Powell JL (1987) Asymmetric least squares estimation and testing. Econometrica 55(4):819–847
Osuna E, Freund R, Girosi F (1997a) An improved training algorithm for support vector machines. In: Proceedings of the IEEE workshop neural networks for signal processing, pp 276–285
Osuna E, Freund R, Girosi F (1997b) Training support vector machines: an application to face detection. In: Proceedings of the IEEE conferences on computer vision and pattern recognition
Platt J (1998) Fast training of support vector machines using sequential minimal optimization. In: Schöelkopf B, Burges C, Smola A (eds) Advances in kernel methods—support vector learning. MIT Press, Cambridge
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007) Numerical recipes: the art of scientific computing, 3rd edn. Cambridge University Press, Cambridge
Schnabel SK, Eilers PHC (2009) Optimal expectile smoothing. Comput Stat Data Anal 53(12):4168–4177
Sobotka F, Kneib T (2012) Geoadditive expectile regression. Comput Stat Data Anal 56(4):755–767
Sobotka F, Schnabel S, Schulze WL (2014) expectreg: Expectile and quantile regression. R package version 0.39. https://CRAN.R-project.org/package=expectreg
Vapnik V (1998) Statistical learning theory. Wiley, New York
Waldmann E, Sobotka F, Kneib T (2017) Bayesian regularisation in geoadditive expectile regression. Stat Comput 27(6):1539–1553
Weingessel A (2013) quadprog: Functions to solve quadratic programming problems. R package version 1.5-5. https://CRAN.R-project.org/package=quadprog
Yang Y, Zou H (2015) Nonparametric multiple expectile regression via ER-boost. J Stat Comput Simul 85(7):1442–1458
Yang Y, Zhang T, Zou H (2015) KERE: expectile regression in reproducing kernel Hilbert space. R package version 1.0.0. https://CRAN.R-project.org/package=KERE
Yang Y, Zhang T, Zou H (2018) Flexible expectile regression in reproducing kernel Hilbert space. Technometrics 60(1):26–35
Acknowledgements
The author would like to thank the editor and anonymous reviewers for their constructive suggestions, which greatly helped improve the paper. This work was supported by a Faculty Research Grant (F07336-162001-022) and a Summer Faculty Fellowship from Missouri State University.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Appendices
Proof of Proposition 1
Assume \(\hat{\varvec{\xi }} = (\hat{\xi }_1, \hat{\xi }_2,\ldots , \hat{\xi }_n)'\) and without loss of generality, assume \(\hat{\xi }_1<0\). Let \(\tilde{\varvec{\xi }} = (0, \hat{\xi }_2,\ldots , \hat{\xi }_n)'\), that is, we replace the first component of \(\hat{\varvec{\xi }}\) (which is negative) by 0 and keep others unchanged. Because \(\hat{\xi }_1\) satisfies the constraint \(y_1 - {\hat{\mathbf{w}}}'\phi (\mathbf{x}_1) - {\hat{b}} \le \hat{\xi }_1\), we must have \(y_1 - {\hat{\mathbf{w}}}'\phi (\mathbf{x}_1) - {\hat{b}}\le 0\) since \(\hat{\xi }_1<0\). By assumption, the constraints are satisfied at \(\hat{\xi }_i\) for \(i=2,\ldots ,n\). Hence, the constraints are satisfied at all components of \(\tilde{\varvec{\xi }}\).
However, there is
since \(\hat{\xi }_1<0\) by assumption. Inequality (24) contradicts the assumption that \((\hat{\mathbf{w}}, \hat{b}, \hat{\varvec{\xi }}, \hat{\varvec{\xi }^*})\) is the minimum point. Thus, at the minimum point, we must have \(\hat{\xi }_1\ge 0\). In the same manner, it can be argued that all components of \(\hat{\varvec{\xi }}\) and \(\hat{\varvec{\xi }^*}\) are nonnegative.
Proof of Proposition 2
We will prove that \({\hat{\alpha }}_1{\hat{\alpha }}^*_1=0\), and others can be argued similarly. Assume \({\hat{\alpha }}_1{\hat{\alpha }}^*_1>0\), then \({\hat{\alpha }}_1>0\) and \({\hat{\alpha }}^*_1>0\). Let \(\beta = \min \{{\hat{\alpha }}_1, {\hat{\alpha }}^*_1\}>0\). Let \(\tilde{\varvec{\alpha }}=({{\tilde{\alpha }}}_1,{{\tilde{\alpha }}}_2,\ldots , {{\tilde{\alpha }}}_n)'\) and \(\tilde{\varvec{\alpha }}^*=({{\tilde{\alpha }}}^*_1,{{\tilde{\alpha }}}^*_2,\ldots , {{\tilde{\alpha }}}^*_n)'\) with \(0\le {\tilde{\alpha }}_1= {\hat{\alpha }}_1-\beta <{\hat{\alpha }}_1\), \(0\le {\tilde{\alpha }}^*_1= {\hat{\alpha }}^*_1-\beta <{\hat{\alpha }}^*_1\), \({\tilde{\alpha }}_i= {\hat{\alpha }}_i\ge 0\) and \({\tilde{\alpha }}^*_i= {\hat{\alpha }}^*_i\ge 0\) for \(i=2,3,\ldots ,n\). Hence, \(\tilde{\varvec{\alpha }}\) and \(\tilde{\varvec{\alpha }}^*\) satisfy the nonnegativity constraints in Problem (5), and \({\tilde{\alpha }}_i-{\tilde{\alpha }}^*_i= {\hat{\alpha }}_i-{\hat{\alpha }}^*_i\) for \(i=1,2,\ldots ,n\).
With these notations and relations, we have
where the inequality follows because \({\tilde{\alpha }}_1<{\hat{\alpha }}_1\) and \({\tilde{\alpha }}^*_1<{\hat{\alpha }}^*_1\). However, inequality (25) contradicts the assumption that \((\hat{\varvec{\alpha }},\hat{\varvec{\alpha }}^*)\) is the solution to Problem (5). Hence, we must have \({\hat{\alpha }}_1{\hat{\alpha }}^*_1=0\).
Kuhn–Tucker stationary point problem
Consider the nonlinear minimization problem with variable \(\mathbf{x}\) in the p-dimensional space:
where \(\theta (\mathbf{x})\) is a real-valued function, \(g(\mathbf{x})\) is a vector-valued function in m-dimensional space, and we assume that \(\theta (\mathbf{x})\) and \(g(\mathbf{x})\) are differentiable. At the minimum point \(\mathbf{x}\), there is a vector \({\mathbf {t}}\in {\mathbb {R}}^m\) such that the following conditions are satisfied
We notice that \(\nabla \theta (\mathbf{x})\) is a p-dimensional vector, and \(\nabla g(\mathbf{x})\) is an \(m\times p\) matrix. These conditions are called the Kuhn–Tucker stationary point problem (KTP) in Mangasarian (1994).
In this paper, the minimization problem is
Hence, the function \(W({\mathbf {u}})\) plays the role of \(\theta (\mathbf{x})\) in Eq. (26), and \(-{\mathbf {u}}\) plays the role of \(g(\mathbf{x})\). As such, according to the KTP condition, there is a vector \({\mathbf {v}}\in {\mathbb {R}}^{2n}\) satisfying:
and \(-{\mathbf {u}}\le {\mathbf {0}}_{2n}\) (i.e., \({\mathbf {u}}\ge {\mathbf {0}}_{2n}\)), \({\mathbf {v}}\ge {\mathbf {0}}_{2n}\), \({\mathbf {v}}' (-{\mathbf {u}})=0\) (i.e., \({\mathbf {v}}' {\mathbf {u}}=0\)). These are the same as in Eq. (9).
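Assuming \(\nabla W({\mathbf {u}})={\mathbf {v}}\), which is what the general KTP stationarity condition gives with \(g({\mathbf {u}})=-{\mathbf {u}}\), the conditions above can be collected into a single complementarity statement (a restatement for reference, not an additional result):
\[
{\mathbf {u}}\ge {\mathbf {0}}_{2n}, \qquad \nabla W({\mathbf {u}})\ge {\mathbf {0}}_{2n}, \qquad {\mathbf {u}}'\nabla W({\mathbf {u}})=0,
\]
that is, \({\mathbf {0}}_{2n}\le {\mathbf {u}}\perp \nabla W({\mathbf {u}})\ge {\mathbf {0}}_{2n}\), which is precisely the setting in which the orthogonality condition of the next appendix applies.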
Orthogonality condition for two nonnegative vectors
In this Appendix, we show that two nonnegative vectors \({\mathbf {a}}\) and \({\mathbf {b}}\) are perpendicular if and only if \({\mathbf {a}} = ({\mathbf {a}} - \gamma {\mathbf {b}})_+\) for any real \(\gamma >0\).
We first assume that two nonnegative real numbers a and b satisfy \(ab=0\). Since \(ab=0\), at least one of a and b must be 0. If \(a=0\) and \(b\ge 0\), then for any real \(\gamma >0\), \(a-\gamma b\le 0\), so that \((a-\gamma b)_+=0=a\); if \(a>0\), we must have \(b=0\), and then for any real \(\gamma >0\), \((a-\gamma b)_+ = (a)_+ = a\). In both cases, \(a=(a-\gamma b)_+\) for any real number \(\gamma >0\).
Conversely, assume that two nonnegative real numbers a and b satisfy \(a=(a-\gamma b)_+\) for any real number \(\gamma >0\). If a and b are both strictly positive, then \(a-\gamma b<a\) since \(\gamma >0\). Consequently, \((a-\gamma b)_+<a\), which contradicts our assumption that \(a = (a-\gamma b)_+\). Thus at least one of a and b must be 0, i.e., \(ab=0\).
Now assume that vectors \({\mathbf {a}}\) and \({\mathbf {b}}\) are in \({\mathbb {R}}^p\) with each component nonnegative, and assume \({\mathbf {a}}\perp {\mathbf {b}}\), that is, \(\sum _{i=1}^p a_ib_i=0\). Since both \(a_i\) and \(b_i\) are nonnegative, we must have \(a_ib_i=0\) for \(i=1,2,\ldots ,p\). By the preceding argument, this is equivalent to \(a_i = (a_i - \gamma b_i)_+\) for any \(\gamma >0\) and any \(i=1,2,\ldots ,p\). In vector form, \({\mathbf {a}} = ({\mathbf {a}} - \gamma {\mathbf {b}})_+\).
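The snippet below is a small numerical illustration of this equivalence (not part of the paper): it builds nonnegative vectors with complementary supports, confirms \({\mathbf {a}} = ({\mathbf {a}} - \gamma {\mathbf {b}})_+\) for several values of \(\gamma \), and shows that the identity fails once the vectors share a strictly positive component. The dimension and \(\gamma \) values are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 8

# Nonnegative vectors with complementary supports, so that a'b = 0.
mask = rng.random(p) < 0.5
a = np.where(mask, rng.random(p), 0.0)
b = np.where(mask, 0.0, rng.random(p))
print("a'b =", a @ b)

# Orthogonality holds, so a == (a - gamma*b)_+ for every gamma > 0.
for gamma in (0.1, 1.0, 10.0):
    print(gamma, np.allclose(np.maximum(a - gamma * b, 0.0), a))  # True

# If a and b share a strictly positive component, the identity breaks.
a2, b2 = a.copy(), b.copy()
a2[0], b2[0] = 1.0, 1.0
print(np.allclose(np.maximum(a2 - 0.5 * b2, 0.0), a2))  # False
```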
Some technical lemmas
This Appendix presents some lemmas that will be used in Sects. 3.2 and 3.3.
Lemma 1
Let \({\mathbf {a}}\) and \({\mathbf {b}}\) be two vectors in \({\mathbb {R}}^p\), then
Proof
For two real numbers a and b, there are four situations:
1. \(a\ge 0\) and \(b\ge 0\), then \(|a_+ - b_+|=|a-b|\);
2. \(a\ge 0\) and \(b\le 0\), then \(|a_+ - b_+|=|a-0|\le |a-b|\);
3. \(a\le 0\) and \(b\ge 0\), then \(|a_+ - b_+|=|0-b|\le |a-b|\);
4. \(a\le 0\) and \(b\le 0\), then \(|a_+ - b_+|=|0-0|\le |a-b|\).
In summary, in the one-dimensional case, \(|a_+ - b_+|^2\le |a-b|^2\).
Assume that Eq. (27) is true for p-dimensional vectors \({\mathbf {a}}_p\) and \({\mathbf {b}}_p\). Denote the \((p+1)\)-dimensional vectors \({\mathbf {a}}\) and \({\mathbf {b}}\) as \({\mathbf {a}} = ({\mathbf {a}}_p, a_{p+1})\) and \({\mathbf {b}} = ({\mathbf {b}}_p, b_{p+1})\) (see Footnote 4), where \(a_{p+1}\) and \(b_{p+1}\) are real numbers. Then,
where in Eq. (28) we used the induction hypothesis for the p-dimensional vectors, the one-dimensional result above, and the definition of the Euclidean norm.
By induction, Eq. (27) is proved. \(\square \)
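As a quick, optional sanity check of Eq. (27) (illustration only, not part of the proof), the inequality \(\Vert ({\mathbf {a}})_+-({\mathbf {b}})_+\Vert \le \Vert {\mathbf {a}}-{\mathbf {b}}\Vert \) can be confirmed on random vectors; the dimension and number of trials below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
violations = 0
for _ in range(10000):
    a, b = rng.standard_normal(6), rng.standard_normal(6)
    lhs = np.linalg.norm(np.maximum(a, 0.0) - np.maximum(b, 0.0))
    rhs = np.linalg.norm(a - b)
    violations += lhs > rhs + 1e-12
print("violations of Eq. (27):", violations)  # expected: 0
```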
Lemma 2
(Weyl’s inequality (Bhatia 1997, chap. 3)) Let \({\mathbf {A}}\) and \({\mathbf {B}}\) be \(m\times m\) Hermitian matrices. Let \(\lambda _1({\mathbf {A}})\ge \lambda _2({\mathbf {A}})\ge \cdots \ge \lambda _m({\mathbf {A}})\), \(\lambda _1({\mathbf {B}})\ge \lambda _2({\mathbf {B}})\ge \cdots \ge \lambda _m({\mathbf {B}})\), and \(\lambda _1({\mathbf {A}}+{\mathbf {B}})\ge \lambda _2({\mathbf {A}}+{\mathbf {B}})\ge \cdots \ge \lambda _m({\mathbf {A}}+{\mathbf {B}})\) be the eigenvalues of \({\mathbf {A}}\), \({\mathbf {B}}\), and \({\mathbf {A}}+{\mathbf {B}}\), respectively. For any \(i=1,2,\ldots , m\), there is
In particular, there are
and
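As a numerical illustration of Lemma 2 (not part of the paper), the standard two-sided form of Weyl's inequality for a fixed index, \(\lambda _i({\mathbf {A}})+\lambda _m({\mathbf {B}})\le \lambda _i({\mathbf {A}}+{\mathbf {B}})\le \lambda _i({\mathbf {A}})+\lambda _1({\mathbf {B}})\), can be checked on random real symmetric matrices; the matrix size and tolerance below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
m = 6
# Random real symmetric (hence Hermitian) test matrices.
X, Y = rng.standard_normal((m, m)), rng.standard_normal((m, m))
A, B = (X + X.T) / 2, (Y + Y.T) / 2

# eigvalsh returns ascending eigenvalues; flip to the descending order of Lemma 2.
lam = lambda M: np.linalg.eigvalsh(M)[::-1]
lA, lB, lAB = lam(A), lam(B), lam(A + B)

# For each i: lambda_i(A) + lambda_m(B) <= lambda_i(A+B) <= lambda_i(A) + lambda_1(B).
print(np.all(lA + lB[-1] <= lAB + 1e-10), np.all(lAB <= lA + lB[0] + 1e-10))
```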
Lemma 3
Let \({\mathbf {C}}\) be an \(m\times m\) positive semidefinite matrix, \({\mathbf {I}}\) be the \(m\times m\) identity matrix, and a, b, c, and d be real numbers with \(a\ne 0\) and \(d\ne 0\). There are
and
Proof
Direct calculation gives us
Using the previous result, we have
Finally,
\(\square \)
Lemma 4
(Press et al. 2007, Sect. 2.7) Suppose that an \(N\times N\) matrix \({\mathbf {A}}\) is partitioned into
where \({\mathbf {A}}_{11}\) and \({\mathbf {A}}_{22}\) are square matrices of size \(p\times p\) and \( s\times s\), respectively (\(p +s = N\)). If the inverse of \({\mathbf {A}}\) is partitioned in the same manner,
then \( {\overline{ {\mathbf {A}}}}_{11}\), \({\overline{ {\mathbf {A}}}}_{12}\), \({\overline{ {\mathbf {A}}}}_{21}\), \({\overline{ {\mathbf {A}}}}_{22}\), which have the same sizes as \({\mathbf {A}}_{11}\), \({\mathbf {A}}_{12}\), \({\mathbf {A}}_{21}\), \({\mathbf {A}}_{22}\), respectively, can be found by
and
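As a numerical illustration of Lemma 4 (not a transcription of its displays), the snippet below checks the standard partitioned-inverse formulas from Press et al. (2007, Sect. 2.7), written in terms of the Schur complement \({\mathbf {S}}={\mathbf {A}}_{11}-{\mathbf {A}}_{12}{\mathbf {A}}_{22}^{-1}{\mathbf {A}}_{21}\); the block sizes and the random test matrix are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
p, s = 3, 4
N = p + s
# A random, well-conditioned test matrix, partitioned into blocks.
A = rng.standard_normal((N, N)) + N * np.eye(N)
A11, A12 = A[:p, :p], A[:p, p:]
A21, A22 = A[p:, :p], A[p:, p:]

A22_inv = np.linalg.inv(A22)
S = A11 - A12 @ A22_inv @ A21          # Schur complement of A22 in A
Abar11 = np.linalg.inv(S)
Abar12 = -Abar11 @ A12 @ A22_inv
Abar21 = -A22_inv @ A21 @ Abar11
Abar22 = A22_inv + A22_inv @ A21 @ Abar11 @ A12 @ A22_inv

Abar = np.block([[Abar11, Abar12], [Abar21, Abar22]])
print(np.allclose(Abar, np.linalg.inv(A)))  # True: the block formulas match inv(A)
```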
Proof of Theorem 2
To calculate \({\overline{{\mathbf {P}}}}\), using Lemmas 4 and 3, we consider
Thus,
By Lemma 3, there are
and
Finally,
Rights and permissions
About this article
Cite this article
Zheng, S. KLERC: kernel Lagrangian expectile regression calculator. Comput Stat 36, 283–311 (2021). https://doi.org/10.1007/s00180-020-01003-0
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00180-020-01003-0