
KLERC: kernel Lagrangian expectile regression calculator


Abstract

As a generalization of ordinary least squares regression, expectile regression predicts conditional expectiles and is fitted by minimizing an asymmetric square loss on the training data. In the literature, the idea of the support vector machine was introduced to expectile regression to increase the flexibility of the model, resulting in support vector expectile regression (SVER). This paper reformulates the Lagrangian function of SVER as a differentiable convex function over the nonnegative orthant, which can be minimized by a simple iterative algorithm. The proposed algorithm is easy to implement, requiring no optimization toolbox beyond basic matrix operations. Theoretical and experimental analyses show that the algorithm converges r-linearly to the unique minimum point. The proposed method was compared to alternative algorithms on simulated and real-world data, and we observe that it is much more computationally efficient while yielding similar prediction accuracy.
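As a concrete illustration, the asymmetric square loss can be written in a few lines; the following sketch (Python/NumPy, with an illustrative name not taken from the paper) weights positive residuals by \(\rho \) and negative residuals by \(1-\rho \), so that \(\rho =0.5\) recovers ordinary least squares.

```python
import numpy as np

def asymmetric_square_loss(residual, rho):
    """Asymmetric square loss for the rho-th expectile.

    A residual r = y - f(x) is penalized by rho * r^2 when r >= 0
    and by (1 - rho) * r^2 when r < 0; rho = 0.5 gives ordinary
    least squares.
    """
    r = np.asarray(residual, dtype=float)
    return np.where(r >= 0, rho, 1.0 - rho) * r ** 2
```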


Notes

  1. The model in problem (4) is essentially the same as the model studied in Farooq and Steinwart (2017), but that work did not observe that the non-negativity constraints on \(\varvec{\xi }\) and \(\varvec{\xi }^*\) are redundant.

  2. Please see Appendix C for details.

  3. The source code for KLERC is available upon request.

  4. For notational convenience, in this proof, we assume all the vectors are row vectors. Clearly, the result also applies to column vectors.

References

  • Aganagić M (1984) Newton’s method for linear complementarity problems. Math Program 28(3):349–362


  • Armand P, Gilbert JC, Jan-Jégou S (2000) A feasible BFGS interior point algorithm for solving convex minimization problems. SIAM J Optim 11(1):199–222


  • Bhatia R (1997) Matrix analysis. Springer, New York


  • Choi K-L, Shim J, Seok K (2014) Support vector expectile regression using IRWLS procedure. J Korean Data Inf Sci Soc 25(4):931–939


  • Cottle RW (1983) On the uniqueness of solutions to linear complementarity problems. Math Program 27(2):191–213


  • Cottle RW, Pang J-S, Stone RE (1992) The linear complementarity problem. SIAM, Philadelphia


  • Croissant Y, Graves S (2016) Ecdat: data sets for econometrics. R package version 0.3-1. https://CRAN.R-project.org/package=Ecdat

  • Efron B (1991) Regression percentiles using asymmetric squared error loss. Stat Sin 1:93–125


  • Farooq M, Steinwart I (2017) An SVM-like approach for expectile regression. Comput Stat Data Anal 109:159–181


  • Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232


  • Hunter DR, Lange K (2004) A tutorial on MM algorithms. Am Stat 58(1):30–37


  • Koenker R (2005) Quantile regression. Cambridge University Press, New York


  • Kremers H, Talman D (1994) A new pivoting algorithm for the linear complementarity problem allowing for an arbitrary starting point. Math Program 63(1):235–252


  • Mangasarian OL (1994) Nonlinear programming. SIAM, Philadelphia


  • Mangasarian OL, Musicant DR (1999) Successive overrelaxation for support vector machines. IEEE Trans Neural Netw 10:1032–1037


  • Mangasarian OL, Musicant DR (2001) Lagrangian support vector machines. J Mach Learn Res 1:161–177


  • Musicant DR, Feinberg A (2004) Active set support vector regression. IEEE Trans Neural Netw 15(2):268–275


  • Newey W, Powell JL (1987) Asymmetric least squares estimation and testing. Econometrica 55(4):819–847


  • Osuna E, Freund R, Girosi F (1997a) An improved training algorithm for support vector machines. In: Proceedings of the IEEE workshop neural networks for signal processing, pp 276–285

  • Osuna E, Freund R, Girosi F (1997b) Training support vector machines: an application to face detection. In: Proceedings of the IEEE conferences on computer vision and pattern recognition

  • Platt J (1998) Fast training of support vector machines using sequential minimal optimization. In: Schölkopf B, Burges C, Smola A (eds) Advances in kernel methods—support vector learning. MIT Press, Cambridge


  • Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007) Numerical recipes: the art of scientific computing, 3rd edn. Cambridge University Press, Cambridge


  • Schnabel SK, Eilers PHC (2009) Optimal expectile smoothing. Comput Stat Data Anal 53(12):4168–4177


  • Sobotka F, Kneib T (2012) Geoadditive expectile regression. Comput Stat Data Anal 56(4):755–767


  • Sobotka F, Schnabel S, Schulze WL (2014) expectreg: Expectile and quantile regression. R package version 0.39. https://CRAN.R-project.org/package=expectreg

  • Vapnik V (1998) Statistical learning theory. Wiley, New York


  • Waldmann E, Sobotka F, Kneib T (2017) Bayesian regularisation in geoadditive expectile regression. Stat Comput 27(6):1539–1553


  • Weingessel A (2013) quadprog: Functions to solve quadratic programming problems. R package version 1.5-5. https://CRAN.R-project.org/package=quadprog

  • Yang Y, Zou H (2015) Nonparametric multiple expectile regression via ER-boost. J Stat Comput Simul 85(7):1442–1458


  • Yang Y, Zhang T, Zou H (2015) KERE: expectile regression in reproducing kernel Hilbert space. R package version 1.0.0. https://CRAN.R-project.org/package=KERE

  • Yang Y, Zhang T, Zou H (2018) Flexible expectile regression in reproducing kernel Hilbert space. Technometrics 60(1):26–35



Acknowledgements

The author would like to thank the editor and anonymous reviewers for their constructive suggestions, which greatly helped improve the paper. This work was supported by a Faculty Research Grant (F07336-162001-022) and a Summer Faculty Fellowship from Missouri State University.

Author information

Correspondence to Songfeng Zheng.


Electronic supplementary material

Supplementary material 1 (PDF 125 KB)

Appendices

Proof of Proposition 1

Write \(\hat{\varvec{\xi }} = (\hat{\xi }_1, \hat{\xi }_2,\ldots , \hat{\xi }_n)'\), and suppose, to the contrary, that some component of \(\hat{\varvec{\xi }}\) is negative; without loss of generality, assume \(\hat{\xi }_1<0\). Let \(\tilde{\varvec{\xi }} = (0, \hat{\xi }_2,\ldots , \hat{\xi }_n)'\); that is, we replace the first (negative) component of \(\hat{\varvec{\xi }}\) by 0 and keep the others unchanged. Because \(\hat{\xi }_1\) satisfies the constraint \(y_1 - {\hat{\mathbf{w}}}'\phi (\mathbf{x}_1) - {\hat{b}} \le \hat{\xi }_1\) and \(\hat{\xi }_1<0\), we have \(y_1 - {\hat{\mathbf{w}}}'\phi (\mathbf{x}_1) - {\hat{b}}\le 0 = {\tilde{\xi }}_1\), so the first constraint still holds at \({\tilde{\xi }}_1\). By assumption, the constraints are satisfied at \(\hat{\xi }_i\) for \(i=2,\ldots ,n\). Hence, the constraints are satisfied at all components of \(\tilde{\varvec{\xi }}\).

However,

$$\begin{aligned} \varPhi (\hat{\mathbf{w}}, \hat{b}, \tilde{\varvec{\xi }}, \hat{\varvec{\xi }^*})&= \frac{1}{2}\Vert \hat{\mathbf{w}}\Vert ^2 + \frac{1}{2}\hat{b}^2+ \frac{C}{2}\left[ \sum _{i=1}^n \rho {\tilde{\xi }}_i^2 + \sum _{i=1}^n(1-\rho ) (\hat{\xi }^*_i)^2 \right] \nonumber \\&= \frac{1}{2}\Vert \hat{\mathbf{w}}\Vert ^2 + \frac{1}{2}\hat{b}^2+ \frac{C}{2}\left[ \sum _{i=2}^n \rho \hat{\xi }_i^2 + \sum _{i=1}^n(1-\rho ) (\hat{\xi }^*_i)^2 \right] \nonumber \\&< \frac{1}{2}\Vert \hat{\mathbf{w}}\Vert ^2 + \frac{1}{2}\hat{b}^2+ \frac{C}{2}\left[ \sum _{i=1}^n \rho \hat{\xi }_i^2 + \sum _{i=1}^n(1-\rho ) (\hat{\xi }^*_i)^2 \right] \nonumber \\&= \varPhi (\hat{\mathbf{w}}, \hat{b}, \hat{\varvec{\xi }}, \hat{\varvec{\xi }^*}), \end{aligned}$$
(24)

since \(\hat{\xi }_1<0\) implies \(\rho \hat{\xi }_1^2>0\). Inequality (24) contradicts the assumption that \((\hat{\mathbf{w}}, \hat{b}, \hat{\varvec{\xi }}, \hat{\varvec{\xi }^*})\) is the minimum point. Thus, at the minimum point, we must have \(\hat{\xi }_1\ge 0\). In the same manner, all components of \(\hat{\varvec{\xi }}\) and \(\hat{\varvec{\xi }^*}\) can be shown to be nonnegative.

Proof of Proposition 2

We prove that \({\hat{\alpha }}_1{\hat{\alpha }}^*_1=0\); the other products can be argued in the same way. Suppose \({\hat{\alpha }}_1{\hat{\alpha }}^*_1>0\); since both multipliers are nonnegative, this forces \({\hat{\alpha }}_1>0\) and \({\hat{\alpha }}^*_1>0\). Let \(\beta = \min \{{\hat{\alpha }}_1, {\hat{\alpha }}^*_1\}>0\). Define \(\tilde{\varvec{\alpha }}=({{\tilde{\alpha }}}_1,{{\tilde{\alpha }}}_2,\ldots , {{\tilde{\alpha }}}_n)'\) and \(\tilde{\varvec{\alpha }}^*=({{\tilde{\alpha }}}^*_1,{{\tilde{\alpha }}}^*_2,\ldots , {{\tilde{\alpha }}}^*_n)'\) with \(0\le {\tilde{\alpha }}_1= {\hat{\alpha }}_1-\beta <{\hat{\alpha }}_1\), \(0\le {\tilde{\alpha }}^*_1= {\hat{\alpha }}^*_1-\beta <{\hat{\alpha }}^*_1\), and \({\tilde{\alpha }}_i= {\hat{\alpha }}_i\ge 0\), \({\tilde{\alpha }}^*_i= {\hat{\alpha }}^*_i\ge 0\) for \(i=2,3,\ldots ,n\). Hence, \(\tilde{\varvec{\alpha }}\) and \(\tilde{\varvec{\alpha }}^*\) satisfy the nonnegativity constraints in Problem (5), and \({\tilde{\alpha }}_i-{\tilde{\alpha }}^*_i= {\hat{\alpha }}_i-{\hat{\alpha }}^*_i\) for \(i=1,2,\ldots ,n\).

With these notations and relations, we have

$$\begin{aligned} W(\tilde{\varvec{\alpha }}, \tilde{\varvec{\alpha }}^*) =&\frac{1}{2}\sum _{i=1}^n\sum _{j=1}^n({{\tilde{\alpha }}}_i- {{\tilde{\alpha }}}_i^*)(\phi (\mathbf{x}_i)'\phi (\mathbf{x}_j)+1)({{\tilde{\alpha }}}_j- {{\tilde{\alpha }}}_j^*) \nonumber \\&+\frac{1}{2C\rho }\sum _{i=1}^n {{\tilde{\alpha }}}_i^2+\frac{1}{2C(1-\rho )}\sum _{i=1}^n({{\tilde{\alpha }}}_i^*)^2 - \sum _{i=1}^n ({{\tilde{\alpha }}}_i- {{\tilde{\alpha }}}_i^*)y_i \nonumber \\ =&\frac{1}{2}\sum _{i=1}^n\sum _{j=1}^n({\hat{\alpha }}_i- {\hat{\alpha }}_i^*)(\phi (\mathbf{x}_i)'\phi (\mathbf{x}_j)+1)({\hat{\alpha }}_j- {\hat{\alpha }}_j^*) +\frac{1}{2C\rho } {{\tilde{\alpha }}}_1^2+\frac{1}{2C\rho }\sum _{i=2}^n {\hat{\alpha }}_i^2\nonumber \\&\qquad +\frac{1}{2C(1-\rho )}({{\tilde{\alpha }}}_1^*)^2 +\frac{1}{2C(1-\rho )}\sum _{i=2}^n({\hat{\alpha }}_i^*)^2 - \sum _{i=1}^n ({\hat{\alpha }}_i- {\hat{\alpha }}_i^*)y_i \nonumber \\&< \frac{1}{2}\sum _{i=1}^n\sum _{j=1}^n({\hat{\alpha }}_i- {\hat{\alpha }}_i^*)(\phi (\mathbf{x}_i)'\phi (\mathbf{x}_j)+1)({\hat{\alpha }}_j- {\hat{\alpha }}_j^*) \nonumber \\&\qquad +\frac{1}{2C\rho }\sum _{i=1}^n {\hat{\alpha }}_i^2+\frac{1}{2C(1-\rho )}\sum _{i=1}^n({\hat{\alpha }}_i^*)^2 - \sum _{i=1}^n ({\hat{\alpha }}_i- {\hat{\alpha }}_i^*)y_i \nonumber \\ =&W(\hat{\varvec{\alpha }}, \hat{\varvec{\alpha }}^*), \end{aligned}$$
(25)

where the inequality is strict because \(0\le {\tilde{\alpha }}_1<{\hat{\alpha }}_1\) and \(0\le {\tilde{\alpha }}^*_1<{\hat{\alpha }}^*_1\) imply \({\tilde{\alpha }}_1^2<{\hat{\alpha }}_1^2\) and \(({\tilde{\alpha }}^*_1)^2<({\hat{\alpha }}^*_1)^2\). Inequality (25) contradicts the assumption that \((\hat{\varvec{\alpha }},\hat{\varvec{\alpha }}^*)\) solves Problem (5). Hence, we must have \({\hat{\alpha }}_1{\hat{\alpha }}^*_1=0\).

Kuhn–Tucker stationary point problem

Consider the nonlinear minimization problem with variable \(\mathbf{x}\) in the p-dimensional space:

$$\begin{aligned} \min _{\mathbf{x}\in {\mathbb {R}}^p} \theta (\mathbf{x}) \quad \text {s. t.} \quad g(\mathbf{x})\le {\mathbf {0}}_m, \end{aligned}$$
(26)

where \(\theta (\mathbf{x})\) is a real-valued function and \(g(\mathbf{x})\) is a vector-valued function taking values in \({\mathbb {R}}^m\); both are assumed differentiable. At the minimum point \(\mathbf{x}\), there is a vector \({\mathbf {t}}\in {\mathbb {R}}^m\) such that the following conditions are satisfied

$$\begin{aligned} \nabla \theta (\mathbf{x}) +\left( \nabla g(\mathbf{x})\right) '{\mathbf {t}}={\mathbf {0}}_p, \quad g(\mathbf{x})\le {\mathbf {0}}_m,\quad {\mathbf {t}}' g(\mathbf{x})=0,\quad \text {and} \quad {\mathbf {t}}\ge {\mathbf {0}}_m. \end{aligned}$$

We notice that \(\nabla \theta (\mathbf{x})\) is a p-dimensional vector, and \(\nabla g(\mathbf{x})\) is an \(m\times p\) matrix. These conditions are called the Kuhn–Tucker stationary point problem (KTP) in Mangasarian (1994).

In this paper, the minimization problem is

$$\begin{aligned} \min _{{\mathbf {u}}\ge {\mathbf {0}}_{2n}}\;\; W({\mathbf {u}}) = \frac{1}{2} {\mathbf {u}}'{\mathbf {Q}}{\mathbf {u}} - {\mathbf {r}}'{\mathbf {u}}. \end{aligned}$$

Hence, \(W({\mathbf {u}})\) plays the role of \(\theta (\mathbf{x})\) in Eq. (26), and \(-{\mathbf {u}}\) plays the role of \(g(\mathbf{x})\). According to the KTP conditions, there is a vector \({\mathbf {v}}\in {\mathbb {R}}^{2n}\) satisfying:

$$\begin{aligned} \nabla W({\mathbf {u}})+\left( \nabla (-{\mathbf {u}})\right) '{\mathbf {v}} ={\mathbf {Q}}{\mathbf {u}}-{\mathbf {r}}+\left( -{\mathbf {I}}_{2n}\right) '{\mathbf {v}}={\mathbf {Q}}{\mathbf {u}}-{\mathbf {r}}-{\mathbf {v}}= {\mathbf {0}}_{2n}, \end{aligned}$$

together with \(-{\mathbf {u}}\le {\mathbf {0}}_{2n}\) (i.e., \({\mathbf {u}}\ge {\mathbf {0}}_{2n}\)), \({\mathbf {v}}\ge {\mathbf {0}}_{2n}\), and \({\mathbf {v}}' (-{\mathbf {u}})=0\) (i.e., \({\mathbf {v}}' {\mathbf {u}}=0\)). These are exactly the conditions in Eq. (9).
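To make this concrete, the sketch below (illustrative Python/NumPy, not the author's released code) assembles \({\mathbf {Q}}\) and \({\mathbf {r}}\) from a kernel matrix \({\mathbf {K}}\), using the block structure \({\mathbf {T}}={\mathbf {K}}+\mathbf{1}\mathbf{1}'\) with diagonal shifts \(1/(C\rho )\) and \(1/(C(1-\rho ))\) that appears in the dual objective (5) and in the proof of Theorem 2 below.

```python
import numpy as np

def build_dual_qp(K, y, C, rho):
    """Assemble Q and r for min_{u >= 0} 0.5 u'Qu - r'u with u = (alpha; alpha*).

    T = K + 11' absorbs the bias term; the diagonal shifts come from the
    1/(2*C*rho) and 1/(2*C*(1-rho)) terms in the dual objective (5).
    """
    n = K.shape[0]
    T = K + np.ones((n, n))                       # phi(x_i)'phi(x_j) + 1
    Q = np.block([[T + np.eye(n) / (C * rho), -T],
                  [-T, T + np.eye(n) / (C * (1 - rho))]])
    r = np.concatenate([y, -y])
    return Q, r
```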

Orthogonality condition for two nonnegative vectors

In this Appendix, we show that two nonnegative vectors \({\mathbf {a}}\) and \({\mathbf {b}}\) are perpendicular if and only if \({\mathbf {a}} = ({\mathbf {a}} - \gamma {\mathbf {b}})_+\) for every real \(\gamma >0\).

First assume that two nonnegative real numbers a and b satisfy \(ab=0\); then at least one of a and b is 0. If \(a=0\) and \(b\ge 0\), then for any real \(\gamma >0\), \(a-\gamma b\le 0\), so \((a-\gamma b)_+=0=a\). If \(a>0\), we must have \(b=0\), so for any real \(\gamma >0\), \((a-\gamma b)_+ = (a)_+ = a\). In both cases, \(a=(a-\gamma b)_+\) for every real \(\gamma >0\).

Conversely, assume that the nonnegative real numbers a and b satisfy \(a=(a-\gamma b)_+\) for every real \(\gamma >0\). If a and b were both strictly positive, then \(a-\gamma b<a\) since \(\gamma b>0\); consequently, \((a-\gamma b)_+<a\), which contradicts our assumption that \(a = (a-\gamma b)_+\). Thus at least one of a and b must be 0, i.e., \(ab=0\).

Now assume that the vectors \({\mathbf {a}}, {\mathbf {b}}\in {\mathbb {R}}^p\) have nonnegative components and \({\mathbf {a}}\perp {\mathbf {b}}\), that is, \(\sum _{i=1}^p a_ib_i=0\). Since all the \(a_i\) and \(b_i\) are nonnegative, each term must vanish: \(a_ib_i=0\) for \(i=1,2,\ldots ,p\). By the scalar argument above, this is equivalent to \(a_i = (a_i - \gamma b_i)_+\) for every \(\gamma >0\) and every \(i=1,2,\ldots ,p\); in vector form, \({\mathbf {a}} = ({\mathbf {a}} - \gamma {\mathbf {b}})_+\).
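Together with the KKT system in Eq. (9), this equivalence characterizes the solution as a fixed point \({\mathbf {u}} = ({\mathbf {u}} - \gamma ({\mathbf {Q}}{\mathbf {u}}-{\mathbf {r}}))_+\), suggesting the simple projection iteration sketched below; the step size and stopping rule here are illustrative assumptions, not necessarily the exact choices made in KLERC.

```python
import numpy as np

def solve_nonneg_qp(Q, r, gamma=None, tol=1e-8, max_iter=100000):
    """Iterate u <- (u - gamma * (Q u - r))_+ to solve min_{u>=0} 0.5 u'Qu - r'u.

    By the orthogonality condition above, any fixed point satisfies the
    KKT conditions (9). A step size gamma < 2 / lambda_max(Q) suffices
    for convergence of this projected-gradient map on the strongly
    convex objective.
    """
    if gamma is None:
        gamma = 1.0 / np.linalg.norm(Q, 2)        # 1 / largest eigenvalue of Q
    u = np.zeros(Q.shape[0])
    for _ in range(max_iter):
        u_new = np.maximum(u - gamma * (Q @ u - r), 0.0)
        if np.linalg.norm(u_new - u) <= tol * (1.0 + np.linalg.norm(u)):
            return u_new
        u = u_new
    return u
```

With \({\mathbf {Q}}\) and \({\mathbf {r}}\) from the previous sketch, the returned vector splits as \({\mathbf {u}}=(\varvec{\alpha }; \varvec{\alpha }^*)\), and, assuming the standard recovery of the decision function for this formulation, the fitted expectile at \(\mathbf{x}\) is \(\sum _{i=1}^n(\alpha _i-\alpha ^*_i)(k(\mathbf{x}_i,\mathbf{x})+1)\).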

Some technical lemmas

This Appendix presents some lemmas that are used in Sects. 3.2 and 3.3.

Lemma 1

Let \({\mathbf {a}}\) and \({\mathbf {b}}\) be two vectors in \({\mathbb {R}}^p\). Then

$$\begin{aligned} \Vert {\mathbf {a}}_+-{\mathbf {b}}_+\Vert \le \Vert {\mathbf {a}}-{\mathbf {b}}\Vert . \end{aligned}$$
(27)

Proof

For two real numbers a and b, there are four cases:

  1. \(a\ge 0\) and \(b\ge 0\): then \(|a_+ - b_+|=|a-b|\);

  2. \(a\ge 0\) and \(b\le 0\): then \(|a_+ - b_+|=|a-0|\le |a-b|\);

  3. \(a\le 0\) and \(b\ge 0\): then \(|a_+ - b_+|=|0-b|\le |a-b|\);

  4. \(a\le 0\) and \(b\le 0\): then \(|a_+ - b_+|=|0-0|\le |a-b|\).

In summary, in the one-dimensional case, \(|a_+ - b_+|^2\le |a-b|^2\).

Assume that Eq. (27) is true for p-dimensional vectors \({\mathbf {a}}_p\) and \({\mathbf {b}}_p\). Write the \((p+1)\)-dimensional vectors \({\mathbf {a}}\) and \({\mathbf {b}}\) as \({\mathbf {a}} = ({\mathbf {a}}_p, a_{p+1})\) and \({\mathbf {b}} = ({\mathbf {b}}_p, b_{p+1})\) (see Note 4), where \(a_{p+1}\) and \(b_{p+1}\) are real numbers. Then,

$$\begin{aligned} \Vert {\mathbf {a}}_+-{\mathbf {b}}_+\Vert ^2&= \Vert (({\mathbf {a}}_p)_+-({\mathbf {b}}_p)_+,(a_{p+1})_+-(b_{p+1})_+)\Vert ^2 \nonumber \\&= \Vert ({\mathbf {a}}_p)_+-({\mathbf {b}}_p)_+\Vert ^2 + ((a_{p+1})_+-(b_{p+1})_+)^2 \nonumber \\&\le \Vert {\mathbf {a}}_p-{\mathbf {b}}_p\Vert ^2 + (a_{p+1}-b_{p+1})^2 = \Vert {\mathbf {a}}-{\mathbf {b}}\Vert ^2, \end{aligned}$$
(28)

where Eq. (28) uses the inductive hypothesis for the p-dimensional vectors, the one-dimensional result above, and the definition of the Euclidean norm.

By induction, Eq. (27) is proved. \(\square \)
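A quick randomized check of inequality (27), with \((\cdot )_+\) implemented as a componentwise maximum with zero (illustrative Python/NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    a, b = rng.normal(size=(2, 5))
    # Lemma 1: the plus-operator is nonexpansive in the Euclidean norm.
    lhs = np.linalg.norm(np.maximum(a, 0.0) - np.maximum(b, 0.0))
    assert lhs <= np.linalg.norm(a - b) + 1e-12
```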

Lemma 2

(Weyl’s inequality; Bhatia 1997, chap. 3) Let \({\mathbf {A}}\) and \({\mathbf {B}}\) be \(m\times m\) Hermitian matrices. Let \(\lambda _1({\mathbf {A}})\ge \lambda _2({\mathbf {A}})\ge \cdots \ge \lambda _m({\mathbf {A}})\), \(\lambda _1({\mathbf {B}})\ge \lambda _2({\mathbf {B}})\ge \cdots \ge \lambda _m({\mathbf {B}})\), and \(\lambda _1({\mathbf {A}}+{\mathbf {B}})\ge \lambda _2({\mathbf {A}}+{\mathbf {B}})\ge \cdots \ge \lambda _m({\mathbf {A}}+{\mathbf {B}})\) be the eigenvalues of \({\mathbf {A}}\), \({\mathbf {B}}\), and \({\mathbf {A}}+{\mathbf {B}}\), respectively. For any \(i=1,2,\ldots , m\), we have

$$\begin{aligned} \lambda _m({\mathbf {A}}) + \lambda _i({\mathbf {B}}) \le \lambda _{i}({\mathbf {A}}+{\mathbf {B}}) \le \lambda _1({\mathbf {A}}) + \lambda _i({\mathbf {B}}). \end{aligned}$$

In particular,

$$\begin{aligned} \lambda _{m}({\mathbf {A}}+{\mathbf {B}})\ge \lambda _m({\mathbf {A}}) + \lambda _m({\mathbf {B}}) \end{aligned}$$
(29)

and

$$\begin{aligned} \lambda _{1}({\mathbf {A}}+{\mathbf {B}})\le \lambda _1({\mathbf {A}}) + \lambda _1({\mathbf {B}}). \end{aligned}$$
(30)
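The bounds in Lemma 2 are easy to confirm numerically on random symmetric matrices (illustrative Python/NumPy; np.linalg.eigvalsh returns eigenvalues in ascending order, so they are reversed to match the notation above):

```python
import numpy as np

rng = np.random.default_rng(1)
X, Y = rng.normal(size=(2, 6, 6))
A, B = (X + X.T) / 2.0, (Y + Y.T) / 2.0       # random real symmetric matrices
eigs = lambda M: np.linalg.eigvalsh(M)[::-1]  # descending: lambda_1 >= ... >= lambda_m
lA, lB, lAB = eigs(A), eigs(B), eigs(A + B)
# Weyl: lambda_m(A) + lambda_i(B) <= lambda_i(A+B) <= lambda_1(A) + lambda_i(B)
assert np.all(lA[-1] + lB <= lAB + 1e-10)
assert np.all(lAB <= lA[0] + lB + 1e-10)
```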

Lemma 3

Let \({\mathbf {C}}\) be an \(m\times m\) positive semidefinite matrix, \({\mathbf {I}}\) be the \(m\times m\) identity matrix, and let a, b, c, and d be real numbers such that \(a{\mathbf {I}} + b{\mathbf {C}}\) and \(c{\mathbf {C}}+d{\mathbf {I}}\) are invertible (which holds, in particular, when a, b, c, and d are all positive). Then

$$\begin{aligned}&(a{\mathbf {I}} + b{\mathbf {C}})^{-1}{\mathbf {C}} = {\mathbf {C}}(a{\mathbf {I}} + b{\mathbf {C}})^{-1}, \\&(a{\mathbf {I}} + b{\mathbf {C}})^{-1}(c{\mathbf {C}}+d{\mathbf {I}}) = (c{\mathbf {C}}+d{\mathbf {I}})(a{\mathbf {I}} + b{\mathbf {C}})^{-1}, \end{aligned}$$

and

$$\begin{aligned} (a{\mathbf {I}} + b{\mathbf {C}})^{-1}(c{\mathbf {C}}+d{\mathbf {I}})^{-1} = (c{\mathbf {C}}+d{\mathbf {I}})^{-1}(a{\mathbf {I}} + b{\mathbf {C}})^{-1}. \end{aligned}$$

Proof

Direct calculation gives

$$\begin{aligned} (a{\mathbf {I}} + b{\mathbf {C}})^{-1}{\mathbf {C}}&= (a{\mathbf {I}} + b{\mathbf {C}})^{-1}{\mathbf {C}}(a{\mathbf {I}} + b{\mathbf {C}})(a{\mathbf {I}} + b{\mathbf {C}})^{-1} \\&= (a{\mathbf {I}} + b{\mathbf {C}})^{-1}(a{\mathbf {I}} + b{\mathbf {C}}){\mathbf {C}}(a{\mathbf {I}} + b{\mathbf {C}})^{-1}= {\mathbf {C}}(a{\mathbf {I}} + b{\mathbf {C}})^{-1}. \end{aligned}$$

Using the previous result, we have

$$\begin{aligned} (a{\mathbf {I}} + b{\mathbf {C}})^{-1}(c{\mathbf {C}}+d{\mathbf {I}})&= (a{\mathbf {I}} + b{\mathbf {C}})^{-1}c{\mathbf {C}} + (a{\mathbf {I}} + b{\mathbf {C}})^{-1} d{\mathbf {I}} \\&= c{\mathbf {C}} (a{\mathbf {I}} + b{\mathbf {C}})^{-1} + d{\mathbf {I}} (a{\mathbf {I}} + b{\mathbf {C}})^{-1} \\&= (c{\mathbf {C}}+d{\mathbf {I}})(a{\mathbf {I}} + b{\mathbf {C}})^{-1}. \end{aligned}$$

Finally,

$$\begin{aligned} (a{\mathbf {I}} + b{\mathbf {C}})^{-1}(c{\mathbf {C}}+d{\mathbf {I}})^{-1}&= \left[ (c{\mathbf {C}}+d{\mathbf {I}})(a{\mathbf {I}} + b{\mathbf {C}})\right] ^{-1} \\&= \left[ (a{\mathbf {I}} + b{\mathbf {C}})(c{\mathbf {C}}+d{\mathbf {I}})\right] ^{-1} \\&= (c{\mathbf {C}}+d{\mathbf {I}})^{-1}(a{\mathbf {I}} + b{\mathbf {C}})^{-1}. \end{aligned}$$

\(\square \)
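A numerical spot-check of Lemma 3 on a random positive semidefinite matrix (illustrative Python/NumPy; positive coefficients are chosen so that both matrices are safely invertible):

```python
import numpy as np

rng = np.random.default_rng(2)
G = rng.normal(size=(5, 5))
Cmat = G @ G.T                                # positive semidefinite C
I = np.eye(5)
a, b, c, d = 0.7, 1.3, 0.4, 2.1
inv = np.linalg.inv
M1, M2 = a * I + b * Cmat, c * Cmat + d * I
# Lemma 3: inverses of polynomials in C commute with C and with each other.
assert np.allclose(inv(M1) @ Cmat, Cmat @ inv(M1))
assert np.allclose(inv(M1) @ M2, M2 @ inv(M1))
assert np.allclose(inv(M1) @ inv(M2), inv(M2) @ inv(M1))
```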

Lemma 4

(From Press et al. 2007, Sect. 2.7.) Suppose that an \(N\times N\) matrix \({\mathbf {A}}\) is partitioned as

$$\begin{aligned} {\mathbf {A}} = \begin{bmatrix} {\mathbf {A}}_{11} &{} {\mathbf {A}}_{12} \\ {\mathbf {A}}_{21} &{} {\mathbf {A}}_{22} \end{bmatrix}, \end{aligned}$$

where \({\mathbf {A}}_{11}\) and \({\mathbf {A}}_{22}\) are square matrices of size \(p\times p\) and \( s\times s\), respectively (\(p +s = N\)). If the inverse of \({\mathbf {A}}\) is partitioned in the same manner,

$$\begin{aligned} {\mathbf {A}}^{-1} = \begin{bmatrix} {\overline{{\mathbf {A}}}}_{11} &{} {\overline{{\mathbf {A}}}}_{12} \\ {\overline{{\mathbf {A}}}}_{21} &{} {\overline{{\mathbf {A}}}}_{22} \end{bmatrix}, \end{aligned}$$

then \({\overline{{\mathbf {A}}}}_{11}\), \({\overline{{\mathbf {A}}}}_{12}\), \({\overline{{\mathbf {A}}}}_{21}\), and \({\overline{{\mathbf {A}}}}_{22}\), which have the same sizes as \({\mathbf {A}}_{11}\), \({\mathbf {A}}_{12}\), \({\mathbf {A}}_{21}\), and \({\mathbf {A}}_{22}\), respectively, can be found by

$$\begin{aligned}&{\overline{ {\mathbf {A}}}}_{11} = ({\mathbf {A}}_{11} -{\mathbf {A}}_{12}{\mathbf {A}}_{22}^{-1}{\mathbf {A}}_{21} )^{-1}, \\&{\overline{ {\mathbf {A}}}}_{12} = -{\overline{ {\mathbf {A}}}}_{11}{\mathbf {A}}_{12}{\mathbf {A}}_{22}^{-1}, \\&{\overline{ {\mathbf {A}}}}_{21} = -{{\mathbf {A}}^{-1}_{22}}{\mathbf {A}}_{21}{\overline{ {\mathbf {A}}}}_{11}, \end{aligned}$$

and

$$\begin{aligned} {\overline{{\mathbf {A}}}}_{22}= ({\mathbf {I}}-{\overline{{\mathbf {A}}}}_{21} {\mathbf {A}}_{12}){\mathbf {A}}_{22}^{-1}. \end{aligned}$$
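The four block formulas can be checked directly against a full matrix inverse (illustrative Python/NumPy; the diagonal shift merely keeps the random test matrix well conditioned):

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 7, 3
A = rng.normal(size=(N, N)) + N * np.eye(N)   # well-conditioned test matrix
A11, A12, A21, A22 = A[:p, :p], A[:p, p:], A[p:, :p], A[p:, p:]
inv = np.linalg.inv
# Blockwise inverse from Lemma 4.
Ab11 = inv(A11 - A12 @ inv(A22) @ A21)
Ab12 = -Ab11 @ A12 @ inv(A22)
Ab21 = -inv(A22) @ A21 @ Ab11
Ab22 = (np.eye(N - p) - Ab21 @ A12) @ inv(A22)
assert np.allclose(np.block([[Ab11, Ab12], [Ab21, Ab22]]), inv(A))
```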

Proof of Theorem 2

To calculate \({\overline{{\mathbf {P}}}}\), we apply Lemmas 4 and 3 to obtain

$$\begin{aligned} {\overline{{\mathbf {P}}}}^{-1}&=\frac{1}{C\rho }{\mathbf {I}}_n+{\mathbf {T}}-(-{\mathbf {T}})\left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n +{\mathbf {T}}\right) ^{-1}(-{\mathbf {T}}) \\&=\frac{1}{C \rho }{\mathbf {I}}_n+{\mathbf {T}}-\left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1}{\mathbf {T}}^2 \\&=\left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1}\left[ \left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n+ {\mathbf {T}}\right) \left( \frac{1}{C\rho }{\mathbf {I}}_n+{\mathbf {T}}\right) -{\mathbf {T}}^2\right] \\&=\left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1}\left[ \frac{1}{C^2\rho (1-\rho )}{\mathbf {I}}_n +\frac{1}{C\rho (1-\rho )}{\mathbf {T}}\right] \\&=\frac{1}{C\rho (1-\rho )}\left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1}\left( \frac{1}{C}{\mathbf {I}}_n+{\mathbf {T}}\right) . \end{aligned}$$

Thus,

$$\begin{aligned} {\overline{{\mathbf {P}}}}=C\rho (1-\rho )\left( \frac{1}{C}{\mathbf {I}}_n +{\mathbf {T}}\right) ^{-1}\left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n+{\mathbf {T}}\right) . \end{aligned}$$

By Lemmas 4 and 3, we have

$$\begin{aligned} {\overline{{\mathbf {Q}}}}&= - {\overline{{\mathbf {P}}}}(-{\mathbf {T}})\left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1} \\&=C\rho (1-\rho )\left( \frac{1}{C}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1}\left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n +{\mathbf {T}}\right) \left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1}{\mathbf {T}} \\&=C\rho (1-\rho )\left( \frac{1}{C}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1}{\mathbf {T}}, \end{aligned}$$

and

$$\begin{aligned} {\overline{{\mathbf {R}}}}&= -\left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1}(-{\mathbf {T}}) {\overline{{\mathbf {P}}}} \\&={\mathbf {T}}\left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1} C\rho (1-\rho )\left( \frac{1}{C}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1}\left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n+{\mathbf {T}}\right) \\&={\mathbf {T}}C\rho (1-\rho )\left( \frac{1}{C}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1}\left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1} \left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n+{\mathbf {T}}\right) \\&=C\rho (1-\rho )\left( \frac{1}{C}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1}{\mathbf {T}}. \end{aligned}$$

Finally,

$$\begin{aligned} {\overline{{\mathbf {S}}}}&= \left( {\mathbf {I}}_n - {\overline{{\mathbf {R}}}}(-{\mathbf {T}})\right) \left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1} \\&=\left( {\mathbf {I}}_n +C\rho (1-\rho )\left( \frac{1}{C}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1}{\mathbf {T}}^2\right) \left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1} \\&=\left( \frac{1}{C}{\mathbf {I}}_n+{\mathbf {T}} +C\rho (1-\rho ){\mathbf {T}}^2\right) \left( \frac{1}{C}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1}\left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n +{\mathbf {T}}\right) ^{-1} \\&=C\rho (1-\rho )\left( \frac{1}{C\rho }{\mathbf {I}}_n+{\mathbf {T}}\right) \left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n+ {\mathbf {T}}\right) \left( \frac{1}{C}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1}\left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1} \\&=C\rho (1-\rho )\left( \frac{1}{C\rho }{\mathbf {I}}_n+{\mathbf {T}}\right) \left( \frac{1}{C}{\mathbf {I}}_n+ {\mathbf {T}}\right) ^{-1}\left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n+{\mathbf {T}}\right) \left( \frac{1}{C(1-\rho )}{\mathbf {I}}_n +{\mathbf {T}}\right) ^{-1} \\&=C\rho (1-\rho )\left( \frac{1}{C}{\mathbf {I}}_n+{\mathbf {T}}\right) ^{-1}\left( \frac{1}{C\rho }{\mathbf {I}}_n+{\mathbf {T}}\right) . \end{aligned}$$
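As a sanity check, the closed-form blocks \({\overline{{\mathbf {P}}}}\), \({\overline{{\mathbf {Q}}}}={\overline{{\mathbf {R}}}}\), and \({\overline{{\mathbf {S}}}}\) derived above can be compared with a direct numerical inverse of the partitioned matrix (illustrative Python/NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)
n, C, rho = 5, 2.0, 0.3
G = rng.normal(size=(n, n))
T = G @ G.T + np.ones((n, n))                 # PSD stand-in for K + 11'
I = np.eye(n)
inv = np.linalg.inv
M = np.block([[I / (C * rho) + T, -T],
              [-T, I / (C * (1 - rho)) + T]])
s = C * rho * (1 - rho)
P_bar = s * inv(I / C + T) @ (I / (C * (1 - rho)) + T)
QR_bar = s * inv(I / C + T) @ T               # Q_bar and R_bar coincide
S_bar = s * inv(I / C + T) @ (I / (C * rho) + T)
assert np.allclose(np.block([[P_bar, QR_bar], [QR_bar, S_bar]]), inv(M))
```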


Cite this article

Zheng, S. KLERC: kernel Lagrangian expectile regression calculator. Comput Stat 36, 283–311 (2021). https://doi.org/10.1007/s00180-020-01003-0

