Abstract
The computation of penalized quantile regression estimates is often computationally intensive in high dimensions. In this paper we propose a coordinate descent algorithm for computing penalized smooth quantile regression (cdaSQR) with convex and nonconvex penalties. The cdaSQR approach is based on the approximation of the objective check function, which is not differentiable at zero, by a modified check function which is differentiable at zero. Then, using the majorization–minimization trick of the gcdnet algorithm (Yang and Zou in J. Comput. Graph. Stat. 22(2):396–415, 2013), we update each coefficient simply and efficiently. In our implementation, we consider the convex penalty \(\ell _1+\ell _2\) and the nonconvex penalties SCAD (or MCP) \(+\,\ell _2\). We establish the convergence property of cdaSQR with the \(\ell _1+\ell _2\) penalty. Using simulations, we show that our implementation is an order of magnitude faster than its competitors. Finally, the performance of our algorithm is illustrated on three real data sets from diabetes, leukemia and Bardet–Biedl syndrome gene expression studies.
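To fix ideas, the following minimal sketch (in Python; the function names and the smoothing level k are illustrative choices, not the paper's implementation) contrasts the nondifferentiable check function with a Huber-type smoothed version of the kind analyzed in the appendices:

```python
import numpy as np

def check_loss(u, tau):
    """Koenker's check function rho_tau(u) = u * (tau - 1{u < 0}); kinked at zero."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def smooth_check_loss(u, tau, k=0.1):
    """Huber-type smoothed check function: quadratic near zero, and equal to the
    check function up to an additive constant outside [-(1-tau)k, tau*k]."""
    u = np.asarray(u, dtype=float)
    out = np.empty_like(u)
    left, right = u < -(1 - tau) * k, u > tau * k
    mid = ~(left | right)
    out[left] = -(1 - tau) * u[left] - k * (1 - tau) ** 2 / 2
    out[mid] = u[mid] ** 2 / (2 * k)
    out[right] = tau * u[right] - k * tau ** 2 / 2
    return out

u = np.linspace(-1.0, 1.0, 9)
print(check_loss(u, tau=0.5))
print(smooth_check_loss(u, tau=0.5, k=0.1))  # close to the check loss, smooth at 0
```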
References
Aravkin, A., Kambadur, A., Lozano, A.C., Luss, R.: Sparse quantile Huber regression for efficient and robust estimation (2014). arXiv:1402.4624v1
Belloni, A., Chernozhukov, V.: \(L_1\)-penalized quantile regression in high-dimensional sparse models. Ann. Stat. 39, 82–130 (2011)
Breheny, P., Huang, J.: Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5(1), 232–253 (2011)
Briollais, L., Durrieu, G.: Application of quantile regression to recent genetic and -omic studies. Hum. Genet. 133, 951–966 (2014)
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32, 407–499 (2004)
El Anbari, M., Mkhadri, A.: Penalized regression combining the L1 norm and a correlation based penalty. Sankhya B 76(1), 82–102 (2014)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010)
Hebiri, M., van de Geer, S.: The Smooth-Lasso and other \(l_1 + l_2\)-penalized methods. Electron. J. Stat. 5, 1184–1226 (2011)
Huang, J., Breheny, P., Zhang, C.H.: The Mnet method for variable selection. Technical Report No. 402, Department of Statistics, University of Iowa (2010)
Hunter, D.R., Lange, K.: Quantile regression via an MM algorithm. J. Comput. Graph. Stat. 9, 60–77 (2000)
Jennings, L.S., Wong, K.H., Teo, K.L.: Optimal control computation to account for eccentric movement. J. Aust. Math. Soc. B 38, 182–193 (1996)
Jiang, D., Huang, J.: Majorization minimization by coordinate descent for concave penalized generalized linear models. Technical Report No. 412, Department of Biostatistics, University of Iowa (2012)
Koenker, R.: Quantile Regression. Cambridge University Press, Cambridge (2005)
Koenker, R., Bassett, G.: Regression quantiles. Econometrica 46, 33–50 (1978)
Li, C., Wei, Y., Chappell, R., He, X.: Bent line quantile regression with application to an allometric study of land mammals' speed and mass. Biometrics 67(1), 242–249 (2011)
Li, Y., Zhu, J.: \(L_{1}\)-norm quantile regression. J. Comput. Graph. Stat. 17, 163–185 (2008)
Mazumder, R., Friedman, J.H., Hastie, T.: SparseNet: coordinate descent with nonconvex penalties. J. Am. Stat. Assoc. 106, 1125–1138 (2011)
Oh, S.-K., Lee, T.C.M., Nychka, D.W.: Fast nonparametric quantile regression with arbitrary smoothing methods. J. Comput. Graph. Stat. 20, 510–526 (2011)
Peng, B., Wang, L.: An iterative coordinate descent algorithm for high-dimensional nonconvex penalized quantile regression. J. Comput. Graph. Stat. 24, 676–694 (2015)
Scheetz, T., Kim, K., Swiderski, R., Philp, A., Braun, T., Knudtson, K., Dorrance, A., DiBona, G., Huang, J., Casavant, T., Sheffield, V., Stone, E.: Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proc. Natl. Acad. Sci. USA 103(39), 14429–14434 (2006)
Simon, N., Friedman, J., Hastie, T., Tibshirani, R.: A sparse group lasso. J. Comput. Graph. Stat. 22, 231–245 (2013)
Slawski, M.: The structured elastic net for quantile regression and support vector classification. Stat. Comput. 22(1), 153–168 (2012)
Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. B 58, 267–288 (1996)
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. B 67, 91–108 (2005)
Wu, Y., Liu, Y.: Variable selection in quantile regression. Stat. Sin. 19, 801–817 (2009)
Yang, Y., Zou, H.: An efficient algorithm for computing the HHSVM and its generalizations. J. Comput. Graph. Stat. 22(2), 396–415 (2013)
Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
Zhao, G.H., Teo, K.L., Chan, K.S.: Estimation of conditional quantiles by a new smoothing approximation of asymmetric loss function. Stat. Comput. 15, 5–11 (2005)
Zheng, S.: Gradient descent algorithms for quantile regression with smooth approximation. Int. J. Mach. Learn. Cybern. 2, 191–207 (2011)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67, 301–320 (2005)
Acknowledgments
We warmly thank the reviewers for their careful reading of the previous version of our paper and their helpful comments. This work is partially supported by Centre National pour la Recherche Scientifique et Technique (Morocco) project URAC01 to Abdallah Mkhadri and by the Natural Sciences and Engineering Research Council of Canada and Fonds de recherche du Québec-Santé grant FRQS-31110 to Karim Oualkacha.
Appendices
Appendix 1
Proof of Proposition 1
The proof of point (a) is omitted since it is based on simple algebra. We detail the proof of point (b), on convexity, only for the function \(\rho \) defined in equation (5) (the proof for the function in (4) is similar and is omitted). It suffices to show that for any u and \(v\in ]-\infty ,-(1-\tau )k[\,\cup \,[-(1-\tau )k, \tau k]\,\cup \,]\tau k, +\infty [=I_1\cup I_2\cup I_3 \), we have
$$\begin{aligned} \rho (v)-\rho (u) -\rho '(u)(v-u)\ge 0. \end{aligned}$$
This inequality is clearly satisfied if u and v are both taken in the same interval \(I_j\) (\(j=1, 2, 3\)), since \(\rho \) is either quadratic or affine, hence convex, on each piece. So, we establish the proposition for any \((u,v)\in I_j\times I_{j'}\) with \(j\ne j'\), which leaves six cases to distinguish.
1. If \(v\in I_1\) and \(u\in I_2\), then
$$\begin{aligned}&\rho (v)-\rho (u) -\rho '(u)(v-u)\\&\quad = -(1-\tau )v-\frac{k(1-\tau )^2}{2}-\frac{1}{2k}u^2-\frac{u}{k}(v-u)\\&\quad = (1-\tau )\left[ -v-\frac{k(1-\tau )}{2}\right] - \frac{v^2}{2k}+\left( \frac{u}{\sqrt{2k}}-\frac{v}{\sqrt{2k}}\right) ^2\\&\quad = \frac{1}{2k}[(u-v)^2-(v+k(1-\tau ))^2]\\&\quad = \frac{1}{2k}[(u-2v-k(1-\tau ))(u-v+v+k(1-\tau ))] \\&\quad = \frac{1}{2k}[(u-2v-k(1-\tau ))(u+k(1-\tau ))] \\&\quad \ge 0. \end{aligned}$$
The last inequality is obtained from the fact that \(u\in I_2=[-(1-\tau )k,\tau k]\), so \((u+k(1-\tau ))\ge 0\), which implies that \(u-2v-k(1-\tau ) \ge -k(1-\tau ) -2v-k(1-\tau )=-2[v+k(1-\tau )] > 0,\) since \(v<-(1-\tau )k\).
2. If \(v\in I_3\) and \(u\in I_2\), then
$$\begin{aligned}&\rho (v)-\rho (u) -\rho '(u)(v-u)\\&\quad = \tau v-\frac{k\tau ^2}{2}-\frac{1}{2k}u^2-\frac{u}{k}(v-u)\\&\quad = \tau v-\frac{k\tau ^2}{2}-\frac{1}{2k}u^2-\frac{uv}{k}+\frac{u^2}{k} + \frac{v^2}{2k}-\frac{v^2}{2k}\\&\quad = \frac{1}{2k}[(u-v)^2-(v-k\tau )^2]\\&\quad = \frac{1}{2k}[(u-2v+k\tau )(u-k\tau )] \\&\quad \ge 0. \end{aligned}$$
The last inequality comes from the fact that \(u\in I_2\), so \(u-k\tau \le 0\), which implies that \(u-2v+k\tau \le k\tau -2v+k\tau =2(k\tau -v) \le 0 \), since \(v\in I_3\); both factors being nonpositive, their product is nonnegative.
3. If \(u\in I_3\) and \(v\in I_1\), then
$$\begin{aligned}&\rho (v)-\rho (u) -\rho '(u)(v-u)\\&\quad = -(1-\tau )v-\frac{k(1-\tau )^2}{2} -\tau u+\frac{k\tau ^2}{2}-\tau (v-u)\\&\quad = -(1-\tau )v-\frac{k(1-\tau )^2}{2}+\frac{k\tau ^2}{2}-\tau v\\&\quad = -v -\frac{k(1-2\tau )}{2}\\&\quad \ge \frac{2k(1-\tau )-k(1-2\tau )}{2} \quad \text{ since } \quad -v\ge (1-\tau )k\\&\quad = \frac{k}{2} \ge 0 . \end{aligned}$$
4. If \(v\in I_3\) and \(u\in I_1\), then, as in case 3, we have
$$\begin{aligned}&\rho (v)-\rho (u) -\rho '(u)(v-u)\\&\quad = \tau v-\frac{k\tau ^2}{2}+(1-\tau )u + \frac{k(1-\tau )^2}{2}+(1-\tau )(v-u)\\&\quad = \frac{k(1-2\tau )}{2}+v \\&\quad \ge \frac{k(1-2\tau )+2\tau k}{2} \quad \text{ since } \quad v>\tau k\\&\quad =\frac{k}{2} \ge 0 . \end{aligned}$$
5. If \(v\in I_2\) and \(u\in I_1\), then we have
$$\begin{aligned}&\rho (v)-\rho (u) -\rho '(u)(v-u)\\&\quad = \frac{1}{2k}v^2+(1-\tau )u +\frac{k(1-\tau )^2}{2}+(1-\tau )(v-u)\\&\quad =\frac{1}{2k}v^2+\frac{k(1-\tau )^2}{2}+(1-\tau )v\\&\quad =\left[ \frac{v}{\sqrt{2k}}+\sqrt{k/2}\,(1-\tau )\right] ^2 \ge 0. \end{aligned}$$
6. If \(v\in I_2\) and \(u\in I_3\), then, as in case 5, we have
$$\begin{aligned}&\rho (v)-\rho (u) -\rho '(u)(v-u)\\&\quad = \frac{1}{2k}v^2-\tau u+\frac{k\tau ^2}{2}-\tau (v-u)\\&\quad = \left[ \frac{v}{\sqrt{2k}}-\sqrt{k/2}\tau \right] ^2 \ge 0. \end{aligned}$$
This completes the proof of Proposition 1. \(\square \)
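As an informal numerical companion to the case analysis above, the following sketch (with illustrative values of \(\tau \) and k, and assuming the piecewise form of \(\rho =\rho _{\tau ,k}\) used throughout the proof) checks the first-order convexity inequality on randomly drawn pairs:

```python
import numpy as np

rng = np.random.default_rng(0)
tau, k = 0.3, 0.5  # illustrative values

def rho(u):
    # Piecewise smoothed check function rho_{tau,k} from the proof above.
    if u < -(1 - tau) * k:
        return -(1 - tau) * u - k * (1 - tau) ** 2 / 2
    if u <= tau * k:
        return u ** 2 / (2 * k)
    return tau * u - k * tau ** 2 / 2

def rho_prime(u):
    # Its derivative, continuous at both knots -(1-tau)k and tau*k.
    if u < -(1 - tau) * k:
        return -(1 - tau)
    if u <= tau * k:
        return u / k
    return tau

# Convexity (Proposition 1(b)): rho(v) - rho(u) - rho'(u)(v - u) >= 0.
pairs = rng.uniform(-5.0, 5.0, size=(10_000, 2))
gaps = [rho(v) - rho(u) - rho_prime(u) * (v - u) for u, v in pairs]
print(min(gaps) >= -1e-12)  # expected: True
```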
Appendix 2
Proof of Proposition 2
We will show that the two functions \( \rho _{\tau ,c}(u)\) and \( \rho _{\tau ,k}(u)\) are differentiable and have a Lipschitz continuous first derivative. To do so, one can calculate the derivatives of the functions \( \rho _{\tau ,c}(u)\) and \( \rho _{\tau ,k}(u)\) as
and
After some algebra, these derivatives satisfy
and
Therefore, one can verify that
and
which ends the proof. \(\square \)
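For concreteness, the derivative of \(\rho _{\tau ,k}\) and the resulting Lipschitz bound can be read off the piecewise form used in the proof of Proposition 1; we record them here as a reconstruction (the corresponding expressions for \(\rho _{\tau ,c}\) follow analogously from its definition in equation (4) and are not reproduced):

$$\begin{aligned} \rho '_{\tau ,k}(u)= \begin{cases} -(1-\tau ), &{} u<-(1-\tau )k,\\ u/k, &{} -(1-\tau )k\le u\le \tau k,\\ \tau , &{} u>\tau k, \end{cases} \qquad |\rho '_{\tau ,k}(u)-\rho '_{\tau ,k}(v)|\le \frac{1}{k}|u-v|. \end{aligned}$$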
Appendix 3
Proof of Theorem 3.1
The function \(\varvec{G}\) is convex since it is the sum of the convex functions \(\rho _{\tau ,*}\) (see Proposition 1) and the convex elastic-net penalty \(P_{\lambda _1,\lambda _2}\). This completes the proof of point 1.).
The proof of point 2.) of Theorem 3.1 relies on the following result, which is proved in Lemma 1: for all m we have
Since the sequence \(\varvec{G}(\bar{{\varvec{\beta }}}^{m})\) is decreasing and bounded below [e.g. \(0 \le \varvec{G}(\bar{{\varvec{\beta }}}^{m})\) for all m], it converges. Thus, from (19) one can verify that the sequence generated by Algorithm 1 cannot cycle without converging; i.e., it must have a unique limit point. This completes the proof of point 2.). \(\square \)
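To make the descent inequality (19) concrete, here is a minimal sketch of one coordinate update of the kind Algorithm 1 performs, under our assumptions that the smooth loss is the Huber-type \(\rho _{\tau ,k}\) (whose derivative is 1/k-Lipschitz, by Proposition 2) and that the penalty is the elastic net \(P_{\lambda _1,\lambda _2}\); the majorize-then-soft-threshold form follows the gcdnet recipe of Yang and Zou (2013), and all function names are illustrative:

```python
import numpy as np

def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def mm_coordinate_update(j, beta, r, X, tau, k, lam1, lam2):
    """One MM update of beta[j] for the smoothed quantile loss + elastic net.

    r is the current residual vector y - beta0 - X @ beta. Because the
    derivative of rho_{tau,k} is 1/k-Lipschitz, the loss is majorized by a
    quadratic in beta[j]; minimizing the majorizer plus the penalty gives a
    soft-thresholding step.
    """
    n = X.shape[0]
    # Derivative of rho_{tau,k} evaluated at the current residuals.
    d = np.where(r < -(1 - tau) * k, -(1 - tau), np.where(r <= tau * k, r / k, tau))
    grad = -(X[:, j] @ d) / n               # coordinate gradient of the loss
    M = (X[:, j] ** 2).sum() / (n * k)      # curvature of the quadratic majorizer
    beta_j_new = soft_threshold(M * beta[j] - grad, lam1) / (M + lam2)
    r_new = r - X[:, j] * (beta_j_new - beta[j])  # keep residuals in sync
    return beta_j_new, r_new
```

Because the quadratic majorizer dominates the loss and touches it at the current iterate, each such update can only decrease the penalized objective \(\varvec{G}\); inequality (19) quantifies this decrease.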
Now we prove point 3.). In the sequel, the parameter \(\tau \) is omitted in the parameter vector \({\varvec{\beta }}\) and in \(\beta _0\) to simplify the presentation of the proof. For any \(\bar{{\varvec{\beta }}} = (\beta _0, \beta _1,\ldots ,\beta _p)^\top \) and \(\varvec{\gamma }_j = (0, \ldots , 0,\gamma _j, 0, \ldots , 0)^\top \in I\!\!R^{p+1}\), we have
for \(j\in \{1,\ldots ,p\}\), with
Assume a subsequence \(\bar{{\varvec{\beta }}}^{n_k} \rightarrow \bar{{\varvec{\beta }}}^\infty = (\beta _0^{\infty },\ldots ,\beta _p^{\infty })^\top \). By equation (19), the successive differences converge to zero (i.e. \(\bar{{\varvec{\beta }}}^{n_k} - \bar{{\varvec{\beta }}}^{n_k-1} \rightarrow 0\)). Thus, as \(k \rightarrow \infty \), we have
By (21) and (22), we have for \(j\in \{1,\ldots ,p\}\)
By the coordinate-wise minimality in the j-th coordinate, \(j\in \{1,\ldots ,p\}\), one has
with \({\varvec{\beta }}_j^{n_k-1} = (\beta _1^{n_k}, \ldots , \beta _{j-1}^{n_k},\beta _j^{n_k},\beta _{j+1}^{n_k-1},\ldots , \beta _p^{n_k-1})^\top \). From (23) and (24) one can write, for all \(j\in \{1,\ldots ,p\}\)
By (20) and (25), for \(j\in \{1,\ldots ,p\}\), we have
For \(j=0\), following the same arguments as above, one can easily verify that
Thus for \({\varvec{\gamma }} = (\gamma _0, \ldots ,\gamma _p)^\top \in I\!\!R^{p+1}\), we have
where the last inequality (\(\ge \)0) is obtained using equations (26) and (27). This limit point is a global minimum since the function \(\varvec{G}\) is convex, which completes the proof of point 3.). \(\square \)
Lemma 1
Let \(G(.|\beta _0, {\varvec{\beta }})\) and \(F(.|\beta _0, {\varvec{\beta }})\) be defined as in Theorem 1 and let \(\varvec{G}(\bar{{\varvec{\beta }}})\) be the global objective function defined by (6). Let \(\bar{{\varvec{\beta }}}^m_\tau = \{ \beta _0^m(\tau ), {\varvec{\beta }}^m_\tau \}^\top \) be the sequence of iterates generated by the iteration map of Algorithm 1, our MM coordinate descent algorithm.
Then the sequence \(\bar{{\varvec{\beta }}}^m\) satisfies equation (19) for all m, i.e.
Proof of Lemma 1
The objective function to minimize for each coordinate \(j = 1,\ldots ,p\) can be written as
where \(r_i=y_i-\beta _0-\mathbf {x}_i^T{\varvec{\beta }}\) and the parameter \(\tau \) is omitted in the parameter vector \({\varvec{\beta }}\) to simplify the presentation of the proof. Let \(u_0\) be the minimizer of \(\varvec{G}(.|{\varvec{\beta }})\) with respect to the specified coordinate \(j=1, \ldots , p\). Following the same arguments as in Mazumder et al. (2011) and Jiang and Huang (2012), we need to show that
where \(\theta >0\) and \(\nu \) is a small real value. Note first that \(\varvec{G}(.|{\varvec{\beta }})\) is convex, as a sum of convex functions. For \(u\ne 0\), the gradient of \(\varvec{G}(.|{\varvec{\beta }})\) exists and is equal to
If \(u=0\), then the subgradient d can be written as
with \(z\in (-1,1)\).
In our MM coordinate descent algorithm, the function to be minimized is
Then any minimizer u of \(F(.|{\varvec{\beta }})\) will satisfy the equation
where \({\varvec{\beta }}_{(-j)}^T\mathbf {x}_{i(-j)}=\sum _{\ell \ne j}\beta _{\ell }x_{i\ell }\). Since \(\varvec{G}(.|{\varvec{\beta }})\) is minimized at \(u_0\), by (32), we have
If \(u_0=0\), the latter equation is true for some value of \(\text{ sgn }(u_0)\in (-1,1)\). Now, let \(d_0\) be the sub-gradient of \(\varvec{G}(.|{\varvec{\beta }})\) at \(u_0\), defined by
where \(\text{ sgn }(u_0)\in (-1,1)\). From (34) and (35), one can write
where \(r_i^{(0)}=y_i-\beta _0-{\varvec{\beta }}_{(-j)}^T\mathbf {x}_{i(-j)} -x_{ij}u_0\). From equation (31), we can write
From Proposition 2, we have
Without loss of generality (reordering the observations if necessary), we assume that for fixed j we have \(x_{ij}\ge 0\) for \(i=1,\ldots , n_0\) and \(x_{ij}< 0\) for \(i=n_0+1,\ldots , n\). Then, we get
and
Thus, from (36) we obtain that
Thus, equation (30) holds for every \(\beta _1,\ldots ,\beta _p\).
Now consider \(\beta _0\) and observe that
where \(r_i^*=y_i-\mathbf {x}_i^T{\varvec{\beta }}\). Then, using arguments similar to those in (34) and (35), we have
and the gradient of \(\varvec{G}(.|{\varvec{\beta }})\) at \(u_0\) is given by
As in equation (36), we can write
Now, using equation (37) and Proposition 2 for the first and second terms of the latter inequality, respectively, we obtain
Finally, equation (30) holds with \(\theta =1/\delta \) for \(\beta _0,\beta _1,\ldots ,\beta _p\). Then, we have
Consequently, applying (40) over all coordinates, we have for all m
This completes the proof of Lemma 1. \(\square \)
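As an informal numerical check of this descent property, the following self-contained sketch (inlining the same majorize-then-soft-threshold update sketched after point 2.) of Theorem 3.1, with illustrative parameter values and the intercept omitted) runs repeated coordinate sweeps on synthetic data and verifies that the penalized objective never increases:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, tau, k, lam1, lam2 = 100, 10, 0.5, 0.25, 0.1, 0.01
X = rng.standard_normal((n, p))
y = X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(n)

def rho(u):
    # Smoothed check function rho_{tau,k} (vectorized), as in Appendix 1.
    return np.where(
        u < -(1 - tau) * k, -(1 - tau) * u - k * (1 - tau) ** 2 / 2,
        np.where(u <= tau * k, u ** 2 / (2 * k), tau * u - k * tau ** 2 / 2),
    )

def objective(beta, r):
    # Penalized objective: smoothed quantile loss plus elastic-net penalty.
    return rho(r).mean() + lam1 * np.abs(beta).sum() + 0.5 * lam2 * (beta ** 2).sum()

beta, r = np.zeros(p), y.copy()  # intercept omitted for brevity
values = [objective(beta, r)]
for sweep in range(20):
    for j in range(p):
        # Derivative of rho_{tau,k} at the current residuals.
        d = np.where(r < -(1 - tau) * k, -(1 - tau),
                     np.where(r <= tau * k, r / k, tau))
        grad = -(X[:, j] @ d) / n             # coordinate gradient of the loss
        M = (X[:, j] ** 2).sum() / (n * k)    # majorization (curvature) constant
        z = M * beta[j] - grad
        new_bj = np.sign(z) * max(abs(z) - lam1, 0.0) / (M + lam2)  # soft threshold
        r -= X[:, j] * (new_bj - beta[j])     # keep residuals consistent
        beta[j] = new_bj
    values.append(objective(beta, r))

print(np.all(np.diff(values) <= 1e-12))  # expected: True (monotone descent)
```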