Abstract
The computation of penalized quantile regression estimates is often computationally intensive in high dimensions. In this paper we propose a coordinate descent algorithm for computing penalized smooth quantile regression (cdaSQR) with convex and nonconvex penalties. The cdaSQR approach is based on the approximation of the objective check function, which is not differentiable at zero, by a modified check function which is differentiable at zero. Then, using the majorization–minimization trick of the gcdnet algorithm (Yang and Zou in J. Comput. Graph. Stat. 22(2):396–415, 2013), we update each coefficient simply and efficiently. In our implementation, we consider the convex penalty \(\ell _1+\ell _2\) and the nonconvex penalties SCAD (or MCP) \(+\,\ell _2\). We establish the convergence property of cdaSQR with the \(\ell _1+\ell _2\) penalty. Using simulations, we show that our implementation is an order of magnitude faster than its competitors. Finally, the performance of our algorithm is illustrated on three real data sets from diabetes, leukemia and Bardet–Biedl syndrome gene expression studies.
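To fix ideas, the following minimal sketch (in Python; the function names and the smoothing level k are illustrative choices, not the paper's implementation) contrasts the nondifferentiable check function with a Huber-type smoothed version of the kind analyzed in the appendices:

```python
import numpy as np

def check_loss(u, tau):
    """Koenker's check function rho_tau(u) = u * (tau - 1{u < 0}); kinked at zero."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def smooth_check_loss(u, tau, k=0.1):
    """Huber-type smoothed check function: quadratic near zero, and equal to the
    check function up to an additive constant outside [-(1-tau)k, tau*k]."""
    u = np.asarray(u, dtype=float)
    out = np.empty_like(u)
    left, right = u < -(1 - tau) * k, u > tau * k
    mid = ~(left | right)
    out[left] = -(1 - tau) * u[left] - k * (1 - tau) ** 2 / 2
    out[mid] = u[mid] ** 2 / (2 * k)
    out[right] = tau * u[right] - k * tau ** 2 / 2
    return out

u = np.linspace(-1.0, 1.0, 9)
print(check_loss(u, tau=0.5))
print(smooth_check_loss(u, tau=0.5, k=0.1))  # close to the check loss, smooth at 0
```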
References
Aravkin, A., Kambadur, A., Lozano, A.C., Luss, R.: Sparse quantile Huber regression for efficient and robust estimation (2014). arXiv:1402.4624v1
Belloni, A., Chernozhukov, V.: \(L_1\)-penalized quantile regression in high-dimensional sparse models. Ann. Stat. 39, 82–130 (2011)
Breheny, P., Huang, J.: Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5(1), 232–253 (2011)
Briollais, L., Durrieu, G.: Application of quantile regression to recent genetic and -omic studies. Hum. Genet. 133, 951–966 (2014)
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32, 407–499 (2004)
El Anbari, M., Mkhadri, A.: Penalized regression combining the L1 norm and a correlation based penalty. Sankhya B 76(1), 82–102 (2014)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010)
Hebiri, M., van de Geer, S.: The Smooth-Lasso and other \(l_1 + l_2\)-penalized methods. Electron. J. Stat. 5, 1184–1226 (2011)
Huang, J., Breheny, P., Zhang, C.H.: The Mnet method for variable selection. Technical Report No. 402, Department of Statistics, University of Iowa (2010)
Hunter, D.R., Lange, K.: Quantile regression via an MM algorithm. J. Comput. Graph. Stat. 9, 60–77 (2000)
Jennings, L.S., Wong, K.H., Teo, K.L.: Optimal control computation to account for eccentric movement. J. Aust. Math. Soc. B 38, 182–193 (1996)
Jiang, D., Huang, J.: Majorization minimization by coordinate descent for concave penalized generalized linear models. Technical Report No. 412, Department of Biostatistics, University of Iowa (2012)
Koenker, R.: Quantile Regression. Cambridge University Press, Cambridge (2005)
Koenker, R., Bassett, G.: Regression quantiles. Econometrica 46, 33–50 (1978)
Li, C., Wei, Y., Chappell, R., He, X.: Bent line quantile regression with application to an allometric study of land mammals' speed and mass. Biometrics 67(1), 242–249 (2011)
Li, Y., Zhu, J.: \(L_{1}\)-norm quantile regression. J. Comput. Graph. Stat. 17, 163–185 (2008)
Mazumder, R., Friedman, J.H., Hastie, T.: SparseNet: coordinate descent with nonconvex penalties. J. Am. Stat. Assoc. 106, 1125–1138 (2011)
Oh, S.-K., Lee, T.C.M., Nychka, D.W.: Fast nonparametric quantile regression with arbitrary smoothing methods. J. Comput. Graph. Stat. 20, 510–526 (2011)
Peng, B., Wang, L.: An iterative coordinate descent algorithm for high-dimensional nonconvex penalized quantile regression. J. Comput. Graph. Stat. 24, 676–694 (2015)
Scheetz, T., Kim, K., Swiderski, R., Philp, A., Braun, T., Knudtson, K., Dorrance, A., DiBona, G., Huang, J., Casavant, T., Sheffield, V., Stone, E.: Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proc. Natl. Acad. Sci. USA 103(39), 14429–14434 (2006)
Simon, N., Friedman, J., Hastie, T., Tibshirani, R.: A sparse group lasso. J. Comput. Graph. Stat. 22, 231–245 (2013)
Slawski, M.: The structured elastic net for quantile regression and support vector classification. Stat. Comput. 22(1), 153–168 (2012)
Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. B 58, 267–288 (1996)
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. B 67, 91–108 (2005)
Wu, Y., Liu, Y.: Variable selection in quantile regression. Stat. Sin. 19, 801–817 (2009)
Yang, Y., Zou, H.: An efficient algorithm for computing the HHSVM and its generalizations. J. Comput. Graph. Stat. 22(2), 396–415 (2013)
Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
Zhao, G.H., Teo, K.L., Chan, K.S.: Estimation of conditional quantiles by a new smoothing approximation of asymmetric loss function. Stat. Comput. 15, 5–11 (2005)
Zheng, S.: Gradient descent algorithms for quantile regression with smooth approximation. Int. J. Mach. Learn. Cybern. 2, 191–207 (2011)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67, 301–320 (2005)
Acknowledgments
We warmly thank the reviewers for their careful reading of the previous version of our paper and their helpful comments. This work is partially supported by Centre National pour la Recherche Scientifique et Technique (Morocco) project URAC01 to Abdallah Mkhadri and by the Natural Sciences and Engineering Research Council of Canada and Fonds de recherche du Québec-Santé grant FRQS-31110 to Karim Oualkacha.
Appendices
Appendix 1
Proof of Proposition 1
The proof of point (a) is omitted since it is based on simple algebra. We detail the proof of point (b), on convexity, only for the function \(\rho \) defined in equation (5) (the proof for the function in (4) is similar and is omitted). It suffices to show that for any u and \(v\in ]-\infty ,-(1-\tau )k[\,\cup \,[-(1-\tau )k, \tau k]\,\cup \,]\tau k, +\infty [=I_1\cup I_2\cup I_3 \), we have
$$\begin{aligned} \rho (v)-\rho (u) -\rho '(u)(v-u)\ge 0. \end{aligned}$$
This inequality is clearly satisfied if u and v are both taken in the same interval \(I_j\) (\(j=1, 2, 3\)), since \(\rho \) is either quadratic or affine, hence convex, on each piece. So, we establish the proposition for any \((u,v)\in I_j\times I_{j'}\) with \(j\ne j'\), which leaves six cases to distinguish.
1. If \(v\in I_1\) and \(u\in I_2\), then
$$\begin{aligned}&\rho (v)-\rho (u) -\rho '(u)(v-u)\\&\quad = -(1-\tau )v-\frac{k(1-\tau )^2}{2}-\frac{1}{2k}u^2-\frac{u}{k}(v-u)\\&\quad = (1-\tau )\left[ -v-\frac{k(1-\tau )}{2}\right] - \frac{v^2}{2k}+\left( \frac{u}{\sqrt{2k}}-\frac{v}{\sqrt{2k}}\right) ^2\\&\quad = \frac{1}{2k}[(u-v)^2-(v+k(1-\tau ))^2]\\&\quad = \frac{1}{2k}[(u-2v-k(1-\tau ))(u-v+v+k(1-\tau ))] \\&\quad = \frac{1}{2k}[(u-2v-k(1-\tau ))(u+k(1-\tau ))] \\&\quad \ge 0. \end{aligned}$$
The last inequality is obtained from the fact that \(u\in I_2=[-(1-\tau )k,\tau k]\), so \((u+k(1-\tau ))\ge 0\), which implies that \(u-2v-k(1-\tau ) \ge -k(1-\tau ) -2v-k(1-\tau )=-2[v+k(1-\tau )] > 0,\) since \(v<-(1-\tau )k\).
2. If \(v\in I_3\) and \(u\in I_2\), then
$$\begin{aligned}&\rho (v)-\rho (u) -\rho '(u)(v-u)\\&\quad = \tau v-\frac{k\tau ^2}{2}-\frac{1}{2k}u^2-\frac{u}{k}(v-u)\\&\quad = \tau v-\frac{k\tau ^2}{2}-\frac{1}{2k}u^2-\frac{uv}{k}+\frac{u^2}{k} + \frac{v^2}{2k}-\frac{v^2}{2k}\\&\quad = \frac{1}{2k}[(u-v)^2-(v-k\tau )^2]\\&\quad = \frac{1}{2k}[(u-2v+k\tau )(u-k\tau )] \\&\quad \ge 0. \end{aligned}$$
The last inequality comes from the fact that \(u\in I_2\), so \(u-k\tau \le 0\), which implies that \(u-2v+k\tau \le k\tau -2v+k\tau =2(k\tau -v) \le 0 \), since \(v\in I_3\); both factors being nonpositive, their product is nonnegative.
3. If \(u\in I_3\) and \(v\in I_1\), then
$$\begin{aligned}&\rho (v)-\rho (u) -\rho '(u)(v-u)\\&\quad = -(1-\tau )v-\frac{k(1-\tau )^2}{2} -\tau u+\frac{k\tau ^2}{2}-\tau (v-u)\\&\quad = -(1-\tau )v-\frac{k(1-\tau )^2}{2}+\frac{k\tau ^2}{2}-\tau v\\&\quad = -v -\frac{k(1-2\tau )}{2}\\&\quad \ge \frac{2k(1-\tau )-k(1-2\tau )}{2} \quad \text{ since } \quad -v\ge (1-\tau )k\\&\quad = \frac{k}{2} \ge 0 . \end{aligned}$$
4. If \(v\in I_3\) and \(u\in I_1\), then, as in case 3, we have
$$\begin{aligned}&\rho (v)-\rho (u) -\rho '(u)(v-u)\\&\quad = \tau v-\frac{k\tau ^2}{2}+(1-\tau )u + \frac{k(1-\tau )^2}{2}+(1-\tau )(v-u)\\&\quad = \frac{k(1-2\tau )}{2}+v \\&\quad \ge \frac{k(1-2\tau )+2\tau k}{2} \quad \text{ since } \quad v>\tau k\\&\quad =\frac{k}{2} \ge 0 . \end{aligned}$$
5. If \(v\in I_2\) and \(u\in I_1\), then we have
$$\begin{aligned}&\rho (v)-\rho (u) -\rho '(u)(v-u)\\&\quad = \frac{1}{2k}v^2+(1-\tau )u +\frac{k(1-\tau )^2}{2}+(1-\tau )(v-u)\\&\quad =\frac{1}{2k}v^2+\frac{k(1-\tau )^2}{2}+(1-\tau )v\\&\quad =\left[ \frac{v}{\sqrt{2k}}+\sqrt{k/2}\,(1-\tau )\right] ^2 \ge 0. \end{aligned}$$
6. If \(v\in I_2\) and \(u\in I_3\), then, as in case 5, we have
$$\begin{aligned}&\rho (v)-\rho (u) -\rho '(u)(v-u)\\&\quad = \frac{1}{2k}v^2-\tau u+\frac{k\tau ^2}{2}-\tau (v-u)\\&\quad = \left[ \frac{v}{\sqrt{2k}}-\sqrt{k/2}\tau \right] ^2 \ge 0. \end{aligned}$$
This completes the proof of Proposition 1. \(\square \)
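As an informal numerical companion to the case analysis above, the following sketch (with illustrative values of \(\tau \) and k, and assuming the piecewise form of \(\rho =\rho _{\tau ,k}\) used throughout the proof) checks the first-order convexity inequality on randomly drawn pairs:

```python
import numpy as np

rng = np.random.default_rng(0)
tau, k = 0.3, 0.5  # illustrative values

def rho(u):
    # Piecewise smoothed check function rho_{tau,k} from the proof above.
    if u < -(1 - tau) * k:
        return -(1 - tau) * u - k * (1 - tau) ** 2 / 2
    if u <= tau * k:
        return u ** 2 / (2 * k)
    return tau * u - k * tau ** 2 / 2

def rho_prime(u):
    # Its derivative, continuous at both knots -(1-tau)k and tau*k.
    if u < -(1 - tau) * k:
        return -(1 - tau)
    if u <= tau * k:
        return u / k
    return tau

# Convexity (Proposition 1(b)): rho(v) - rho(u) - rho'(u)(v - u) >= 0.
pairs = rng.uniform(-5.0, 5.0, size=(10_000, 2))
gaps = [rho(v) - rho(u) - rho_prime(u) * (v - u) for u, v in pairs]
print(min(gaps) >= -1e-12)  # expected: True
```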
Appendix 2
Proof of Proposition 2
We will show that the two functions \( \rho _{\tau ,c}(u)\) and \( \rho _{\tau ,k}(u)\) are differentiable and have a Lipschitz continuous first derivative. To do so, one can calculate the derivatives of the functions \( \rho _{\tau ,c}(u)\) and \( \rho _{\tau ,k}(u)\) as
and
After some algebra, these derivatives satisfy
and
Therefore, one can verify that
and
which ends the proof. \(\square \)
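For concreteness, the derivative of \(\rho _{\tau ,k}\) and the resulting Lipschitz bound can be read off the piecewise form used in the proof of Proposition 1; we record them here as a reconstruction (the corresponding expressions for \(\rho _{\tau ,c}\) follow analogously from its definition in equation (4) and are not reproduced):

$$\begin{aligned} \rho '_{\tau ,k}(u)= \begin{cases} -(1-\tau ), &{} u<-(1-\tau )k,\\ u/k, &{} -(1-\tau )k\le u\le \tau k,\\ \tau , &{} u>\tau k, \end{cases} \qquad |\rho '_{\tau ,k}(u)-\rho '_{\tau ,k}(v)|\le \frac{1}{k}|u-v|. \end{aligned}$$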
Appendix 3
Proof of Theorem 3.1
The function \(\varvec{G}\) is convex since it is the sum of the convex functions \(\rho _{\tau ,*}\) (see Proposition 1) and the convex elastic-net penalty \(P_{\lambda _1,\lambda _2}\). This completes the proof of point 1.).
The proof of point 2.) of Theorem 3.1 relies on the following result, which is proved in Lemma 1: for all m we have
Since the sequence \(\varvec{G}(\bar{{\varvec{\beta }}}^{m})\) is decreasing and bounded below [e.g. \(0 \le \varvec{G}(\bar{{\varvec{\beta }}}^{m})\) for all m], it converges. Thus, from (19) one can verify that the sequence generated by Algorithm 1 cannot cycle without converging; i.e., it must have a unique limit point. This completes the proof of point 2.). \(\square \)
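To make the descent inequality (19) concrete, here is a minimal sketch of one coordinate update of the kind Algorithm 1 performs, under our assumptions that the smooth loss is the Huber-type \(\rho _{\tau ,k}\) (whose derivative is 1/k-Lipschitz, by Proposition 2) and that the penalty is the elastic net \(P_{\lambda _1,\lambda _2}\); the majorize-then-soft-threshold form follows the gcdnet recipe of Yang and Zou (2013), and all function names are illustrative:

```python
import numpy as np

def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def mm_coordinate_update(j, beta, r, X, tau, k, lam1, lam2):
    """One MM update of beta[j] for the smoothed quantile loss + elastic net.

    r is the current residual vector y - beta0 - X @ beta. Because the
    derivative of rho_{tau,k} is 1/k-Lipschitz, the loss is majorized by a
    quadratic in beta[j]; minimizing the majorizer plus the penalty gives a
    soft-thresholding step.
    """
    n = X.shape[0]
    # Derivative of rho_{tau,k} evaluated at the current residuals.
    d = np.where(r < -(1 - tau) * k, -(1 - tau), np.where(r <= tau * k, r / k, tau))
    grad = -(X[:, j] @ d) / n               # coordinate gradient of the loss
    M = (X[:, j] ** 2).sum() / (n * k)      # curvature of the quadratic majorizer
    beta_j_new = soft_threshold(M * beta[j] - grad, lam1) / (M + lam2)
    r_new = r - X[:, j] * (beta_j_new - beta[j])  # keep residuals in sync
    return beta_j_new, r_new
```

Because the quadratic majorizer dominates the loss and touches it at the current iterate, each such update can only decrease the penalized objective \(\varvec{G}\); inequality (19) quantifies this decrease.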
Now we prove point 3.). In the sequel, the parameter \(\tau \) is omitted in the parameter vector \({\varvec{\beta }}\) and in \(\beta _0\) to simplify the presentation of the proof. For any \(\bar{{\varvec{\beta }}} = (\beta _0, \beta _1,\ldots ,\beta _p)^\top \) and \(\varvec{\gamma }_j = (0, \ldots , 0,\gamma _j, 0, \ldots , 0)^\top \in I\!\!R^{p+1}\), we have
for \(j\in \{1,\ldots ,p\}\), with
Assume a subsequence \(\bar{{\varvec{\beta }}}^{n_k} \rightarrow \bar{{\varvec{\beta }}}^\infty = (\beta _0^{\infty },\ldots ,\beta _p^{\infty })^\top \). By equation (19), the successive differences converge to zero (i.e. \(\bar{{\varvec{\beta }}}^{n_k} - \bar{{\varvec{\beta }}}^{n_k-1} \rightarrow 0\)). Thus, as \(k \rightarrow \infty \), we have
By (21) and (22), we have for \(j\in \{1,\ldots ,p\}\)
By the coordinate-wise minimality in the j-th coordinate, \(j\in \{1,\ldots ,p\}\), one has
with \({\varvec{\beta }}_j^{n_k-1} = (\beta _1^{n_k}, \ldots , \beta _{j-1}^{n_k},\beta _j^{n_k},\beta _{j+1}^{n_k-1},\ldots , \beta _p^{n_k-1})^\top \). From (23) and (24) one can write, for all \(j\in \{1,\ldots ,p\}\)
By (20) and (25), for \(j\in \{1,\ldots ,p\}\), we have
For \(j=0\), following the same arguments as above, one can easily verify that
Thus for \({\varvec{\gamma }} = (\gamma _0, \ldots ,\gamma _p)^\top \in I\!\!R^{p+1}\), we have
where the last inequality (\(\ge \)0) is obtained using equations (26) and (27). This limit point is a global minimum since the function \(\varvec{G}\) is convex, which completes the proof of point 3.). \(\square \)
Lemma 1
Let \(G(.|\beta _0, {\varvec{\beta }})\) and \(F(.|\beta _0, {\varvec{\beta }})\) be defined as in Theorem 1 and let \(\varvec{G}(\bar{{\varvec{\beta }}})\) be the global objective function defined by (6). Let \(\bar{{\varvec{\beta }}}^m_\tau = \{ \beta _0^m(\tau ), {\varvec{\beta }}^m_\tau \}^\top \) be the sequence of iterates generated by the iteration map of Algorithm 1, our MM coordinate descent algorithm.
Then the sequence \(\bar{{\varvec{\beta }}}^m\) satisfies equation (19) for all m, i.e.
Proof of Lemma 1
The objective function to minimize for each coordinate \(j = 1,\ldots ,p\) can be written as
where \(r_i=y_i-\beta _0-\mathbf {x}_i^T{\varvec{\beta }}\) and the parameter \(\tau \) is omitted in the parameter vector \({\varvec{\beta }}\) to simplify the presentation of the proof. Let \(u_0\) be the minimizer of \(\varvec{G}(.|{\varvec{\beta }})\) with respect to the specified coordinate \(j=1, \ldots , p\). Following the same arguments as in Mazumder et al. (2011) and Jiang and Huang (2012), we need to show that
where \(\theta >0\) and \(\nu \) is a small real value. Note first that \(\varvec{G}(.|{\varvec{\beta }})\) is convex, as a sum of convex functions. For \(u\ne 0\), the gradient of \(\varvec{G}(.|{\varvec{\beta }})\) exists and is equal to
If \(u=0\), then the subgradient d can be written as
with \(z\in (-1,1)\).
In our MM coordinate descent algorithm, the function to be minimized is
Then any minimizer u of \(F(.|{\varvec{\beta }})\) will satisfy the equation
where \({\varvec{\beta }}_{(-j)}^T\mathbf {x}_{i(-j)}=\sum _{\ell \ne j}\beta _{\ell }x_{i\ell }\). Since \(\varvec{G}(.|{\varvec{\beta }})\) is minimized at \(u_0\), by (32), we have
If \(u_0=0\), the latter equation is true for some value of \(\text{ sgn }(u_0)\in (-1,1)\). Now, let \(d_0\) be the sub-gradient of \(\varvec{G}(.|{\varvec{\beta }})\) at \(u_0\), defined by
where \(\text{ sgn }(u_0)\in (-1,1)\). From (34) and (35), one can write
where \(r_i^{(0)}=y_i-\beta _0-{\varvec{\beta }}_{(-j)}^T\mathbf {x}_{i(-j)} -x_{ij}u_0\). From equation (31), we can write
From Proposition 2, we have
Without loss of generality (reordering the observations if necessary), we assume that for fixed j we have \(x_{ij}\ge 0\) for \(i=1,\ldots , n_0\) and \(x_{ij}< 0\) for \(i=n_0+1,\ldots , n\). Then, we get
and
Thus, from (36) we obtain that
Thus, equation (30) holds for every \(\beta _1,\ldots ,\beta _p\).
Now consider \(\beta _0\) and observe that
where \(r_i^*=y_i-\mathbf {x}_i^T{\varvec{\beta }}\). Then, using arguments similar to those in (34) and (35), we have
and the gradient of \(\varvec{G}(.|{\varvec{\beta }})\) at \(u_0\) is given by
As in equation (36), we can write
Now, using equation (37) and Proposition 2 for the first and second terms of the latter inequality, respectively, we obtain
Finally, equation (30) holds with \(\theta =1/\delta \) for \(\beta _0,\beta _1,\ldots ,\beta _p\). Then, we have
Consequently, applying (40) over all coordinates, we have for all m
This completes the proof of Lemma 1. \(\square \)
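As an informal numerical check of this descent property, the following self-contained sketch (inlining the same majorize-then-soft-threshold update sketched after point 2.) of Theorem 3.1, with illustrative parameter values and the intercept omitted) runs repeated coordinate sweeps on synthetic data and verifies that the penalized objective never increases:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, tau, k, lam1, lam2 = 100, 10, 0.5, 0.25, 0.1, 0.01
X = rng.standard_normal((n, p))
y = X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(n)

def rho(u):
    # Smoothed check function rho_{tau,k} (vectorized), as in Appendix 1.
    return np.where(
        u < -(1 - tau) * k, -(1 - tau) * u - k * (1 - tau) ** 2 / 2,
        np.where(u <= tau * k, u ** 2 / (2 * k), tau * u - k * tau ** 2 / 2),
    )

def objective(beta, r):
    # Penalized objective: smoothed quantile loss plus elastic-net penalty.
    return rho(r).mean() + lam1 * np.abs(beta).sum() + 0.5 * lam2 * (beta ** 2).sum()

beta, r = np.zeros(p), y.copy()  # intercept omitted for brevity
values = [objective(beta, r)]
for sweep in range(20):
    for j in range(p):
        # Derivative of rho_{tau,k} at the current residuals.
        d = np.where(r < -(1 - tau) * k, -(1 - tau),
                     np.where(r <= tau * k, r / k, tau))
        grad = -(X[:, j] @ d) / n             # coordinate gradient of the loss
        M = (X[:, j] ** 2).sum() / (n * k)    # majorization (curvature) constant
        z = M * beta[j] - grad
        new_bj = np.sign(z) * max(abs(z) - lam1, 0.0) / (M + lam2)  # soft threshold
        r -= X[:, j] * (new_bj - beta[j])     # keep residuals consistent
        beta[j] = new_bj
    values.append(objective(beta, r))

print(np.all(np.diff(values) <= 1e-12))  # expected: True (monotone descent)
```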