Skip to main content

An effective method to reduce the computational complexity of composite quantile regression

  • Original Paper
  • Published:
Computational Statistics Aims and scope Submit manuscript

Abstract

In this article, we aim to reduce the computational complexity of the recently proposed composite quantile regression (CQR). We propose a new regression method called infinitely composite quantile regression (ICQR) to avoid the determination of the number of uniform quantile positions. Unlike the composite quantile regression, our proposed ICQR method allows combining continuous and infinite quantile positions. We show that the proposed ICQR criterion can be readily transformed into a linear programming problem. Furthermore, the computing time of the ICQR estimate is far less than that of the CQR, though it is slightly larger than that of the quantile regression. The oracle properties of the penalized ICQR are also provided. The simulations are conducted to compare different estimators. A real data analysis is used to illustrate the performance.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3

Similar content being viewed by others

References

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360

    Article  MATH  MathSciNet  Google Scholar 

  • Hunter DR, Lange K (2000) Quantile regression via an MM algorithm. J Comput Graph Stat 9(1):60–77

    MathSciNet  Google Scholar 

  • Jiang R, Qian W (2013) Composite quantile regression for nonparametric model with random censored data. Open J Stat 3:65–73

    Article  Google Scholar 

  • Jiang X, Jiang J, Song X (2012) Oracle model selection for nonloinear models based on weight composite quantile regression. Stat Sin 22:1479–1506

    MATH  Google Scholar 

  • Kai B, Li R (2010) Local composite quantile regression smoothing: an efficient and safe alternative to local polynomial regression. J R Stat Soc 72:49–69

    Article  MathSciNet  Google Scholar 

  • Koenker R (1984) A note on L-estimates for linear models. Stat Probab Lett 2(6):323–325

    Article  MATH  MathSciNet  Google Scholar 

  • Koenker R (2005) Quantile regression, vol 38. Cambridge University Press, Cambridge

    Book  MATH  Google Scholar 

  • Koenker R, Bassett Jr G (1978) Regression quantiles. Econometrica 46(1):33–50

  • Kozumi H, Kobayashi G (2011) Gibbs sampling methods for Bayesian quantile regression. J Stat Comput Simul 81(11):1565–1578

    Article  MATH  MathSciNet  Google Scholar 

  • Koenker R, Ng P, Portnoy S (1994) Quantile smoothing splines. Biometrika 81(4):673–680

    Article  MATH  MathSciNet  Google Scholar 

  • Reich BJ, Smith LB (2013) Bayesian quantile regression for censored data. Biometrics 69(3):651–660

    Article  MATH  MathSciNet  Google Scholar 

  • Tian M (2006) A quantile regression analysis of family background factor effects on mathematical achievement. J Data Sci 4(4):461–478

    Google Scholar 

  • Tian M, Chen G (2006) Hierarchical linear regression models for conditional quantiles. Sci China Ser A Math 49(12):1800–1815

    Article  MATH  MathSciNet  Google Scholar 

  • Tian Y, Tian M, Zhu Q (2014) Linear quantile regression based on EM algorithm. Commun Stat Theory Methods 43(16):3464–3484

    Article  MATH  MathSciNet  Google Scholar 

  • Tibshirani R (1996) Regression shrinkage and selection via the LASSO. J R Stat Soc B (Methodol) 58(1):267–288

  • Tsionas EG (2003) Bayesian quantile inference. J Stat Comput Simul 73(9):659–674

    Article  MATH  MathSciNet  Google Scholar 

  • Yu K, Moyeed RA (2001) Bayesian quantile regression. Stat Probab Lett 54(4):437–447

    Article  MATH  MathSciNet  Google Scholar 

  • Zou H (2006) The adaptive LASSO and its oracle properties. J Am Stat Assoc 101(476):1418–1429

    Article  MATH  MathSciNet  Google Scholar 

  • Zou H, Yuan M (2008) Composite quantile regression and the oracle model selection theory. Ann Stat 36(3):1108–1126

Download references

Acknowledgements

The work was partially supported by the major research Projects of philosophy and social science of the Chinese Ministry of Education (No. 15JZD015), National Natural Science Foundation of China (No. 11271368), Project supported by the Major Program of Beijing Philosophy and Social Science Foundation of China (No. 15ZDA17), Project of Ministry of Education supported by the Specialized Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20130004110007), the Key Program of National Philosophy and Social Science Foundation Grant (No. 13AZD064), the Major Project of Humanities Social Science Foundation of Ministry of Education (No. 15JJD910001), the Fundamental Research Funds for the Central Universities, and the Research Funds of Renmin University of China (No. 15XNL008)

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Maozai Tian.

Appendices

Appendix A: Comparison of computational complexity between ICQR and CQR

For the ICQR estimation, the optimization problem (7) can be casted into the linear programming. First, we introduce the slack variables as follows:

$$\begin{aligned} \begin{aligned} u_i^+=&\max \left( y_i-\varvec{x}_i^T\varvec{\beta }-\widetilde{b}_i,0\right) ,~u_i^-=\max \left( -\left( y_i-\varvec{x}_i^T\varvec{\beta }-\widetilde{b}_i\right) ,0\right) ,~i=1,\ldots ,n,\\ \widetilde{b}_i^+=&\max \left( \widetilde{b}_i,0\right) ,~\widetilde{b}_i^-=\max \left( -\widetilde{b}_i,0\right) ,~i=1,\ldots ,n,\\ \beta _j^+=&\max \left( \beta _j,0\right) ,~\beta _j^-=\max \left( -\beta _j,0\right) ,~j=1,\ldots ,p. \end{aligned} \end{aligned}$$

Denote \(\varvec{\beta }^+=(\beta _1^+,\ldots ,\beta _p^+)^T\) and \(\varvec{\beta }^-=(\beta _1^-,\ldots ,\beta _p^-)^T\). With the aid of slack variables, solving the optimization problem (7) is equivalent to solving the following linear programming problem:

$$\begin{aligned} \begin{aligned} \min _{\varvec{z}\in \mathbb {R}^{2\left( p+2n\right) }}&\sum _{i=1}^n\widehat{w}_i\left( u_i^++u_i^-\right) \\ \text {s.t.}&y_i-\varvec{x}^T\left( \varvec{\beta }^+-\varvec{\beta }^-\right) -\left( \widetilde{b}_i^+-\widetilde{b}_i^-\right) =u_i^+-u_i^-,i=1,\ldots ,n,\\&\beta _j^+\ge 0, \beta _j^-\ge 0, j=1,\ldots ,p,\\&\widetilde{b}_i^+\ge 0, \widetilde{b}_i^-\ge 0, i=1,\ldots ,n,\\&u_i^+\ge 0, u_i^-\ge 0, i=1,\ldots ,n, \end{aligned} \end{aligned}$$
(10)

where \(\varvec{z}=\left( \left( \varvec{\beta }^+\right) ^T,\left( \varvec{\beta }^-\right) ^T,\left( \widetilde{\varvec{b}}^+\right) ^T,\left( \widetilde{\varvec{b}}^-\right) ^T,\left( \varvec{u}^+\right) ^T,\left( \varvec{u}^-\right) ^T\right) ^T\), \(\varvec{u}^+=\left( u_1^+,\ldots ,u_n^+\right) ^T\), \(\varvec{u}^-=\left( u_1^-,\ldots ,u_n^-\right) ^T\), \(\widetilde{\varvec{b}}^+=\left( \widetilde{b}_1^+,\ldots ,\widetilde{b}_n^+\right) ^T\), and \(\widetilde{\varvec{b}}^-=\left( \widetilde{b}_1^-,\ldots ,\widetilde{b}_n^-\right) ^T\).

Similarly, for the CQR estimation, minimizing the criterion (3) can also be transformed into the linear programming. Let

$$\begin{aligned} \begin{aligned} u_{ik}^{*+}=&\max \left( y_i-\varvec{x}_i^T\varvec{\beta }-b_{\tau _k},0\right) ,~u_{ik}^{*-}=\max \left( -\left( y_i-\varvec{x}_i^T\varvec{\beta }-b_{\tau _k}\right) ,0\right) ,\\&i=1,\ldots ,n,k=1,\ldots ,K,\\ b_k^{*+}=&\max \left( b_{\tau _k},0\right) ,~b_k^{*-}=\max \left( -b_{\tau _k},0\right) ,~k=1,\ldots ,K. \end{aligned} \end{aligned}$$

Then minimizing (3) is equivalent to solving the following linear programming problem:

$$\begin{aligned} \begin{aligned} \min _{\varvec{z}^*\in \mathbb {R}^{2\left( p+K+nK\right) }}&\sum _{k=1}^K\sum _{i=1}^n\tau _ku_{ik}^{*+}+\left( 1-\tau _k\right) u_{ik}^{*-}\\ \text {s.t.}&y_i-\varvec{x}^T\left( \varvec{\beta }^+-\varvec{\beta }^-\right) -\left( b_k^{*+}-b_k^{*-}\right) =u_{ik}^{*+}-u_{ik}^{*-},\\&\quad i=1,\ldots ,n,k=1,\ldots ,K,\\&\beta _j^+\ge 0, \beta _j^-\ge 0, j=1,\ldots ,p,\\&b_k^{*+}\ge 0, b_k^{*-}\ge 0, k=1,\ldots ,K,\\&u_{ik}^{*+}\ge 0, u_{ik}^{*-}\ge 0, i=1,\ldots ,n,k=1,\ldots ,K, \end{aligned} \end{aligned}$$
(11)

where \(\varvec{z}^*=\left( \left( \varvec{\beta }^+\right) ^T,\left( \varvec{\beta }^-\right) ^T,\left( \varvec{b}^{*+}\right) ^T,\left( \varvec{b}^{*-}\right) ^T,\left( \varvec{u}^{*+}\right) ^T,\left( \varvec{u}^{*-}\right) ^T\right) ^T,\, \varvec{u}^{*+}=\left( u_{11}^{*+},\ldots ,u_{n1}^{*+},\ldots , u_{1K}^{*+},\ldots ,u_{nK}^{*+}\right) ^T\), \(\varvec{u}^{*-}=\left( u_{11}^{*-},\ldots ,u_{n1}^{*-},\ldots ,u_{1K}^{*-},\ldots ,u_{nK}^{*-}\right) ^T\), \(\varvec{b}^{*+}=\left( b_1^{*+},\ldots ,b_K^{*+}\right) ^T\), and \(\varvec{b}^{*-}=\left( b_1^{*-},\ldots ,b_K^{*-}\right) ^T\).

Obviously, if \(K>1\) then the scale of the linear programming (10) is smaller than (11). Even for a moderately big K, the ICQR has computational advantage comparing to the CQR when using either simplex or interior point algorithm.

Appendix B: Proof of Theorem 1

For the ICQR estimator, we consider the following criterion:

$$\begin{aligned} \sum _{i=1}^nw_i\left| y_i-\varvec{x}_i^T\varvec{\beta }-\widetilde{b}_i\right| . \end{aligned}$$
(12)

Write \(\widehat{\varvec{\beta }}_n=\widehat{\varvec{\beta }}^{\text {ICQR}}\) and \(\widehat{\varvec{b}}_n=\widehat{\widetilde{\varvec{b}}}\). Let \(\varvec{\delta }_n=\sqrt{n}(\widehat{\varvec{\beta }}_n-\varvec{\beta })\) and \(\varvec{v}_n=\sqrt{n}(\widehat{\varvec{b}}_n-\widetilde{\varvec{b}})=(v_{n,1},\ldots ,v_{n,n})^T\). Denote \(\varvec{\theta }_n=(\varvec{\delta }_n^T,\varvec{v}_n^T)^T\). Let \(\varvec{\theta }\) be a vector. Minimizing expression (12) is equivalent to minimizing

$$\begin{aligned} Z_n(\varvec{\theta })=\sum _{i=1}^nw_i\left( \left| \widetilde{\varepsilon }_i-(\varvec{x}_i^T\varvec{\delta }+v_i)/\sqrt{n}\right| -|\widetilde{\varepsilon }_i|\right) . \end{aligned}$$

Obviously, the function \(Z_n(\varvec{\theta })\) is convex and \(\varvec{\theta }_n\) is its minimizer. Noting that

$$\begin{aligned} |x-y|-|x|=-y\text {sgn}(x)+2\int _0^y[I(x\le z)-I(x\le 0)]dz, \end{aligned}$$

where \(\text {sgn}(\cdot )\) denotes the sign function, we have

$$\begin{aligned}&E_{\varepsilon }Z_n(\varvec{\theta })\nonumber \\&\quad =\sum _{i=1}^n\left\{ - \frac{\varvec{x}_i^T\varvec{\delta }+v_i}{\sqrt{n}}E_{\varepsilon } \left[ w_i\text {sgn}(\widetilde{\varepsilon }_i)\right] + 2\int _0^{\frac{\varvec{x}_i^T\varvec{\delta }+v_i}{\sqrt{n}}}E_{\varepsilon } \left\{ w_i\left[ I\left( \widetilde{\varepsilon }_i\le z\right) -I\left( \widetilde{\varepsilon }_i\le 0\right) \right] \right\} dz\right\} . \end{aligned}$$

Under condition A2, we know that \(\widetilde{\varepsilon }_i\)’s are also independent and identically distributed. Moreover, the distribution of \(\widetilde{\varepsilon }_i\), denoted by \(\widetilde{f}(\cdot )\), also has an uniformly bounded and strictly positive density. Therefore, the convex function \(M(t)=E_{\varepsilon }[w_i(|\widetilde{\varepsilon }_i-t|-|\widetilde{\varepsilon }_i|)]\) has a unique minimizer at zero. We have \(E_{\varepsilon }[w_i\text {sgn}(\widetilde{\varepsilon }_i)]=0\). On the other hand,

$$\begin{aligned} E_{\varepsilon }\left\{ w_i\left[ I(\widetilde{\varepsilon }_i\le z)-I(\widetilde{\varepsilon }_i\le 0)\right] \right\} =\int _{-\infty }^zw_i\widetilde{f}(t)dt-\int _{-\infty }^0w_i\widetilde{f}(t)dt=w_0\widetilde{f}(0)z+o(z). \end{aligned}$$

Under condition A1(ii), we know \(\lim _{n\rightarrow \infty }\varvec{x}_i/n=\varvec{0},i=1,\ldots ,n\). And because of the condition A1(i), it is clear that \(E_{\varepsilon }Z_n(\varvec{\theta })=w_0\widetilde{f}(0)(\varvec{\delta }^T,v_i)D_1(\varvec{\delta }^T,v_i)^T+o(1)\) with \(D_1=\left( \begin{array}{cc}D&{}\varvec{0}\\ \varvec{0}&{}1\end{array}\right) \). Let

$$\begin{aligned} R_{in}(\varvec{\theta })=w_i\left( \left| \widetilde{\varepsilon }_i-(\varvec{x}_i^T\varvec{\delta }+v_i)/\sqrt{n}\right| -|\widetilde{\varepsilon }_i|\right) -w_i\text {sgn}\left( \widetilde{\varepsilon }_i\right) \left( \varvec{x}_i^T\varvec{\delta }+v_i\right) /\sqrt{n}, \end{aligned}$$

and

$$\begin{aligned} \varvec{W}_n=\frac{1}{\sqrt{n}}\sum _{i=1}^nw_i\text {sgn} \left( \widetilde{\varepsilon }_i\right) \left( \varvec{x}_i^T,1\right) ^T. \end{aligned}$$

Noting that \(E_{\varepsilon }[w_i\text {sgn}(\widetilde{\varepsilon }_i)]=0\), we have

$$\begin{aligned} Z_n(\varvec{\theta })= & {} w_0\widetilde{f}(0) \left( \varvec{\delta }^T,v_i\right) D_1\left( \varvec{\delta }^T,v_i\right) ^T+o(1)+ \varvec{W}_n^T\left( \varvec{\delta }^T,v_i\right) ^T\\&+\sum _{i=1}^n \left[ R_{in}(\varvec{\theta })-E_{\varepsilon }R_{in}(\varvec{\theta })\right] . \end{aligned}$$

Now we show that \(\sum _{i=1}^n\left[ R_{in}(\varvec{\theta })-E_{\varepsilon }R_{in}(\varvec{\theta })\right] \) = \(o_p(1)\). It is sufficient to show that \(\text {var}_{\varepsilon }\left\{ \sum _{i=1}^n\left[ R_{in}(\varvec{\theta })-E_{\varepsilon }R_{in}(\varvec{\theta })\right] \right\} =o_p(1)\). In fact, based on the inequality

$$\begin{aligned} |R_{in}(\varvec{\theta })|\le 2w_i\left| \left( \varvec{x}_i^T\varvec{\delta }+v_i\right) / \sqrt{n}\right| I\left( |\widetilde{\varepsilon }_i|\le | \left( \varvec{x}_i^T\varvec{\delta }+v_i\right) /\sqrt{n}|\right) \end{aligned}$$

we get

$$\begin{aligned} \begin{aligned}&\text {var}_{\varepsilon }\left\{ \sum _{i=1}^n\left[ R_{in}(\varvec{\theta })-E_{\varepsilon }R_{in}(\varvec{\theta })\right] \right\} \\&\quad =E_{\varepsilon }\left\{ \sum _{i=1}^n\left[ R_{in}(\varvec{\theta })-E_{\varepsilon }R_{in}(\varvec{\theta })\right] \right\} ^2 \le \sum _{i=1}^nE_{\varepsilon }[R_{in}^2(\varvec{\theta })]\\&\quad \le 4\sum _{i=1}^n\left| \left( \varvec{x}_i^T\varvec{\delta }+v_i\right) /\sqrt{n}\right| ^2E_{\varepsilon }\left[ w_i^2I\left( |\widetilde{\varepsilon }_i|\le |(\varvec{x}_i^T\varvec{\delta }+v_i)/\sqrt{n}|\right) \right] \\&\quad \le 4\left( \Vert \varvec{\delta }\Vert ^2+v_i^2\right) E_{\varepsilon }\left[ w_i^2I\left( |\widetilde{\varepsilon }_i|\le \sqrt{\Vert \varvec{\delta }\Vert ^2+v_i^2}\max _{i=1,\ldots ,n}\left( \sqrt{(\Vert \varvec{x}_i\Vert ^2+1)/n}\right) \right) \right] \\&\quad \quad \times \sum _{i=1}^n\left( \Vert \varvec{x}_i\Vert ^2+1\right) /n\\&\quad \le 4\left( \Vert \varvec{\delta }\Vert ^2+v_i^2\right) E_{\varepsilon }\left[ I\left( |\widetilde{\varepsilon }_i|\le \sqrt{\Vert \varvec{\delta }\Vert ^2+v_i^2}\max _{i=1,\ldots ,n}\left( \sqrt{(\Vert \varvec{x}_i\Vert /\sqrt{n})^2+1/n}\right) \right) \right] \\&\quad \quad \times \left( 1+\sum _{i=1}^n\Vert \varvec{x}_i\Vert ^2/n\right) . \end{aligned} \end{aligned}$$

The last inequality holds since \(w_i^2\le 1\). Under Assumption A1(ii), we have

$$\begin{aligned} \text {var}_{\varepsilon }\left\{ \sum _{i=1}^n\left[ R_{in} \left( \varvec{\theta }\right) -E_{\varepsilon }R_{in}\left( \varvec{\theta }\right) \right] \right\} =o_p(1). \end{aligned}$$

Therefore, we have

$$\begin{aligned} Z_n(\varvec{\theta })=w_0\widetilde{f}(0)\left( \varvec{\delta }^T,v_i\right) D_1\left( \varvec{\delta }^T,v_i\right) ^T+o(1)+\varvec{W}_n^T \left( \varvec{\delta }^T,v_i\right) ^T+o_p(1). \end{aligned}$$

The convexity of the limiting objective function above guarantees the uniqueness of the minimizer. We have

$$\begin{aligned} \left( \varvec{\delta }_n^T,v_{n,i}\right) ^T=-\frac{1}{2w_0\widetilde{f}(0)}D_1^{-1}\varvec{W}_n+o_p(1). \end{aligned}$$

Under Assumption A1(i) and using the equality \(\text {var}_{\varepsilon }[w_i\text {sgn}(\widetilde{\varepsilon }_i)]=E_{\varepsilon }(w_1^2)=1/12\), we have \(E_{\varepsilon }\varvec{W}_n=\varvec{0}\) and \(\text {var}_{\varepsilon }\varvec{W}_n\rightarrow D_1/12\) as \(n\rightarrow \infty \). By the Cramér–Wald device and CLT, we have \(\varvec{W}_n\mathop {\rightarrow }\limits ^{\mathcal {L}}\mathcal {N}\left( \varvec{0},D_1/12\right) \). Thus,

$$\begin{aligned} \left( \varvec{\delta }_n^T,v_{n,i}\right) ^T\mathop {\rightarrow }\limits ^{\mathcal {L}}\mathcal {N}\left( \varvec{0},\frac{1}{48w_0^2\widetilde{f}^2(0)}D_1^{-1}\right) . \end{aligned}$$

Combining all the results for \(i=1,\ldots ,n\) completes the proof.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wu, Y., Tian, M. An effective method to reduce the computational complexity of composite quantile regression. Comput Stat 32, 1375–1393 (2017). https://doi.org/10.1007/s00180-017-0749-8

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00180-017-0749-8

Keywords