
An effective method to reduce the computational complexity of composite quantile regression


Abstract

In this article, we aim to reduce the computational complexity of the recently proposed composite quantile regression (CQR). We propose a new regression method, called infinitely composite quantile regression (ICQR), which avoids having to choose the number of uniform quantile positions. Unlike composite quantile regression, the proposed ICQR method combines a continuum of (infinitely many) quantile positions. We show that the ICQR criterion can be readily transformed into a linear programming problem. Furthermore, the computing time of the ICQR estimate is far less than that of the CQR, though slightly larger than that of ordinary quantile regression. Oracle properties of the penalized ICQR are also established. Simulations are conducted to compare the different estimators, and a real data analysis illustrates their performance.


Acknowledgements

The work was partially supported by the Major Research Projects of Philosophy and Social Science of the Chinese Ministry of Education (No. 15JZD015), the National Natural Science Foundation of China (No. 11271368), the Major Program of the Beijing Philosophy and Social Science Foundation of China (No. 15ZDA17), the Specialized Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20130004110007), the Key Program of the National Philosophy and Social Science Foundation (No. 13AZD064), the Major Project of the Humanities and Social Science Foundation of the Ministry of Education (No. 15JJD910001), the Fundamental Research Funds for the Central Universities, and the Research Funds of Renmin University of China (No. 15XNL008).

Author information


Correspondence to Maozai Tian.

Appendices

Appendix A: Comparison of computational complexity between ICQR and CQR

For the ICQR estimation, the optimization problem (7) can be cast as a linear program. First, we introduce the following slack variables:

$$\begin{aligned} \begin{aligned} u_i^+=&\max \left( y_i-\varvec{x}_i^T\varvec{\beta }-\widetilde{b}_i,0\right) ,~u_i^-=\max \left( -\left( y_i-\varvec{x}_i^T\varvec{\beta }-\widetilde{b}_i\right) ,0\right) ,~i=1,\ldots ,n,\\ \widetilde{b}_i^+=&\max \left( \widetilde{b}_i,0\right) ,~\widetilde{b}_i^-=\max \left( -\widetilde{b}_i,0\right) ,~i=1,\ldots ,n,\\ \beta _j^+=&\max \left( \beta _j,0\right) ,~\beta _j^-=\max \left( -\beta _j,0\right) ,~j=1,\ldots ,p. \end{aligned} \end{aligned}$$

Denote \(\varvec{\beta }^+=(\beta _1^+,\ldots ,\beta _p^+)^T\) and \(\varvec{\beta }^-=(\beta _1^-,\ldots ,\beta _p^-)^T\). With the aid of slack variables, solving the optimization problem (7) is equivalent to solving the following linear programming problem:

$$\begin{aligned} \begin{aligned} \min _{\varvec{z}\in \mathbb {R}^{2\left( p+2n\right) }}&\sum _{i=1}^n\widehat{w}_i\left( u_i^++u_i^-\right) \\ \text {s.t.}&y_i-\varvec{x}_i^T\left( \varvec{\beta }^+-\varvec{\beta }^-\right) -\left( \widetilde{b}_i^+-\widetilde{b}_i^-\right) =u_i^+-u_i^-,i=1,\ldots ,n,\\&\beta _j^+\ge 0, \beta _j^-\ge 0, j=1,\ldots ,p,\\&\widetilde{b}_i^+\ge 0, \widetilde{b}_i^-\ge 0, i=1,\ldots ,n,\\&u_i^+\ge 0, u_i^-\ge 0, i=1,\ldots ,n, \end{aligned} \end{aligned}$$
(10)

where \(\varvec{z}=\left( \left( \varvec{\beta }^+\right) ^T,\left( \varvec{\beta }^-\right) ^T,\left( \widetilde{\varvec{b}}^+\right) ^T,\left( \widetilde{\varvec{b}}^-\right) ^T,\left( \varvec{u}^+\right) ^T,\left( \varvec{u}^-\right) ^T\right) ^T\), \(\varvec{u}^+=\left( u_1^+,\ldots ,u_n^+\right) ^T\), \(\varvec{u}^-=\left( u_1^-,\ldots ,u_n^-\right) ^T\), \(\widetilde{\varvec{b}}^+=\left( \widetilde{b}_1^+,\ldots ,\widetilde{b}_n^+\right) ^T\), and \(\widetilde{\varvec{b}}^-=\left( \widetilde{b}_1^-,\ldots ,\widetilde{b}_n^-\right) ^T\).
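To make the transformation concrete, here is a minimal sketch that solves an LP of the form (10) with scipy.optimize.linprog. It assumes, for illustration only, that the weights \(\widehat{w}_i\) and the offsets \(\widetilde{b}_i\) are supplied as fixed inputs (in the paper they arise from the criterion (7), which is not reproduced here); the slack parts of \(\widetilde{b}_i\) then drop out and the decision vector reduces to \(((\varvec{\beta }^+)^T,(\varvec{\beta }^-)^T,(\varvec{u}^+)^T,(\varvec{u}^-)^T)^T\):

```python
import numpy as np
from scipy.optimize import linprog

def weighted_l1_lp(X, y, w, b_tilde):
    """Sketch of LP (10) with fixed offsets b_tilde: minimize
    sum_i w_i (u_i^+ + u_i^-) subject to
    y_i - b_tilde_i - x_i^T (beta^+ - beta^-) = u_i^+ - u_i^-,
    all decision variables nonnegative."""
    n, p = X.shape
    # decision vector layout: [beta+ (p), beta- (p), u+ (n), u- (n)]
    c = np.concatenate([np.zeros(2 * p), w, w])
    I = np.eye(n)
    A_eq = np.hstack([X, -X, I, -I])   # X(beta+ - beta-) + u+ - u- = y - b_tilde
    b_eq = y - b_tilde
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, method="highs")  # default bounds: >= 0
    return res.x[:p] - res.x[p:2 * p]  # beta-hat = beta+ - beta-

# toy usage with placeholder weights and offsets
rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_t(df=3, size=n)
print(weighted_l1_lp(X, y, w=np.full(n, 0.5), b_tilde=np.zeros(n)))
```

At the optimum the constraints force \(u_i^++u_i^-=|y_i-\varvec{x}_i^T\varvec{\beta }-\widetilde{b}_i|\), so the LP value equals the weighted \(L_1\) objective.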

Similarly, for the CQR estimation, minimizing the criterion (3) can also be transformed into a linear program. Let

$$\begin{aligned} \begin{aligned} u_{ik}^{*+}=&\max \left( y_i-\varvec{x}_i^T\varvec{\beta }-b_{\tau _k},0\right) ,~u_{ik}^{*-}=\max \left( -\left( y_i-\varvec{x}_i^T\varvec{\beta }-b_{\tau _k}\right) ,0\right) ,\\&i=1,\ldots ,n,k=1,\ldots ,K,\\ b_k^{*+}=&\max \left( b_{\tau _k},0\right) ,~b_k^{*-}=\max \left( -b_{\tau _k},0\right) ,~k=1,\ldots ,K. \end{aligned} \end{aligned}$$

Then minimizing (3) is equivalent to solving the following linear programming problem:

$$\begin{aligned} \begin{aligned} \min _{\varvec{z}^*\in \mathbb {R}^{2\left( p+K+nK\right) }}&\sum _{k=1}^K\sum _{i=1}^n\left[ \tau _ku_{ik}^{*+}+\left( 1-\tau _k\right) u_{ik}^{*-}\right] \\ \text {s.t.}&y_i-\varvec{x}_i^T\left( \varvec{\beta }^+-\varvec{\beta }^-\right) -\left( b_k^{*+}-b_k^{*-}\right) =u_{ik}^{*+}-u_{ik}^{*-},\\&\quad i=1,\ldots ,n,k=1,\ldots ,K,\\&\beta _j^+\ge 0, \beta _j^-\ge 0, j=1,\ldots ,p,\\&b_k^{*+}\ge 0, b_k^{*-}\ge 0, k=1,\ldots ,K,\\&u_{ik}^{*+}\ge 0, u_{ik}^{*-}\ge 0, i=1,\ldots ,n,k=1,\ldots ,K, \end{aligned} \end{aligned}$$
(11)

where \(\varvec{z}^*=\left( \left( \varvec{\beta }^+\right) ^T,\left( \varvec{\beta }^-\right) ^T,\left( \varvec{b}^{*+}\right) ^T,\left( \varvec{b}^{*-}\right) ^T,\left( \varvec{u}^{*+}\right) ^T,\left( \varvec{u}^{*-}\right) ^T\right) ^T,\, \varvec{u}^{*+}=\left( u_{11}^{*+},\ldots ,u_{n1}^{*+},\ldots , u_{1K}^{*+},\ldots ,u_{nK}^{*+}\right) ^T\), \(\varvec{u}^{*-}=\left( u_{11}^{*-},\ldots ,u_{n1}^{*-},\ldots ,u_{1K}^{*-},\ldots ,u_{nK}^{*-}\right) ^T\), \(\varvec{b}^{*+}=\left( b_1^{*+},\ldots ,b_K^{*+}\right) ^T\), and \(\varvec{b}^{*-}=\left( b_1^{*-},\ldots ,b_K^{*-}\right) ^T\).

Obviously, if \(K>1\) then the linear program (10) is smaller in scale than (11): (10) has \(2(p+2n)\) decision variables and \(n\) equality constraints, whereas (11) has \(2(p+K+nK)\) decision variables and \(nK\) equality constraints. Hence, even for a moderately large \(K\), the ICQR has a computational advantage over the CQR under either the simplex or the interior-point algorithm.
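A quick tally of these counts, read directly off (10) and (11), makes the gap explicit; the sketch below simply evaluates the formulas just stated:

```python
# variable and equality-constraint counts of LPs (10) (ICQR) and (11) (CQR)
def lp_sizes(n, p, K):
    icqr = {"vars": 2 * (p + 2 * n), "eq": n}
    cqr = {"vars": 2 * (p + K + n * K), "eq": n * K}
    return icqr, cqr

for K in (1, 9, 19):
    icqr, cqr = lp_sizes(n=500, p=10, K=K)
    print(f"K={K:2d}  ICQR {icqr}  CQR {cqr}")
```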

Appendix B: Proof of Theorem 1

For the ICQR estimator, we consider the following criterion:

$$\begin{aligned} \sum _{i=1}^nw_i\left| y_i-\varvec{x}_i^T\varvec{\beta }-\widetilde{b}_i\right| . \end{aligned}$$
(12)

Write \(\widehat{\varvec{\beta }}_n=\widehat{\varvec{\beta }}^{\text {ICQR}}\) and \(\widehat{\varvec{b}}_n=\widehat{\widetilde{\varvec{b}}}\). Let \(\varvec{\delta }_n=\sqrt{n}(\widehat{\varvec{\beta }}_n-\varvec{\beta })\) and \(\varvec{v}_n=\sqrt{n}(\widehat{\varvec{b}}_n-\widetilde{\varvec{b}})=(v_{n,1},\ldots ,v_{n,n})^T\), and denote \(\varvec{\theta }_n=(\varvec{\delta }_n^T,\varvec{v}_n^T)^T\). Let \(\varvec{\theta }=(\varvec{\delta }^T,\varvec{v}^T)^T\) be an arbitrary vector of the same dimension. Minimizing expression (12) is equivalent to minimizing

$$\begin{aligned} Z_n(\varvec{\theta })=\sum _{i=1}^nw_i\left( \left| \widetilde{\varepsilon }_i-(\varvec{x}_i^T\varvec{\delta }+v_i)/\sqrt{n}\right| -|\widetilde{\varepsilon }_i|\right) . \end{aligned}$$

Obviously, the function \(Z_n(\varvec{\theta })\) is convex and \(\varvec{\theta }_n\) is its minimizer. Noting that

$$\begin{aligned} |x-y|-|x|=-y\text {sgn}(x)+2\int _0^y[I(x\le z)-I(x\le 0)]dz, \end{aligned}$$

where \(\text {sgn}(\cdot )\) denotes the sign function, we have

$$\begin{aligned}&E_{\varepsilon }Z_n(\varvec{\theta })\nonumber \\&\quad =\sum _{i=1}^n\left\{ - \frac{\varvec{x}_i^T\varvec{\delta }+v_i}{\sqrt{n}}E_{\varepsilon } \left[ w_i\text {sgn}(\widetilde{\varepsilon }_i)\right] + 2\int _0^{\frac{\varvec{x}_i^T\varvec{\delta }+v_i}{\sqrt{n}}}E_{\varepsilon } \left\{ w_i\left[ I\left( \widetilde{\varepsilon }_i\le z\right) -I\left( \widetilde{\varepsilon }_i\le 0\right) \right] \right\} dz\right\} . \end{aligned}$$
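As a numerical sanity check of the decomposition used above (a version of Knight's identity), the following sketch compares both sides for random pairs \((x,y)\), approximating the integral by a midpoint rule; the \(\text {sgn}(0)\) convention is immaterial for continuous \(x\):

```python
import numpy as np

rng = np.random.default_rng(0)

def lhs(x, y):
    return abs(x - y) - abs(x)

def rhs(x, y, m=1_000_000):
    # -y*sgn(x) + 2 * integral_0^y [I(x <= z) - I(x <= 0)] dz, midpoint rule;
    # the signed increments dz make the formula valid for y < 0 as well
    grid = np.linspace(0.0, y, m + 1)
    mid = 0.5 * (grid[:-1] + grid[1:])
    dz = np.diff(grid)
    integrand = (x <= mid).astype(float) - float(x <= 0.0)
    return -y * np.sign(x) + 2.0 * np.sum(integrand * dz)

for _ in range(4):
    x, y = rng.normal(size=2)
    print(f"x={x:+.3f}  y={y:+.3f}  lhs={lhs(x, y):+.5f}  rhs={rhs(x, y):+.5f}")
```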

Under condition A2, the \(\widetilde{\varepsilon }_i\)'s are also independent and identically distributed. Moreover, the density of \(\widetilde{\varepsilon }_i\), denoted by \(\widetilde{f}(\cdot )\), is uniformly bounded and strictly positive. Therefore, the convex function \(M(t)=E_{\varepsilon }[w_i(|\widetilde{\varepsilon }_i-t|-|\widetilde{\varepsilon }_i|)]\) has a unique minimizer at zero, and the first-order condition at zero gives \(E_{\varepsilon }[w_i\text {sgn}(\widetilde{\varepsilon }_i)]=0\). On the other hand,

$$\begin{aligned} E_{\varepsilon }\left\{ w_i\left[ I(\widetilde{\varepsilon }_i\le z)-I(\widetilde{\varepsilon }_i\le 0)\right] \right\} =\int _{-\infty }^zw_i\widetilde{f}(t)dt-\int _{-\infty }^0w_i\widetilde{f}(t)dt=w_0\widetilde{f}(0)z+o(z). \end{aligned}$$

Under condition A1(ii), we have \(\lim _{n\rightarrow \infty }\varvec{x}_i/n=\varvec{0}\) for \(i=1,\ldots ,n\). Together with condition A1(i), it follows that \(E_{\varepsilon }Z_n(\varvec{\theta })=w_0\widetilde{f}(0)(\varvec{\delta }^T,v_i)D_1(\varvec{\delta }^T,v_i)^T+o(1)\), where \(D_1=\left( \begin{array}{cc}D&{}\varvec{0}\\ \varvec{0}&{}1\end{array}\right) \). Let

$$\begin{aligned} R_{in}(\varvec{\theta })=w_i\left( \left| \widetilde{\varepsilon }_i-(\varvec{x}_i^T\varvec{\delta }+v_i)/\sqrt{n}\right| -|\widetilde{\varepsilon }_i|\right) +w_i\text {sgn}\left( \widetilde{\varepsilon }_i\right) \left( \varvec{x}_i^T\varvec{\delta }+v_i\right) /\sqrt{n}, \end{aligned}$$

and

$$\begin{aligned} \varvec{W}_n=\frac{1}{\sqrt{n}}\sum _{i=1}^nw_i\text {sgn} \left( \widetilde{\varepsilon }_i\right) \left( \varvec{x}_i^T,1\right) ^T. \end{aligned}$$

Noting that \(E_{\varepsilon }[w_i\text {sgn}(\widetilde{\varepsilon }_i)]=0\), we have

$$\begin{aligned} Z_n(\varvec{\theta })= & {} w_0\widetilde{f}(0) \left( \varvec{\delta }^T,v_i\right) D_1\left( \varvec{\delta }^T,v_i\right) ^T+o(1)- \varvec{W}_n^T\left( \varvec{\delta }^T,v_i\right) ^T\\&+\sum _{i=1}^n \left[ R_{in}(\varvec{\theta })-E_{\varepsilon }R_{in}(\varvec{\theta })\right] . \end{aligned}$$

Now we show that \(\sum _{i=1}^n\left[ R_{in}(\varvec{\theta })-E_{\varepsilon }R_{in}(\varvec{\theta })\right] \) = \(o_p(1)\). It is sufficient to show that \(\text {var}_{\varepsilon }\left\{ \sum _{i=1}^n\left[ R_{in}(\varvec{\theta })-E_{\varepsilon }R_{in}(\varvec{\theta })\right] \right\} =o_p(1)\). In fact, based on the inequality

$$\begin{aligned} |R_{in}(\varvec{\theta })|\le 2w_i\left| \left( \varvec{x}_i^T\varvec{\delta }+v_i\right) / \sqrt{n}\right| I\left( |\widetilde{\varepsilon }_i|\le | \left( \varvec{x}_i^T\varvec{\delta }+v_i\right) /\sqrt{n}|\right) \end{aligned}$$

we get

$$\begin{aligned} \begin{aligned}&\text {var}_{\varepsilon }\left\{ \sum _{i=1}^n\left[ R_{in}(\varvec{\theta })-E_{\varepsilon }R_{in}(\varvec{\theta })\right] \right\} \\&\quad =E_{\varepsilon }\left\{ \sum _{i=1}^n\left[ R_{in}(\varvec{\theta })-E_{\varepsilon }R_{in}(\varvec{\theta })\right] \right\} ^2 \le \sum _{i=1}^nE_{\varepsilon }[R_{in}^2(\varvec{\theta })]\\&\quad \le 4\sum _{i=1}^n\left| \left( \varvec{x}_i^T\varvec{\delta }+v_i\right) /\sqrt{n}\right| ^2E_{\varepsilon }\left[ w_i^2I\left( |\widetilde{\varepsilon }_i|\le |(\varvec{x}_i^T\varvec{\delta }+v_i)/\sqrt{n}|\right) \right] \\&\quad \le 4\left( \Vert \varvec{\delta }\Vert ^2+v_i^2\right) E_{\varepsilon }\left[ w_i^2I\left( |\widetilde{\varepsilon }_i|\le \sqrt{\Vert \varvec{\delta }\Vert ^2+v_i^2}\max _{i=1,\ldots ,n}\left( \sqrt{(\Vert \varvec{x}_i\Vert ^2+1)/n}\right) \right) \right] \\&\quad \quad \times \sum _{i=1}^n\left( \Vert \varvec{x}_i\Vert ^2+1\right) /n\\&\quad \le 4\left( \Vert \varvec{\delta }\Vert ^2+v_i^2\right) E_{\varepsilon }\left[ I\left( |\widetilde{\varepsilon }_i|\le \sqrt{\Vert \varvec{\delta }\Vert ^2+v_i^2}\max _{i=1,\ldots ,n}\left( \sqrt{(\Vert \varvec{x}_i\Vert /\sqrt{n})^2+1/n}\right) \right) \right] \\&\quad \quad \times \left( 1+\sum _{i=1}^n\Vert \varvec{x}_i\Vert ^2/n\right) . \end{aligned} \end{aligned}$$

The last inequality holds since \(w_i^2\le 1\). Under Assumption A1(ii), we have

$$\begin{aligned} \text {var}_{\varepsilon }\left\{ \sum _{i=1}^n\left[ R_{in} \left( \varvec{\theta }\right) -E_{\varepsilon }R_{in}\left( \varvec{\theta }\right) \right] \right\} =o_p(1). \end{aligned}$$

Therefore, we have

$$\begin{aligned} Z_n(\varvec{\theta })=w_0\widetilde{f}(0)\left( \varvec{\delta }^T,v_i\right) D_1\left( \varvec{\delta }^T,v_i\right) ^T+o(1)-\varvec{W}_n^T \left( \varvec{\delta }^T,v_i\right) ^T+o_p(1). \end{aligned}$$

The limiting objective function above is a convex quadratic and hence has a unique minimizer; since each \(Z_n\) is also convex, the minimizer \(\varvec{\theta }_n\) converges to the minimizer of the limit. Minimizing the quadratic limit yields

$$\begin{aligned} \left( \varvec{\delta }_n^T,v_{n,i}\right) ^T=\frac{1}{2w_0\widetilde{f}(0)}D_1^{-1}\varvec{W}_n+o_p(1). \end{aligned}$$

Under Assumption A1(i) and using the equality \(\text {var}_{\varepsilon }[w_i\text {sgn}(\widetilde{\varepsilon }_i)]=E_{\varepsilon }(w_1^2)=1/12\), we have \(E_{\varepsilon }\varvec{W}_n=\varvec{0}\) and \(\text {var}_{\varepsilon }\varvec{W}_n\rightarrow D_1/12\) as \(n\rightarrow \infty \). By the Cramér–Wold device and the CLT, we have \(\varvec{W}_n\mathop {\rightarrow }\limits ^{\mathcal {L}}\mathcal {N}\left( \varvec{0},D_1/12\right) \). Thus,

$$\begin{aligned} \left( \varvec{\delta }_n^T,v_{n,i}\right) ^T\mathop {\rightarrow }\limits ^{\mathcal {L}}\mathcal {N}\left( \varvec{0},\frac{1}{48w_0^2\widetilde{f}^2(0)}D_1^{-1}\right) . \end{aligned}$$

Combining all the results for \(i=1,\ldots ,n\) completes the proof.
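To illustrate the limiting covariance of \(\varvec{W}_n\), here is a small Monte Carlo sketch under the toy assumption that \(w_i\sim U(-1/2,1/2)\) independently of the errors. This is one concrete specification satisfying the moment conditions \(E_{\varepsilon }[w_i\text {sgn}(\widetilde{\varepsilon }_i)]=0\) and \(E_{\varepsilon }(w_1^2)=1/12\) used in the proof, not necessarily the paper's actual weight construction:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, reps = 2000, 2, 3000
X = rng.normal(size=(n, p))                  # fixed design
Xa = np.hstack([X, np.ones((n, 1))])         # rows (x_i^T, 1)
D1 = Xa.T @ Xa / n                           # empirical D_1

Wn = np.empty((reps, p + 1))
for r in range(reps):
    w = rng.uniform(-0.5, 0.5, size=n)       # toy weights with E w^2 = 1/12
    s = np.sign(rng.normal(size=n))          # sgn(eps_i): +/-1 with prob 1/2
    Wn[r] = (w * s) @ Xa / np.sqrt(n)        # W_n = n^{-1/2} sum_i w_i sgn(eps_i) (x_i, 1)

print(np.round(np.cov(Wn, rowvar=False), 3))  # should be close to D1 / 12
print(np.round(D1 / 12, 3))
```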


Cite this article

Wu, Y., Tian, M. An effective method to reduce the computational complexity of composite quantile regression. Comput Stat 32, 1375–1393 (2017). https://doi.org/10.1007/s00180-017-0749-8
