Residual projection for quantile regression in vertically partitioned big data

Fan, Ye; Li, Jr-Shin; Lin, Nan

doi:10.1007/s10618-022-00914-4

Published: 17 January 2023

Volume 37, pages 710–735 (2023)
Data Mining and Knowledge Discovery

Abstract

Standard regression techniques model only the mean of the response variable. Quantile regression (QR) is more powerful in that it depicts a comprehensive relationship between the response variable and independent covariates at different quantiles. It is particularly useful for non-normally distributed data with skewness or heterogeneity, which appear routinely in many scientific fields, such as economics, finance, public health and biology. Although its theory has been well developed in the literature, its computation for big data still faces multiple challenges, especially for vertically stored big data in modern distributed environments, where communication efficiency and security are usually the primary considerations. While the popular alternating direction method of multipliers (ADMM) provides a general computational solution, its slow convergence becomes a bottleneck when communication cost dominates local computation cost, as in Internet of Things (IoT) networks. Motivated by the residual projection technique, in this paper we propose an innovative iterative parallel framework, PIQR, that converges faster and has a more secure data transmission plan, and we establish its convergence property. This framework is further extended to composite quantile regression (CQR), a modified QR technique that improves estimation efficiency at extreme quantiles. Simulation studies show that both the ADMM-based method and PIQR achieve favorable estimation accuracy in distributed environments. Although PIQR is inferior to the ADMM-based method in local computation, it requires far fewer iterations to converge, and hence significantly improves overall computational efficiency when communication cost is the dominating factor. Moreover, PIQR transmits between machines only data involving residual information, and can therefore better prevent leakage of important data information than the ADMM-based method.
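
To make two ideas in the abstract concrete, the following is a minimal sketch and not the PIQR algorithm itself. It assumes that "vertically stored" means each machine holds a block of covariate columns for all observations, and it uses the standard QR check loss. All function and variable names (`check_loss`, `partial_fit`, `residual_from_blocks`) are hypothetical and chosen for illustration only. The sketch shows that the residual vector can be assembled from per-machine partial fitted values, so that no machine needs to send its raw covariate block.

```python
import numpy as np


def check_loss(residuals, tau):
    """Summed quantile (check) loss: rho_tau(r) = r * (tau - 1{r < 0})."""
    r = np.asarray(residuals, dtype=float)
    return float(np.sum(r * (tau - (r < 0))))


def partial_fit(X_block, beta_block):
    """Computed locally on one machine: its contribution to the fitted values."""
    return X_block @ beta_block


def residual_from_blocks(y, blocks, betas):
    """Combine per-machine partial fits into the residual y - sum_k X_k beta_k."""
    fitted = sum(partial_fit(X, b) for X, b in zip(blocks, betas))
    return y - fitted


# Simulated data with heavy-tailed noise (an illustrative choice, not from the paper).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
beta = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ beta + rng.standard_t(df=3, size=n)

# Assumed vertical partition: machine 1 holds columns 0-1, machine 2 holds columns 2-3.
blocks = [X[:, :2], X[:, 2:]]
betas = [beta[:2], beta[2:]]

r_distributed = residual_from_blocks(y, blocks, betas)
r_central = y - X @ beta

# The QR objective at tau = 0.5 evaluated from the distributed residual.
loss_median = check_loss(r_distributed, tau=0.5)
```

In this sketch only length-`n` vectors of partial fitted values cross machine boundaries. How PIQR actually updates the coefficients and what it transmits is specified in the article, not here.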

Funding

Nan Lin’s work is supported by the NVIDIA GPU grant program. Ye Fan’s work is supported by the Initial Scientific Research Fund of Young Teachers in Capital University of Economics and Business [Grant No. XRZ2022062], and partly supported by the Special Fund for Basic Scientific Research of Beijing Municipal Colleges in Capital University of Economics and Business [Grant No. QNTD202207]. Jr-Shin Li’s work is supported by the Air Force Office of Scientific Research under the award FA9550-21-1-0335.

Author information

Authors and Affiliations

School of Statistics, Capital University of Economics and Business, Beijing, 100070, China
Ye Fan
Department of Electrical and Systems Engineering, Washington University in St. Louis, St. Louis, MO, 63130, USA
Jr-Shin Li
Department of Mathematics and Statistics, Washington University in St. Louis, St. Louis, MO, 63130, USA
Nan Lin

Corresponding author

Correspondence to Nan Lin.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this work.

Additional information

Responsible editor: Aristides Gionis.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 797 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fan, Y., Li, JS. & Lin, N. Residual projection for quantile regression in vertically partitioned big data. Data Min Knowl Disc 37, 710–735 (2023). https://doi.org/10.1007/s10618-022-00914-4

Received: 17 December 2021
Accepted: 16 December 2022
Published: 17 January 2023
Issue Date: March 2023
DOI: https://doi.org/10.1007/s10618-022-00914-4
