An exact approach to ridge regression for big data

Zhang, Tonglin; Yang, Baijian

doi:10.1007/s00180-017-0731-5

Original Paper
Published: 05 May 2017

Computational Statistics, volume 32, pages 909–928 (2017)

Tonglin Zhang¹ &
Baijian Yang²


Abstract

Ridge regression is an important approach in linear regression when explanatory variables are highly correlated. Although expressions for the estimators of ridge regression parameters can be obtained via matrix operations once the observed data are standardized, they cannot be applied directly to big data, because it is impossible to load the entire data set into the memory of a single computer and it is hard to standardize the original observed data. To overcome these difficulties, the present article proposes new methods and algorithms. The basic idea is to compute a matrix of sufficient statistics row by row. Once this matrix is derived, the original data are no longer needed. Because the entire data set is scanned only once, the proposed methods and algorithms can be extremely efficient in computing estimates of ridge regression parameters. It is expected that the basic knowledge gained in this article will have a great impact on statistical approaches to big data.
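
The idea of a single row-wise scan can be illustrated with a minimal sketch. This is not the authors' algorithm, since the full text is not reproduced here. It assumes the sufficient statistics are the cross-product matrix of Z = [1, X, y], that columns are standardized with the divisor n, and that the penalty applies to the standardized slopes only. The function names `accumulate_stats` and `ridge_from_stats` and the penalty symbol `lam` are hypothetical.

```python
import numpy as np

def accumulate_stats(chunks, p):
    """Scan the data once, chunk by chunk, and return the
    (p+2) x (p+2) cross-product matrix Z'Z for Z = [1, X, y]."""
    S = np.zeros((p + 2, p + 2))
    for X, y in chunks:
        Z = np.column_stack([np.ones(len(y)), X, y])
        S += Z.T @ Z  # only this small matrix is kept in memory
    return S

def ridge_from_stats(S, lam):
    """Ridge estimates on the original scale, computed from S alone.
    Assumed convention: columns standardized with divisor n."""
    p = S.shape[0] - 2
    n = S[0, 0]
    xbar = S[0, 1:p + 1] / n
    ybar = S[0, p + 1] / n
    # Centered cross-products recovered from the raw sums.
    Sxx = S[1:p + 1, 1:p + 1] - n * np.outer(xbar, xbar)
    Sxy = S[1:p + 1, p + 1] - n * xbar * ybar
    scale = np.sqrt(np.diag(Sxx) / n)
    # Cross-products of the standardized predictors.
    R = Sxx / np.outer(scale, scale)
    r = Sxy / scale
    beta_std = np.linalg.solve(R + lam * np.eye(p), r)
    beta = beta_std / scale           # back to the original scale
    intercept = ybar - xbar @ beta
    return intercept, beta
```

Under these assumptions, `chunks` can be any iterable that yields `(X, y)` blocks read from disk, so memory use depends on the number of predictors p rather than on the number of rows. Changing `lam` requires only another call to `ridge_from_stats`, not another pass over the data. Subtracting `n * np.outer(xbar, xbar)` from raw sums can lose precision when column means are large relative to their spread, so a production version would likely need a more stable update.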

Acknowledgements

The authors appreciate comments from the editor, an associate editor, and two anonymous reviewers, which significantly improved the quality of the article.

Author information

Authors and Affiliations

Department of Statistics, Purdue University, 250 North University Street, West Lafayette, IN, 47907-2066, USA
Tonglin Zhang
Department of Computer and Information Technology, Purdue University, 401 North Grant Street, West Lafayette, IN, 47907-2086, USA
Baijian Yang

Corresponding author

Correspondence to Tonglin Zhang.

About this article

Cite this article

Zhang, T., Yang, B. An exact approach to ridge regression for big data. Comput Stat 32, 909–928 (2017). https://doi.org/10.1007/s00180-017-0731-5

Received: 12 November 2015
Accepted: 26 April 2017
Published: 05 May 2017
Issue Date: September 2017
DOI: https://doi.org/10.1007/s00180-017-0731-5
