
Stable prediction in high-dimensional linear models

Published in Statistics and Computing.

Abstract

We propose a Random Splitting Model Averaging procedure, RSMA, to achieve stable prediction in high-dimensional linear models. The idea is to use randomly split training data to construct and estimate candidate models, and to use the test data to form second-level data. The second-level data are used to estimate optimal weights for the candidate models by quadratic optimization under non-negativity constraints. This procedure has three appealing features: (1) RSMA avoids model overfitting and, as a result, improves prediction accuracy. (2) By adaptively choosing optimal weights, we obtain more stable predictions by averaging over several candidate models. (3) Based on RSMA, a weighted importance index is proposed to rank the predictors and discriminate relevant predictors from irrelevant ones. Simulation studies and a real data analysis demonstrate that the RSMA procedure has excellent predictive performance and that the associated weighted importance index ranks the predictors well.
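The weight-estimation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the candidate models here are nested least-squares fits on marginally screened predictors (a hypothetical stand-in for the paper's candidate constructions), the non-negative quadratic program is solved with non-negative least squares, and the importance index shown is one simple weighted-inclusion variant.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Toy high-dimensional setting: fewer samples than predictors, sparse signal.
n, p = 120, 300
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 1.5
y = X @ beta + rng.standard_normal(n)

# One random split into a training part and a test part.
idx = rng.permutation(n)
tr, te = idx[:80], idx[80:]

# Candidate models: nested OLS fits on the k predictors most correlated
# with y in the training part (a stand-in for the paper's candidates).
ranking = np.argsort(-np.abs(X[tr].T @ y[tr]))
sizes = [1, 2, 5, 10, 20]
F = np.empty((len(te), len(sizes)))  # second-level design: test-set predictions
models = []
for j, k in enumerate(sizes):
    S = ranking[:k]
    coef, *_ = np.linalg.lstsq(X[np.ix_(tr, S)], y[tr], rcond=None)
    models.append(S)
    F[:, j] = X[np.ix_(te, S)] @ coef

# Optimal weights from the second-level data: non-negative least squares,
# normalized so the weights form a convex combination.
w, _ = nnls(F, y[te])
w /= w.sum()

# A weighted importance index (one simple variant): each predictor's score
# is the total weight of the candidate models that include it.
importance = np.zeros(p)
for S, wj in zip(models, w):
    importance[S] += wj

print("candidate weights:", np.round(w, 3))
print("top-ranked predictors:", np.argsort(-importance)[:5])
```

Averaging the fitted candidates with these weights then gives the stabilized prediction; repeating over many random splits, as the procedure's name suggests, would further average out the split-to-split variability.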



Acknowledgments

The authors thank the associate editor and two referees for their constructive suggestions, which helped improve an earlier version of the manuscript. Lin’s research was supported by the Natural Science Foundation of Shenzhen University (Grant No. 201542). Wang’s research was supported by the National Science Fund for Distinguished Young Scholars in China (Grant No. 10725106), the National Natural Science Foundation of China (Grant Nos. 11171331 and 11331011), a grant from the Key Lab of Random Complex Structure and Data Science, CAS, and the Natural Science Foundation of Shenzhen University. Zhang’s research was supported by the National Natural Science Foundation of China (Grant No. 11401391), the Project of the Department of Education of Guangdong Province of China (Grant No. 2014KTSCX112), and the Natural Science Foundation of Shenzhen University (Grant No. 701, 000360023408). Pang’s research was supported by the Central Research Grant from the Hong Kong Polytechnic University (Grant No. G-YBKQ).

Author information

Corresponding author

Correspondence to Jun Zhang.

Electronic supplementary material

Supplementary material 1 (PDF 206 KB)


Cite this article

Lin, B., Wang, Q., Zhang, J. et al. Stable prediction in high-dimensional linear models. Stat Comput 27, 1401–1412 (2017). https://doi.org/10.1007/s11222-016-9694-6
