
Stable prediction in high-dimensional linear models

Published in Statistics and Computing.

Abstract

We propose a Random Splitting Model Averaging procedure, RSMA, to achieve stable prediction in high-dimensional linear models. The idea is to use randomly split training data to construct and estimate candidate models, and to use the test data to form second-level data. The second-level data are used to estimate optimal weights for the candidate models by quadratic optimization under non-negativity constraints. This procedure has three appealing features: (1) RSMA avoids model overfitting and, as a result, improves prediction accuracy. (2) By adaptively choosing optimal weights, we obtain more stable predictions by averaging over several candidate models. (3) Based on RSMA, a weighted importance index is proposed to rank the predictors and discriminate relevant predictors from irrelevant ones. Simulation studies and a real data analysis demonstrate that the RSMA procedure has excellent predictive performance and that the associated weighted importance index ranks the predictors well.
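The weight-estimation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the candidate models here are nested least-squares fits on marginally screened predictors (a hypothetical stand-in for the paper's candidate constructions), the non-negative quadratic program is solved with non-negative least squares, and the importance index shown is one simple weighted-inclusion variant.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Toy high-dimensional setting: fewer samples than predictors, sparse signal.
n, p = 120, 300
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 1.5
y = X @ beta + rng.standard_normal(n)

# One random split into a training part and a test part.
idx = rng.permutation(n)
tr, te = idx[:80], idx[80:]

# Candidate models: nested OLS fits on the k predictors most correlated
# with y in the training part (a stand-in for the paper's candidates).
ranking = np.argsort(-np.abs(X[tr].T @ y[tr]))
sizes = [1, 2, 5, 10, 20]
F = np.empty((len(te), len(sizes)))  # second-level design: test-set predictions
models = []
for j, k in enumerate(sizes):
    S = ranking[:k]
    coef, *_ = np.linalg.lstsq(X[np.ix_(tr, S)], y[tr], rcond=None)
    models.append(S)
    F[:, j] = X[np.ix_(te, S)] @ coef

# Optimal weights from the second-level data: non-negative least squares,
# normalized so the weights form a convex combination.
w, _ = nnls(F, y[te])
w /= w.sum()

# A weighted importance index (one simple variant): each predictor's score
# is the total weight of the candidate models that include it.
importance = np.zeros(p)
for S, wj in zip(models, w):
    importance[S] += wj

print("candidate weights:", np.round(w, 3))
print("top-ranked predictors:", np.argsort(-importance)[:5])
```

Averaging the fitted candidates with these weights then gives the stabilized prediction; repeating over many random splits, as the procedure's name suggests, would further average out the split-to-split variability.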



Acknowledgments

The authors thank the associate editor and two referees for their constructive suggestions, which helped improve an earlier version of the manuscript. Lin’s research was supported by the Natural Science Foundation of Shenzhen University (Grant No. 201542). Wang’s research was supported by the National Science Fund for Distinguished Young Scholars in China (Grant No. 10725106), the National Natural Science Foundation of China (Grant Nos. 11171331 and 11331011), a grant from the Key Lab of Random Complex Structure and Data Science, CAS, and the Natural Science Foundation of Shenzhen University. Zhang’s research was supported by the National Natural Science Foundation of China (Grant No. 11401391), the Project of the Department of Education of Guangdong Province of China (Grant No. 2014KTSCX112), and the Natural Science Foundation of Shenzhen University (Grant No. 701, 000360023408). Pang’s research was supported by the Central Research Grant from the Hong Kong Polytechnic University (Grant No. G-YBKQ).

Author information

Corresponding author

Correspondence to Jun Zhang.

Electronic supplementary material

Supplementary material 1 (PDF 206 KB)


Cite this article

Lin, B., Wang, Q., Zhang, J. et al. Stable prediction in high-dimensional linear models. Stat Comput 27, 1401–1412 (2017). https://doi.org/10.1007/s11222-016-9694-6
