Model selection using PRESS statistic

  • Original paper
Computational Statistics

Abstract

The widely used \(R^2\) statistic has a fundamental weakness in model building: it favors adding more predictors to the model because \(R^2\) never decreases when a predictor is added. In effect, the additional predictors start fitting the noise in the data. Other measures used in selecting a regression model, such as \(R^2_{adj}\), AIC, SBC, and Mallows's \(C_p\), do not guarantee that the selected model will also make better predictions of future values. To guard against such overfitting, data scientists withhold a percentage of the data for validation purposes. The PRESS statistic does something similar by withholding each observation when calculating its own predicted value. In this paper, we investigated the behavior of \(R^2_{PRESS}\) and how it performs compared with other criteria in model selection in the presence of unnecessary predictors. Using simulated data, we found that \(R^2_{PRESS}\) generally performed best in selecting the true model as the best model for prediction among the model selection measures considered.
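
As an illustration of the leave-one-out idea described in the abstract, the following is a minimal sketch, not the authors' code. It assumes ordinary least squares with an intercept and the common definition \(R^2_{PRESS} = 1 - \mathrm{PRESS}/\mathrm{SST}\), where \(\mathrm{PRESS} = \sum_i \big(e_i/(1-h_{ii})\big)^2\), \(e_i\) is the ordinary residual, and \(h_{ii}\) is the \(i\)-th diagonal element of the hat matrix. The function names, the simulated data, and the choice of five noise predictors are all hypothetical.

```python
import numpy as np

def _design(X, y):
    # Prepend an intercept column; X may be 1-D or 2-D.
    return np.column_stack([np.ones(len(y)), X])

def r2(X, y):
    """Ordinary R^2 of an OLS fit with intercept."""
    X1 = _design(X, y)
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    sse = np.sum((y - X1 @ beta) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst

def press_statistic(X, y):
    """PRESS via the hat-matrix shortcut (no explicit refitting).

    Assumes the design matrix has full column rank.
    """
    X1 = _design(X, y)
    # Leverages h_ii are the squared row norms of Q from a reduced QR.
    Q, _ = np.linalg.qr(X1)
    h = np.sum(Q ** 2, axis=1)
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    # Each leave-one-out prediction error equals e_i / (1 - h_ii).
    return np.sum((resid / (1.0 - h)) ** 2)

def r2_press(X, y):
    """R^2_PRESS = 1 - PRESS / SST (assumed definition)."""
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - press_statistic(X, y) / sst

# Hypothetical simulation: one true predictor plus five pure-noise predictors.
rng = np.random.default_rng(0)
n = 50
x_true = rng.normal(size=(n, 1))
x_noise = rng.normal(size=(n, 5))
y = 1.0 + 2.0 * x_true[:, 0] + rng.normal(size=n)
X_full = np.hstack([x_true, x_noise])

for name, X in [("true model", x_true), ("with noise predictors", X_full)]:
    print(f"{name}: R2 = {r2(X, y):.4f}, R2_PRESS = {r2_press(X, y):.4f}")
```

In this sketch \(R^2\) for the larger model can never fall below that of the true model, whereas \(R^2_{PRESS}\) is free to decrease when the extra predictors only fit noise, which is the contrast the abstract draws.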

Author information

Corresponding author

Correspondence to Ida Marie Alcantara.

About this article

Cite this article

Alcantara, I.M., Naranjo, J. & Lang, Y. Model selection using PRESS statistic. Comput Stat 38, 285–298 (2023). https://doi.org/10.1007/s00180-022-01228-1

  • DOI: https://doi.org/10.1007/s00180-022-01228-1
