Model selection using PRESS statistic

Alcantara, Ida Marie; Naranjo, Joshua; Lang, Yanda

doi:10.1007/s00180-022-01228-1

Original paper
Published: 03 May 2022

Computational Statistics, Volume 38, pages 285–298 (2023)

Abstract

The most widely used statistic, $R^2$, has a fundamental weakness in model building: it favors adding more predictors to the model because $R^2$ never decreases when a predictor is added. In effect, additional predictors start fitting noise in the data. Other measures used in selecting a regression model, such as $R^2_{adj}$, AIC, SBC, and Mallows' $C_p$, do not guarantee that the selected model will also make better predictions of future values. To address this, data scientists withhold a percentage of the data for validation purposes. The PRESS statistic does something similar by withholding each observation in calculating its own predicted value. In this paper, we investigated the behavior of $R^2_{PRESS}$ and how it performs compared to other criteria in model selection in the presence of unnecessary predictors. Using simulated data, we found that $R^2_{PRESS}$ generally performed best in selecting the true model as the best model for prediction among the model selection measures considered.
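
To make the leave-one-out idea concrete, the following is a minimal sketch in plain Python, not the authors' code. It refits an ordinary least-squares model with each observation withheld in turn and sums the squared prediction errors. The abstract does not define $R^2_{PRESS}$; the sketch assumes the common form $1 - \mathrm{PRESS}/\mathrm{SST}$, and the function names (`solve`, `ols_fit`, `press`, `r2_press`, `r2`) and the example data are hypothetical.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols_fit(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)


def predict(beta, row):
    """Fitted value for one design-matrix row."""
    return sum(b * x for b, x in zip(beta, row))


def press(X, y):
    """PRESS: squared errors when each y[i] is predicted from a fit that excludes i."""
    total = 0.0
    for i in range(len(y)):
        beta = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - predict(beta, X[i])) ** 2
    return total


def sst(y):
    """Total sum of squares about the mean."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)


def r2(X, y):
    """Ordinary R^2 = 1 - SSE/SST, using all observations for the fit."""
    beta = ols_fit(X, y)
    sse = sum((yi - predict(beta, row)) ** 2 for row, yi in zip(X, y))
    return 1.0 - sse / sst(y)


def r2_press(X, y):
    """Assumed definition: R^2_PRESS = 1 - PRESS/SST."""
    return 1.0 - press(X, y) / sst(y)


# Hypothetical data: first column is the intercept, second is one predictor.
X_demo = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y_demo = [0.1, 0.9, 2.2, 2.8, 4.1]
print("R^2       =", r2(X_demo, y_demo))
print("R^2_PRESS =", r2_press(X_demo, y_demo))
```

Because each prediction is made without the observation it is scored against, PRESS is never smaller than the ordinary residual sum of squares, so under this assumed definition $R^2_{PRESS}$ cannot exceed $R^2$ and can even be negative for a poorly predicting model. Refitting $n$ times is used here only for clarity; for least squares the same quantity can be obtained from a single fit using the leverages.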

Author information

Authors and Affiliations

Ida Marie Alcantara, Department of Mathematics, Western Washington University, 516 High Street, Bellingham, 98225, USA
Joshua Naranjo, Department of Statistics, Western Michigan University, 1903 W Michigan Ave, Kalamazoo, MI, 49008, USA
Yanda Lang, Department of Epidemiology and Biostatistics, Temple University, 1301 Cecil B. Moore Ave, Philadelphia, PA, 19122, USA

Corresponding author

Correspondence to Ida Marie Alcantara.

About this article

Cite this article

Alcantara, I.M., Naranjo, J. & Lang, Y. Model selection using PRESS statistic. Comput Stat 38, 285–298 (2023). https://doi.org/10.1007/s00180-022-01228-1

Received: 12 September 2021
Accepted: 11 April 2022
Published: 03 May 2022
Issue Date: March 2023
DOI: https://doi.org/10.1007/s00180-022-01228-1
