Model selection criteria based on cross-validatory concordance statistics

  • Original Paper
  • Computational Statistics

Abstract

In the logistic regression framework, we present the development and investigation of three model selection criteria based on cross-validatory analogues of the traditional and adjusted c-statistics. These criteria are designed to estimate three corresponding measures of predictive error: the model misspecification prediction error, the fitting sample prediction error, and the sum of prediction errors. We aim to show that these estimators serve as suitable model selection criteria, facilitating the identification of a model that appropriately balances goodness-of-fit and parsimony, while achieving generalizability. We examine the properties of the selection criteria via an extensive simulation study designed as a factorial experiment. We then employ these measures in a practical application based on modeling the occurrence of heart disease.
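To make the idea of a cross-validatory concordance statistic concrete, the following is a minimal sketch and not the authors' criteria, whose exact definitions are not given in this abstract. It assumes a single predictor, a logistic model fitted by plain gradient ascent, and leave-one-out cross-validation; the function names (`c_statistic`, `fit_logistic`, `loo_c_statistic`) and the toy data are hypothetical. The c-statistic is computed as the proportion of (event, non-event) pairs in which the event receives the higher predicted probability, with ties counted as one half.

```python
import math


def c_statistic(y, p):
    """Proportion of (event, non-event) pairs ranked correctly; ties count 1/2."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    total = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                total += 1.0
            elif pe == pn:
                total += 0.5
    return total / (len(events) * len(nonevents))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, steps=500, lr=0.1):
    """Illustrative fit: gradient ascent on the mean log-likelihood (one predictor)."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            resid = yi - sigmoid(b0 + b1 * xi)
            g0 += resid
            g1 += resid * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def predict(coef, x):
    b0, b1 = coef
    return [sigmoid(b0 + b1 * xi) for xi in x]


def loo_c_statistic(x, y):
    """c-statistic on predictions from models that never saw the predicted case."""
    preds = []
    for i in range(len(x)):
        coef = fit_logistic(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(coef, [x[i]])[0])
    return c_statistic(y, preds)


# Hypothetical toy data, for illustration only.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

apparent = c_statistic(y, predict(fit_logistic(x, y), x))  # fitted and scored on the same data
cross_validated = loo_c_statistic(x, y)                    # scored on held-out cases
print(apparent, cross_validated)
```

The apparent statistic reuses the fitting sample for evaluation, whereas the leave-one-out version scores each case with a model that excluded it. Comparing such quantities across candidate models is the general kind of use the abstract describes, though the paper's three criteria may be constructed differently.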

Acknowledgements

We wish to thank our referees for their valuable feedback, which served to improve the original version of this manuscript.

Author information

Corresponding author

Correspondence to Patrick Ten Eyck.

Cite this article

Ten Eyck, P., Cavanaugh, J.E. Model selection criteria based on cross-validatory concordance statistics. Comput Stat 33, 595–621 (2018). https://doi.org/10.1007/s00180-017-0766-7

  • DOI: https://doi.org/10.1007/s00180-017-0766-7
