Abstract
A discrepancy function evaluates a candidate model by quantifying the disparity between that model and the true model that generated the observed data. The favored model in a candidate class is the one judged to have minimum discrepancy with the true model. The observed data can be regarded as a manifestation of the underlying true model. However, because the data provide only partial information about the true model, model selection is a decision made under uncertainty. To characterize this uncertainty, we consider using resampling to generate multiple manifestations of the true model. Each candidate model can then be judged against each simulated version of the true model, yielding multiple panels of discrepancies. Model evaluation then proceeds by forming an overall judgment on each candidate model, based on combining the information in the individual discrepancy panels. Because social choice theory, or voting theory, addresses the problem of turning individual preferences into a group preference, it can be used to develop a novel approach to model evaluation.
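As a rough illustration of the final step, combining discrepancy panels into one overall judgment, the sketch below treats each panel as a voter that ranks the candidates from smallest to largest discrepancy. The abstract does not say which voting rule is used, so the two rules shown (plurality and Borda count) are assumptions chosen for illustration. The candidate names `M1` to `M3` and the discrepancy values are hypothetical and do not come from the article.

```python
from collections import Counter


def rank_panel(panel):
    """Order candidate names from smallest to largest discrepancy."""
    return sorted(panel, key=panel.get)


def plurality(panels):
    """Each panel casts one vote for its minimum-discrepancy candidate."""
    return dict(Counter(rank_panel(p)[0] for p in panels))


def borda(panels):
    """Each panel gives m-1 points to its best candidate, down to 0 for its worst."""
    scores = Counter()
    for p in panels:
        ranking = rank_panel(p)
        m = len(ranking)
        for position, name in enumerate(ranking):
            scores[name] += m - 1 - position
    return dict(scores)


# Hypothetical discrepancy panels: one dict per resampled data set,
# mapping candidate name -> estimated discrepancy (smaller is better).
panels = [
    {"M1": 2.0, "M2": 1.0, "M3": 3.0},
    {"M1": 0.8, "M2": 1.0, "M3": 3.1},
    {"M1": 0.9, "M2": 1.1, "M3": 2.5},
    {"M1": 2.5, "M2": 1.2, "M3": 1.0},
    {"M1": 2.4, "M2": 1.2, "M3": 1.1},
]

print("plurality votes:", plurality(panels))
print("Borda scores:  ", borda(panels))
```

With these made-up panels, plurality leaves `M1` and `M3` tied at two votes each with `M2` last, whereas the Borda count favors `M2`, which is never ranked worst. The overall judgment can therefore depend on the voting rule as well as on the panels.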
Acknowledgments
The authors wish to extend their appreciation to the associate editor and to two anonymous reviewers for carefully reading the original version and the first revision of this manuscript, and for providing constructive suggestions that served to improve the presentation and content.
About this article
Cite this article
Neath, A.A., Cavanaugh, J.E. & Weyhaupt, A.G. Model evaluation, discrepancy function estimation, and social choice theory. Comput Stat 30, 231–249 (2015). https://doi.org/10.1007/s00180-014-0532-z
DOI: https://doi.org/10.1007/s00180-014-0532-z