Abstract
We devise a classification algorithm based on generalized linear mixed model (GLMM) technology. The algorithm incorporates spline smoothing, additive model-type structures and model selection. For reasons of speed we employ the Laplace approximation, rather than Monte Carlo methods. Tests on real and simulated data show the algorithm to have good classification performance. Moreover, the resulting classifiers are generally interpretable and parsimonious.
Additional information
The first author acknowledges support from the Deutsche Forschungsgemeinschaft (Project D 310 122 40). The second and third authors acknowledge support from the Australian Research Council (Project DP0556518).
Cite this article
Kauermann, G., Ormerod, J.T. & Wand, M.P. Parsimonious Classification Via Generalized Linear Mixed Models. J Classif 27, 89–110 (2010). https://doi.org/10.1007/s00357-010-9045-9
DOI: https://doi.org/10.1007/s00357-010-9045-9