Abstract
Model selection is an important component of data analysis. This study focuses on model selection for the trend vector model, a model for the analysis of longitudinal multinomial outcomes. The trend vector model is a so-called marginal model, focusing on population-averaged evolutions over time. Parameter estimates are obtained with a quasi-likelihood method. In theory, such an optimization function invalidates likelihood-based statistics such as the likelihood ratio statistic; moreover, standard errors obtained from the Hessian are biased. In this paper, we study in detail the performance of different model selection methods for the trend vector model, focusing on two aspects of model selection: variable selection and dimensionality determination. Based on the quasi-likelihood function, we employ selection criteria analogous to the likelihood ratio statistic, AIC, and BIC. Additionally, Wald and resampling statistics are included as variable selection criteria. A series of simulations was carried out to evaluate the relative performance of these criteria. The results suggest that model selection is best performed using either the quasi-likelihood ratio statistic or the quasi-BIC. A dedicated study of dimensionality selection found that the quasi-AIC also performs well for cases with more than 8 degrees of freedom. Another important finding is that the sandwich estimator of the standard errors used in Wald statistics does not perform well: even for larger sample sizes, a bias-correction procedure for the sandwich estimator is needed to give satisfactory results.
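As a minimal sketch of how the quasi-likelihood criteria named in the abstract can be computed, the Python below assumes the usual AIC and BIC forms with the maximized quasi-log-likelihood QL substituted for the log-likelihood, i.e. qAIC = -2QL + 2p and qBIC = -2QL + p ln n, and it assumes that n is the number of independent subjects. It also refers the quasi-likelihood ratio statistic 2(QL_full - QL_reduced) to a chi-square distribution with p_full - p_reduced degrees of freedom; as the abstract notes, quasi-likelihood theory does not guarantee this reference distribution, so the p-value should be read as a heuristic. The function names and the candidate-model numbers are hypothetical and are not results from the study.

```python
import math


def quasi_aic(ql: float, n_params: int) -> float:
    """AIC form with the maximized quasi-log-likelihood in place of the log-likelihood."""
    return -2.0 * ql + 2.0 * n_params


def quasi_bic(ql: float, n_params: int, n_subjects: int) -> float:
    """BIC form; using the number of independent subjects as n is an assumption."""
    return -2.0 * ql + n_params * math.log(n_subjects)


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail chi-square probability, 1 - P(df/2, stat/2), where P is the
    regularized lower incomplete gamma function evaluated by its power series."""
    if df <= 0:
        raise ValueError("df must be positive")
    if stat <= 0.0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    term = 1.0 / a  # k = 0 term of sum_k x^k / (a (a+1) ... (a+k))
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= x / (a + k)
        total += term
        k += 1
    lower = math.exp(a * math.log(x) - x - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def quasi_lr_test(ql_full: float, p_full: int, ql_reduced: float, p_reduced: int):
    """Quasi-likelihood ratio statistic for nested models with a heuristic
    chi-square reference distribution (not guaranteed under quasi-likelihood)."""
    stat = 2.0 * (ql_full - ql_reduced)
    df = p_full - p_reduced
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical fits: (maximized quasi-log-likelihood, number of parameters).
    candidates = {"dim1": (-1520.4, 12), "dim2": (-1498.7, 20), "dim3": (-1494.9, 28)}
    n_subjects = 500  # hypothetical number of subjects
    best_aic = min(candidates, key=lambda m: quasi_aic(*candidates[m]))
    best_bic = min(candidates, key=lambda m: quasi_bic(*candidates[m], n_subjects))
    print("quasi-AIC choice:", best_aic, "| quasi-BIC choice:", best_bic)
    (ql1, p1), (ql2, p2) = candidates["dim1"], candidates["dim2"]
    print("quasi-LR dim1 vs dim2: stat=%.1f, df=%d, p=%.3g" % quasi_lr_test(ql2, p2, ql1, p1))
```

With these made-up numbers the two criteria disagree (quasi-AIC prefers dim2, quasi-BIC prefers dim1), because the ln n penalty per parameter exceeds the AIC penalty of 2 once n is larger than about 7.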
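The abstract reports that the sandwich standard errors used in Wald statistics need a bias correction even for larger samples. As a minimal sketch of what such a correction does, the Python below computes the cluster-robust sandwich covariance A^{-1} B A^{-1} for a much simpler stand-in than the trend vector model: a linear model with identity link and independence working correlation, fitted by least squares with one cluster per subject. With `corrected=True` it applies the Mancl and DeRouen (2001) adjustment, which replaces each subject's residual vector r_i by (I - H_ii)^{-1} r_i, where H_ii is that subject's block of the hat matrix. The abstract does not say which correction the authors used, so this choice, the helper names (`cluster_sandwich`, `inverse`, `matmul`), and the toy data are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting; meant for small, well-conditioned matrices."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        piv_row = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv_row] = m[piv_row], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def cluster_sandwich(xs: Sequence[Matrix], ys: Sequence[List[float]],
                     corrected: bool = False) -> Tuple[List[float], Matrix]:
    """Least-squares fit with a cluster-robust (sandwich) covariance matrix.

    xs[i] holds the design rows of subject i and ys[i] its responses.
    corrected=True applies the Mancl-DeRouen residual adjustment.
    """
    X = [row for xi in xs for row in xi]
    y = [[v] for yi in ys for v in yi]
    bread = inverse(matmul(transpose(X), X))        # A^{-1} = (X'X)^{-1}
    beta = matmul(bread, matmul(transpose(X), y))    # p x 1 estimate
    p = len(bread)
    meat = [[0.0] * p for _ in range(p)]             # B = sum_i X_i' r_i r_i' X_i
    for xi, yi in zip(xs, ys):
        fitted = matmul(xi, beta)
        r = [[yv - fv[0]] for yv, fv in zip(yi, fitted)]
        if corrected:
            h = matmul(matmul(xi, bread), transpose(xi))  # leverage block H_ii
            n_i = len(h)
            i_minus_h = [[(1.0 if j == k else 0.0) - h[j][k] for k in range(n_i)]
                         for j in range(n_i)]
            r = matmul(inverse(i_minus_h), r)             # (I - H_ii)^{-1} r_i
        u = matmul(transpose(xi), r)                      # X_i' r_i, p x 1
        uu = matmul(u, transpose(u))
        meat = [[mv + uv for mv, uv in zip(mr, ur)] for mr, ur in zip(meat, uu)]
    cov = matmul(matmul(bread, meat), bread)
    return [b[0] for b in beta], cov


if __name__ == "__main__":
    # Toy data: intercept-only model, two subjects observed on two occasions each.
    xs = [[[1.0], [1.0]], [[1.0], [1.0]]]
    ys = [[1.0, 3.0], [5.0, 7.0]]
    for flag in (False, True):
        beta, cov = cluster_sandwich(xs, ys, corrected=flag)
        print("corrected" if flag else "plain", beta, cov)
```

In this toy call the uncorrected variance of the intercept is 2.0 and the corrected one is 8.0; with only two subjects each cluster's leverage is large, so the correction inflates the variance strongly. With many subjects the leverages shrink and the two estimates converge.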
Additional information
This research was conducted while both authors were sponsored by the Netherlands Organisation for Scientific Research (NWO), Innovational Grant, no. 452-06-002. The first author would also like to thank Ming-Mei Wang for her extensive help in revising this paper.
About this article
Cite this article
Yu, HT., de Rooij, M. Model Selection for the Trend Vector Model. J Classif 30, 338–369 (2013). https://doi.org/10.1007/s00357-013-9138-3
DOI: https://doi.org/10.1007/s00357-013-9138-3