Abstract
Count outcomes are often modelled using Poisson regression. However, this model imposes a strict mean-variance relationship (the variance must equal the mean) that is unappealing in many contexts. Many studies in the life sciences yield count outcomes with an excessive number of zeros. These excess zeros introduce extra dispersion in the data which cannot be accounted for by traditional Poisson regression. The zero-inflated Poisson (ZIP) and zero-inflated negative binomial models are popular alternatives. Zero-inflated models comprise two key components: a logistic part which models the excess zeros, and a count part (Poisson in the ZIP model) which models the counts. Both components allow the inclusion of covariates. Civettini and Hines [3] investigated misspecification effects in zero-inflated negative binomial regression models. Long, Preisser, Herring and Golin [10] proposed a so-called marginalized zero-inflated Poisson (MZIP) model, which allows a direct marginal interpretation of fixed-effect estimates and thereby overcomes the often sub-population-specific interpretation of traditional zero-inflated models. In this research, the effects of misspecifying components of the MZIP regression model are investigated through a comprehensive Monte Carlo simulation study. Two different incorrect specifications of the components of an MZIP model are considered, namely ‘Omission’ and ‘Misspecification’. Bias, standard error (precision) of the estimates and mean square error (MSE) are computed while varying the sample size. Type I error rates are also evaluated for the misspecified models. Omissions in both parts of the model lead to biased parameter estimates, with the intercept parameters most severely affected. Furthermore, for all types of omission, parameters in the zero-inflation part of the model are more strongly affected than those in the Poisson part, in terms of both bias and MSE. Generally, bias and MSE decrease as the sample size increases for all parameters. Misspecification can increase, preserve or decrease the type I error rate, depending on the sample size.
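The quantities named in the abstract can be illustrated with a small sketch. For a ZIP outcome with zero-inflation probability ψ and Poisson mean μ, the marginal mean is ν = (1 − ψ)μ and the variance is ν(1 + ψμ), so the variance exceeds the mean whenever ψ > 0. The code below draws ZIP data and summarises bias, standard error and MSE over Monte Carlo replicates. It is not the simulation design of this study: the values ψ = 0.3 and μ = 2, the sample sizes, the helper names `draw_zip` and `monte_carlo`, and the use of the sample mean as a stand-in estimator of ν (instead of a fitted MZIP model) are all assumptions made for illustration.

```python
import math
import random
import statistics


def draw_zip(rng, psi, mu):
    """One draw from a zero-inflated Poisson: structural zero with
    probability psi, otherwise a Poisson(mu) count."""
    if rng.random() < psi:
        return 0
    # Knuth's multiplication method; adequate for small mu.
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def monte_carlo(n, reps, psi, mu, seed=1):
    """Bias, standard error and MSE of the sample mean as an estimator
    of the marginal mean nu = (1 - psi) * mu (illustrative stand-in)."""
    rng = random.Random(seed)
    true_nu = (1 - psi) * mu
    estimates = [
        statistics.fmean(draw_zip(rng, psi, mu) for _ in range(n))
        for _ in range(reps)
    ]
    bias = statistics.fmean(estimates) - true_nu
    se = statistics.stdev(estimates)  # spread of estimates across replicates
    mse = statistics.fmean((e - true_nu) ** 2 for e in estimates)
    return {"n": n, "bias": bias, "se": se, "mse": mse}


# Hypothetical settings, chosen only for illustration.
PSI, MU = 0.3, 2.0

# Extra dispersion: the empirical variance exceeds the empirical mean.
rng = random.Random(0)
sample = [draw_zip(rng, PSI, MU) for _ in range(20000)]
sample_mean = statistics.fmean(sample)
sample_var = statistics.variance(sample)
print(f"mean={sample_mean:.3f} variance={sample_var:.3f}")

# Bias, SE and MSE for increasing sample sizes.
results = [monte_carlo(n, reps=500, psi=PSI, mu=MU) for n in (50, 200, 500)]
for r in results:
    print(f"n={r['n']:4d} bias={r['bias']:+.4f} se={r['se']:.4f} mse={r['mse']:.5f}")
```

In a misspecification study, the stand-in estimator would be replaced by the parameter estimates of the correctly specified and the misspecified fitted models, with the same three summaries computed for each parameter.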
Funding statement: Samuel Iddi gratefully acknowledges financial support from the University of Ghana through an ORID Research Grant.
A Supplementary appendix
Models | Quantity | ||||||||
---|---|---|---|---|---|---|---|---|---|
CM | Est | 0.2395 | 0.4075 | 0.2496 | 1.39867 | 0.6039 | 0.2530 | ||
Std Err | 0.0997 | 0.0793 | 0.0186 | 0.10830 | 0.1739 | 0.1871 | 0.0303 | 0.2497 | |
Bias | 0.0075 | 0.0039 | 0.0030 | 0.0018 | |||||
OMIT1 | Est | 0.1926 | 0.3106 | 0.3432 | 1.31915 | 0.9577 | – | ||
Std Err | 0.0977 | 0.0811 | 0.0066 | 0.10580 | 0.1664 | 0.2101 | – | 0.2630 | |
Bias | 0.0932 | 0.3577 | 0.0606 | – | 0.0895 | ||||
OMIT2 | Est | 0.4505 | 0.4376 | – | 1.37737 | 0.2318 | 0.5151 | ||
Std Err | 0.1060 | 0.0812 | – | 0.12110 | 0.1680 | 0.1430 | 0.0122 | 0.2226 | |
Bias | 0.2005 | 0.0376 | – | 0.4297 | 0.2651 | 0.4223 | |||
OMIT3 | Est | 0.6946 | 0.5295 | – | 1.36019 | 0.9309 | – | ||
Std Err | 0.0957 | 0.0818 | – | 0.10340 | 0.1634 | 0.2065 | – | 0.2529 | |
Bias | 0.4446 | 0.1295 | – | 0.3309 | 0.0187 | – | 0.0991 | ||
OMIT4 | Est | 0.7166 | 0.4406 | – | 0.90191 | 0.5046 | – | ||
Std Err | 0.0886 | 0.0846 | – | 0.07160 | 0.1215 | 0.1450 | 0.0115 | – | |
Bias | 0.4666 | 0.0406 | – | 0.5156 | 0.2546 | – |
Models | Quantity | ||||||||
---|---|---|---|---|---|---|---|---|---|
CMMIS1 | Est | 0.2453 | 0.4034 | – | 1.39386 | 0.6104 | 0.2522 | ||
Std Err | 0.1067 | 0.0841 | – | 0.12190 | 0.1894 | 0.2122 | 0.0271 | 0.2878 | |
Bias | 0.0034 | – | 0.0104 | 0.0022 | |||||
MIS1 | Est | 0.2486 | 0.4040 | 0.0028 | 1.39413 | 0.6156 | 0.2533 | ||
Std Err | 0.1106 | 0.0842 | 0.0252 | 0.12210 | 0.1955 | 0.2137 | 0.0697 | 0.2888 | |
Bias | 0.0040 | 0.0028 | 0.0156 | 0.0033 | |||||
CMMIS2 | Est | 0.2463 | 0.4030 | 0.2512 | 1.39887 | 0.5969 | 0.2484 | – | |
Std Err | 0.1204 | 0.1145 | 0.0273 | 0.07170 | 0.1525 | 0.1894 | 0.0370 | – | |
Bias | 0.0030 | 0.0012 | – | ||||||
MIS2 | Est | 0.2433 | 0.4049 | 0.2506 | 1.40758 | 0.6068 | 0.2484 | 0.0052 | |
Std Err | 0.1315 | 0.1155 | 0.0274 | 0.13200 | 0.1935 | 0.1898 | 0.0372 | 0.2334 | |
Bias | 0.0049 | 0.0006 | 0.00760 | 0.0068 | 0.0052 | ||||
CMMIS3 | Est | 0.2486 | 0.4022 | 0.2483 | – | 0.5948 | 0.2540 | – | |
Std Err | 0.1213 | 0.1264 | 0.0347 | – | 0.1588 | 0.2006 | 0.0466 | – | |
Bias | 0.0022 | – | 0.0040 | – | |||||
MIS3 | Est | 0.2439 | 0.4027 | 0.2488 | 0.5858 | 0.2519 | |||
Std Err | 0.1469 | 0.1266 | 0.0347 | 0.16540 | 0.2102 | 0.2015 | 0.0472 | 0.2766 | |
Bias | 0.0027 | 0.0019 | |||||||
CMMIS4 | Est | 0.2434 | 0.4049 | 0.2499 | – | 0.6034 | – | ||
Std Err | 0.0746 | 0.0824 | 0.0046 | – | 0.1602 | 0.2727 | – | 0.2797 | |
Bias | 0.0049 | – | 0.0034 | – | 0.0020 | ||||
MIS4 | Est | 0.2540 | 0.4029 | 0.2491 | 0.6076 | 0.0030 | |||
Std Err | 0.1026 | 0.0850 | 0.0087 | 0.12390 | 0.2015 | 0.2730 | 0.0308 | 0.3445 | |
Bias | 0.0040 | 0.0029 | 0.0076 | 0.0030 |
References
[1] A. Agresti, Foundations of Linear and Generalized Linear Models, John Wiley & Sons, Hoboken, 2015.
[2] A. Agresti, B. Caffo and P. Ohman-Strickland, Examples in which misspecification of a random effects distribution reduces efficiency, and possible remedies, Comput. Statist. Data Anal. 47 (2004), no. 3, 639–653. doi:10.1016/j.csda.2003.12.009
[3] A. J. Civettini and E. Hines, Misspecification effects in zero-inflated negative binomial regression models: Common cases, Annual Meeting of the Southern Political Science Association, New Orleans (2005).
[4] P. J. Heagerty, Marginally specified logistic-normal models for longitudinal binary data, Biometrics 55 (1999), 688–698. doi:10.1111/j.0006-341X.1999.00688.x
[5] S. Iddi and K. Doku-Amponsah, Statistical model for overdispersed count outcome with many zeros: An approach for direct marginal inference, South African J. Stat. 50 (2016), 313–330. doi:10.37920/sasj.2016.50.2.9
[6] S. Iddi and G. Molenberghs, A combined overdispersed and marginalized multilevel model, Comput. Statist. Data Anal. 56 (2012), 1944–1951. doi:10.1016/j.csda.2011.11.021
[7] D. Lambert, Zero-inflated Poisson regression, with an application to defects in manufacturing, Technometrics 34 (1992), no. 1, 1–14. doi:10.2307/1269547
[8] S. Litière, A. Alonso and G. Molenberghs, Type I and type II error under random-effects misspecification in generalized linear mixed models, Biometrics 63 (2007), no. 4, 1038–1044. doi:10.1111/j.1541-0420.2007.00782.x
[9] S. Litière, A. Alonso and G. Molenberghs, The impact of a misspecified random-effects distribution on the estimation and the performance of inferential procedures in generalized linear mixed models, Stat. Med. 27 (2008), 3125–3144. doi:10.1002/sim.3157
[10] D. L. Long, J. Preisser, A. Herring and C. Golin, A marginalized zero-inflated regression model with overall exposure effects, Stat. Med. 33 (2014), 5151–5165. doi:10.1002/sim.6293
[11] R. W. M. Wedderburn, Quasi-likelihood functions, generalized linear models, and the Gauss–Newton method, Biometrika 61 (1974), 439–447. doi:10.1093/biomet/61.3.439
[12] W. F. W. Yaacob, M. A. Lazim and Y. B. Wah, A practical approach in modelling count data, Proceedings of the Regional Conference on Statistical Sciences (Malaysia 2010), IEEE Press, Piscataway (2010), 176–183.
© 2017 Walter de Gruyter GmbH, Berlin/Boston