Published by De Gruyter May 18, 2017

Effect of covariate misspecifications in the marginalized zero-inflated Poisson model

Samuel Iddi and Esther O. Nwoko

Abstract

Count outcomes are often modelled using Poisson regression. However, this model imposes a strict mean-variance relationship that is unappealing in many contexts. Several studies in the life sciences yield count outcomes with an excessive number of zeros. These excess zeros introduce extra dispersion that the traditional Poisson regression cannot account for. The zero-inflated Poisson (ZIP) and zero-inflated negative binomial models are popular alternatives. Zero-inflated models comprise two key components: a logistic part, which models the excess zeros, and a Poisson component, which models the counts. Both components allow the inclusion of covariates. Civettini and Hines [3] investigated misspecification effects in zero-inflated negative binomial regression models. Long, Preisser, Herring and Golin [10] proposed a marginalized zero-inflated Poisson (MZIP) model that permits direct marginal interpretation of the fixed-effect estimates, overcoming the subpopulation-specific interpretation that traditional zero-inflated models often impose. In this research, the effects of misspecifying components of the MZIP regression model are investigated through a comprehensive simulation study. Two types of incorrect specification of the MZIP components were considered, namely ‘Omission’ and ‘Misspecification’. The bias, standard error (precision) and mean square error (MSE) of the estimates are computed while varying the sample size, and type I error rates are evaluated for the misspecified models. Results of a Monte Carlo simulation are reported. Omissions in either part of the model led to biased parameter estimates, with the intercept parameters most severely affected. Furthermore, for all types of omission, parameters in the zero-inflated part of the model were more strongly affected than those in the Poisson part, in terms of both bias and MSE.
In general, bias and MSE decreased for all parameters as the sample size increased. Misspecification was also found to increase, preserve or decrease type I error rates, depending on the sample size.
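As an illustration of the two-part structure described above, the following minimal Python sketch (standard library only) evaluates and samples a zero-inflated Poisson distribution under a marginalized parametrization in the spirit of Long et al. [10]. It assumes that the excess-zero probability ψ follows a logistic model and that the overall mean ν = (1 - ψ)μ follows a log-linear model, so the Poisson mean is recovered as μ = ν/(1 - ψ). The function names and the coefficient labels gamma and alpha are hypothetical and are not mapped to the β and α columns of the appendix tables. The closed-form variance (1 - ψ)μ(1 + ψμ) exceeds the mean (1 - ψ)μ whenever ψ > 0, which is the extra dispersion a plain Poisson regression cannot capture.

```python
import math
import random


def expit(x: float) -> float:
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def mzip_params(z_gamma: float, x_alpha: float) -> tuple[float, float]:
    """Map linear predictors to (psi, mu) under an assumed MZIP parametrization.

    Assumption: logit(psi) = z'gamma models the excess-zero probability and
    log(nu) = x'alpha models the overall marginal mean nu = (1 - psi) * mu.
    """
    psi = expit(z_gamma)
    nu = math.exp(x_alpha)
    mu = nu / (1.0 - psi)  # mean of the Poisson component
    return psi, mu


def zip_logpmf(y: int, psi: float, mu: float) -> float:
    """log P(Y = y) for a zero-inflated Poisson distribution."""
    if y == 0:
        # A zero arises either as an excess zero or as a Poisson zero.
        return math.log(psi + (1.0 - psi) * math.exp(-mu))
    return math.log(1.0 - psi) - mu + y * math.log(mu) - math.lgamma(y + 1)


def zip_mean_var(psi: float, mu: float) -> tuple[float, float]:
    """Closed-form ZIP mean and variance; the variance exceeds the mean when psi > 0."""
    mean = (1.0 - psi) * mu
    return mean, mean * (1.0 + psi * mu)


def zip_sample(psi: float, mu: float, rng: random.Random) -> int:
    """Draw one ZIP count: an excess zero with probability psi, else Poisson(mu)."""
    if rng.random() < psi:
        return 0
    # Knuth's multiplication method; adequate for small Poisson means.
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


# Usage: psi = expit(0) = 0.5 and nu = 2, so the Poisson mean is mu = 4.
rng = random.Random(2017)
psi, mu = mzip_params(z_gamma=0.0, x_alpha=math.log(2.0))
mean, var = zip_mean_var(psi, mu)  # marginal mean 2, variance 6 (up to rounding)
draws = [zip_sample(psi, mu, rng) for _ in range(20000)]
print(mean, var, sum(draws) / len(draws))
```

The sample mean of the draws should sit close to ν, which is the quantity the marginalized model places covariates on directly.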

MSC 2010: 62Jxx; 00A72

Funding statement: Samuel Iddi gratefully acknowledges financial support from the University of Ghana through an ORID Research Grant.

A Supplementary appendix

Table 2

Results of the correct and misspecified models based on 500 simulations for sample of size 500.

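Model | Quantity | β0      | β1      | β2      | β3      | α0      | α1      | α2      | α3
CM    | Est      | 0.2395  | 0.4075  | 0.2496  | 1.39867 | 0.6039  | -2.0217 | 0.2530  | -1.4982
      | Std Err  | 0.0997  | 0.0793  | 0.0186  | 0.10830 | 0.1739  | 0.1871  | 0.0303  | 0.2497
      | Bias     | -0.0105 | 0.0075  | -0.0004 | -0.0013 | 0.0039  | -0.0217 | 0.0030  | 0.0018
OMIT1 | Est      | 0.1926  | 0.3106  | 0.3432  | 1.31915 | 0.9577  | -1.9394 | -       | -1.4105
      | Std Err  | 0.0977  | 0.0811  | 0.0066  | 0.10580 | 0.1664  | 0.2101  | -       | 0.2630
      | Bias     | -0.0574 | -0.0894 | 0.0932  | -0.0808 | 0.3577  | 0.0606  | -       | 0.0895
OMIT2 | Est      | 0.4505  | 0.4376  | -       | 1.37737 | 0.2318  | -1.5703 | 0.5151  | -1.0777
      | Std Err  | 0.1060  | 0.0812  | -       | 0.12110 | 0.1680  | 0.1430  | 0.0122  | 0.2226
      | Bias     | 0.2005  | 0.0376  | -       | -0.0226 | -0.3682 | 0.4297  | 0.2651  | 0.4223
OMIT3 | Est      | 0.6946  | 0.5295  | -       | 1.36019 | 0.9309  | -1.9813 | -       | -1.4009
      | Std Err  | 0.0957  | 0.0818  | -       | 0.10340 | 0.1634  | 0.2065  | -       | 0.2529
      | Bias     | 0.4446  | 0.1295  | -       | -0.0398 | 0.3309  | 0.0187  | -       | 0.0991
OMIT4 | Est      | 0.7166  | 0.4406  | -       | 0.90191 | -0.3501 | -1.4844 | 0.5046  | -
      | Std Err  | 0.0886  | 0.0846  | -       | 0.07160 | 0.1215  | 0.1450  | 0.0115  | -
      | Bias     | 0.4666  | 0.0406  | -       | -0.4981 | -0.9501 | 0.5156  | 0.2546  | -

A dash (-) marks a parameter that is not included in the fitted model.
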
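The bias, standard error and MSE summaries in these tables are Monte Carlo quantities computed over the replicate fits. A minimal sketch of such summaries for one parameter follows. It assumes that bias is the mean estimate minus the generating value, that the standard error is the empirical standard deviation of the R replicate estimates (divisor R), and that MSE is the mean squared deviation from the generating value; these definitions are assumptions, since the section does not state them (Std Err could, for instance, be an average model-based standard error instead). Under these definitions MSE = bias² + SE² holds exactly. The generating values are not listed in this section, but Est - Bias recovers them approximately from any row, as the last lines illustrate for the CM row of Table 2.

```python
import math
from statistics import fmean


def mc_summary(estimates: list[float], true_value: float) -> dict[str, float]:
    """Monte Carlo summaries of one parameter over R replicate estimates."""
    r = len(estimates)
    mean_est = fmean(estimates)
    bias = mean_est - true_value
    # Empirical SD of the replicate estimates with divisor R (an assumption).
    se = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / r)
    mse = sum((e - true_value) ** 2 for e in estimates) / r
    return {"est": mean_est, "bias": bias, "se": se, "mse": mse}


def implied_truth(est: float, bias: float) -> float:
    """Generating value implied by a reported (Est, Bias) pair."""
    return est - bias


# Usage with hypothetical replicate estimates of a parameter whose true value is 0.25.
reps = [0.21, 0.27, 0.24, 0.29, 0.23]
s = mc_summary(reps, true_value=0.25)
print(s, math.isclose(s["mse"], s["bias"] ** 2 + s["se"] ** 2))

# CM row of Table 2, beta0 to beta3, as reported (Est, Bias).
est = [0.2395, 0.4075, 0.2496, 1.39867]
bias = [-0.0105, 0.0075, -0.0004, -0.0013]
print([round(implied_truth(e, b), 3) for e, b in zip(est, bias)])
```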
Table 3

Results of the correct and misspecified models based on 500 simulations for sample of size 500.

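Model     | Quantity | β0      | β1      | β2      | β3       | α0      | α1      | α2      | α3
CM (MIS1) | Est      | 0.2453  | 0.4034  | -       | 1.39386  | 0.6104  | -2.0051 | 0.2522  | -1.5044
          | Std Err  | 0.1067  | 0.0841  | -       | 0.12190  | 0.1894  | 0.2122  | 0.0271  | 0.2878
          | Bias     | -0.0047 | 0.0034  | -       | -0.0061  | 0.0104  | -0.0051 | 0.0022  | -0.0044
MIS1      | Est      | 0.2486  | 0.4040  | 0.0028  | 1.39413  | 0.6156  | -2.0109 | 0.2533  | -1.5055
          | Std Err  | 0.1106  | 0.0842  | 0.0252  | 0.12210  | 0.1955  | 0.2137  | 0.0697  | 0.2888
          | Bias     | -0.0014 | 0.0040  | 0.0028  | -0.0059  | 0.0156  | -0.0109 | 0.0033  | -0.0055
CM (MIS2) | Est      | 0.2463  | 0.4030  | 0.2512  | 1.39887  | 0.5969  | -2.0016 | 0.2484  | -
          | Std Err  | 0.1204  | 0.1145  | 0.0273  | 0.07170  | 0.1525  | 0.1894  | 0.0370  | -
          | Bias     | -0.0037 | 0.0030  | 0.0012  | -0.0011  | -0.0031 | -0.0016 | -0.0016 | -
MIS2      | Est      | 0.2433  | 0.4049  | 0.2506  | 1.40758  | 0.6068  | -2.0063 | 0.2484  | 0.0052
          | Std Err  | 0.1315  | 0.1155  | 0.0274  | 0.13200  | 0.1935  | 0.1898  | 0.0372  | 0.2334
          | Bias     | -0.0067 | 0.0049  | 0.0006  | 0.00760  | 0.0068  | -0.0063 | -0.0016 | 0.0052
CM (MIS3) | Est      | 0.2486  | 0.4022  | 0.2483  | -        | 0.5948  | -2.0034 | 0.2540  | -
          | Std Err  | 0.1213  | 0.1264  | 0.0347  | -        | 0.1588  | 0.2006  | 0.0466  | -
          | Bias     | -0.0014 | 0.0022  | -0.0017 | -        | -0.0052 | -0.0034 | 0.0040  | -
MIS3      | Est      | 0.2439  | 0.4027  | 0.2488  | -0.00792 | 0.5858  | -2.0048 | 0.2519  | -0.0001
          | Std Err  | 0.1469  | 0.1266  | 0.0347  | 0.16540  | 0.2102  | 0.2015  | 0.0472  | 0.2766
          | Bias     | -0.0061 | 0.0027  | -0.0012 | -0.00790 | -0.0142 | -0.0048 | 0.0019  | -0.0001
CM (MIS4) | Est      | 0.2434  | 0.4049  | 0.2499  | -        | 0.6034  | -2.0262 | -       | -1.4980
          | Std Err  | 0.0746  | 0.0824  | 0.0046  | -        | 0.1602  | 0.2727  | -       | 0.2797
          | Bias     | -0.0066 | 0.0049  | -0.0001 | -        | 0.0034  | -0.0262 | -       | 0.0020
MIS4      | Est      | 0.2540  | 0.4029  | 0.2491  | -0.00513 | 0.6076  | -2.0131 | 0.0030  | -1.5068
          | Std Err  | 0.1026  | 0.0850  | 0.0087  | 0.12390  | 0.2015  | 0.2730  | 0.0308  | 0.3445
          | Bias     | 0.0040  | 0.0029  | -0.0009 | -0.00510 | 0.0076  | -0.0131 | 0.0030  | -0.0068

A dash (-) marks a parameter that is not included in the fitted model.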
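In the MIS1 to MIS4 rows, the estimates of the superfluous coefficients (for example β2 in MIS1 and α3 in MIS2) agree with their bias up to rounding, so their generating value is zero; this is the setting in which a type I error rate can be assessed. The minimal sketch below computes an empirical type I error rate as the share of replicates in which a test rejects H0: coefficient = 0. It assumes a two-sided Wald test at the 5% level, which is an assumption because the test used in the paper is not stated in this section, and the replicate data in the usage lines are synthetic.

```python
import random
from statistics import NormalDist


def wald_rejects(estimate: float, std_err: float, level: float = 0.05) -> bool:
    """Two-sided Wald test of H0: coefficient = 0 at the given level."""
    z_crit = NormalDist().inv_cdf(1.0 - level / 2.0)  # about 1.96 for level 0.05
    return abs(estimate / std_err) > z_crit


def type1_error_rate(estimates: list[float], std_errs: list[float], level: float = 0.05) -> float:
    """Share of replicates rejecting H0 when the true coefficient is zero."""
    rejections = sum(wald_rejects(e, s, level) for e, s in zip(estimates, std_errs))
    return rejections / len(estimates)


# Usage with synthetic replicates drawn under H0 (true coefficient 0, SE 0.1).
rng = random.Random(1)
R = 500
se = 0.1
ests = [rng.gauss(0.0, se) for _ in range(R)]
rate = type1_error_rate(ests, [se] * R)
print(rate)
```

With R = 500 replicates, the Monte Carlo standard error of an empirical rate near 0.05 is about sqrt(0.05 × 0.95 / 500) ≈ 0.0097, so rates between roughly 0.03 and 0.07 are consistent with the nominal level.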

References

[1] A. Agresti, Foundations of Linear and Generalized Linear Models, John Wiley & Sons, Hoboken, 2015.

[2] A. Agresti, B. Caffo and P. Ohman-Strickland, Examples in which misspecification of a random effects distribution reduces efficiency, and possible remedies, Comput. Statist. Data Anal. 47 (2004), no. 3, 639–653. doi:10.1016/j.csda.2003.12.009

[3] A. J. Civettini and E. Hines, Misspecification effects in zero-inflated negative binomial regression models: Common cases, Annual Meeting of the Southern Political Science Association, New Orleans, 2005.

[4] P. J. Heagerty, Marginally specified logistic-normal models for longitudinal binary data, Biometrics 55 (1999), 688–698. doi:10.1111/j.0006-341X.1999.00688.x

[5] S. Iddi and K. Doku-Amponsah, Statistical model for overdispersed count outcome with many zeros: An approach for direct marginal inference, South African J. Stat. 50 (2016), 313–330. doi:10.37920/sasj.2016.50.2.9

[6] S. Iddi and G. Molenberghs, A combined overdispersed and marginalized multilevel model, Comput. Statist. Data Anal. 56 (2012), 1944–1951. doi:10.1016/j.csda.2011.11.021

[7] D. Lambert, Zero-inflated Poisson regression, with an application to defects in manufacturing, Technometrics 34 (1992), no. 1, 1–14. doi:10.2307/1269547

[8] S. Litière, A. Alonso and G. Molenberghs, Type I and type II error under random-effects misspecification in generalized linear mixed models, Biometrics 63 (2007), no. 4, 1038–1044. doi:10.1111/j.1541-0420.2007.00782.x

[9] S. Litière, A. Alonso and G. Molenberghs, The impact of a misspecified random-effects distribution on the estimation and the performance of inferential procedures in generalized linear mixed models, Stat. Med. 27 (2008), 3125–3144. doi:10.1002/sim.3157

[10] D. L. Long, J. Preisser, A. Herring and C. Golin, A marginalized zero-inflated regression model with overall exposure effects, Stat. Med. 33 (2014), 5151–5165. doi:10.1002/sim.6293

[11] R. W. M. Wedderburn, Quasi-likelihood functions, generalized linear models, and the Gauss–Newton method, Biometrika 61 (1974), 439–447. doi:10.1093/biomet/61.3.439

[12] W. F. W. Yaacob, M. A. Lazim and Y. B. Wah, A practical approach in modelling count data, Proceedings of the Regional Conference on Statistical Sciences (Malaysia 2010), IEEE Press, Piscataway (2010), 176–183.

Received: 2016-11-09
Accepted: 2017-04-28
Published Online: 2017-05-18
Published in Print: 2017-06-01

© 2017 Walter de Gruyter GmbH, Berlin/Boston

DOI: 10.1515/mcma-2017-0106