Abstract
In this paper, we give a detailed description of the Zero-Modified Negative Binomial (ZMNB) distribution for analyzing count data. In particular, we study the characterizations and properties of this distribution, whose main advantage is its flexibility: it is suitable for modeling a wide range of overdispersed and underdispersed count data (where the dispersion may or may not be caused by zero-modification, i.e., the inflation or deflation of zeros), without requiring prior knowledge of any of these inherent data characteristics. We derive maximum likelihood estimators of the model parameters based on the positive observations only, and evaluate the loss of efficiency incurred by this procedure. We illustrate the suitability of this distribution on real data sets with different types of zero-modification.
Acknowledgements
We are indebted to the Editorial Board and Referees for their valuable comments, criticisms, and suggestions, which have substantially improved the text of the manuscript.
Katiane S. Conceição is supported by the Brazilian organization Fundação de Amparo à Pesquisa do Estado de São Paulo - FAPESP (2019/22412-5); Marinho G. Andrade is supported by the Brazilian organization Fundação de Amparo à Pesquisa do Estado de São Paulo - FAPESP (2019/21766-8); Francisco Louzada is supported by the Brazilian organizations CNPq (301976/2017-1) and FAPESP (2013/07375-0).
Fisher score method
We use the Fisher score method to compute the maximum likelihood estimates of the parameters \(\mu\) and \(\phi\) of the ZMNB (or ZTNB) distribution. For this, we use the iterative equations:
The iterations stop, and the maximum likelihood estimates are obtained, when \((\mathcal{U}_{\mu}^{+(j)})^2+(\mathcal{U}_{\phi}^{+(j)})^2 < \varepsilon\), where \(\varepsilon\) is a pre-established tolerance (i.e., when the squared norm of the score vector at iteration \(j\) falls below \(\varepsilon\)). A detailed description of the Fisher score algorithm is presented as follows:
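As a rough illustration of an iteration of this kind (not the authors' algorithm), the following Python sketch applies Fisher scoring to a zero-truncated negative binomial sample. Several points are assumptions made only for the example: the parameterization (mean \(\mu\) and dispersion \(\phi\) of the untruncated distribution, with variance \(\mu + \mu^2/\phi\)), the use of central finite differences in place of analytic scores, the computation of the expected information by summing over the support, and the step-halving safeguard. All function names are hypothetical.

```python
import math

def ztnb_logpmf(y, mu, phi):
    """Log-pmf of the zero-truncated NB for y >= 1 (assumed mean/dispersion form)."""
    log_p0 = phi * math.log(phi / (mu + phi))          # log P(Y = 0) before truncation
    log_nb = (math.lgamma(y + phi) - math.lgamma(y + 1) - math.lgamma(phi)
              + log_p0 + y * math.log(mu / (mu + phi)))
    return log_nb - math.log1p(-math.exp(log_p0))      # renormalize by 1 - P(Y = 0)

def loglik(data, mu, phi):
    return sum(ztnb_logpmf(y, mu, phi) for y in data)

def score(y, mu, phi, h=1e-5):
    """Per-observation score (d/dmu, d/dphi) by central differences (assumption)."""
    d_mu = (ztnb_logpmf(y, mu + h, phi) - ztnb_logpmf(y, mu - h, phi)) / (2 * h)
    d_phi = (ztnb_logpmf(y, mu, phi + h) - ztnb_logpmf(y, mu, phi - h)) / (2 * h)
    return d_mu, d_phi

def expected_information(mu, phi, tail=1e-10, y_cap=5000):
    """Per-observation expected information E[s s^T], summed over the support."""
    k_mm = k_mp = k_pp = mass = 0.0
    y = 1
    while mass < 1.0 - tail and y <= y_cap:
        p = math.exp(ztnb_logpmf(y, mu, phi))
        s_mu, s_phi = score(y, mu, phi)
        k_mm += p * s_mu * s_mu
        k_mp += p * s_mu * s_phi
        k_pp += p * s_phi * s_phi
        mass += p
        y += 1
    return k_mm, k_mp, k_pp

def fisher_scoring(data, mu, phi, eps=1e-8, max_iter=200, floor=1e-3):
    """Iterate theta <- theta + I(theta)^{-1} U(theta) until U_mu^2 + U_phi^2 < eps."""
    n = len(data)
    for _ in range(max_iter):
        scores = [score(y, mu, phi) for y in data]
        u_mu = sum(s[0] for s in scores)
        u_phi = sum(s[1] for s in scores)
        if u_mu ** 2 + u_phi ** 2 < eps:
            return mu, phi, True
        k_mm, k_mp, k_pp = expected_information(mu, phi)
        det = n * (k_mm * k_pp - k_mp ** 2)
        d_mu = (k_pp * u_mu - k_mp * u_phi) / det       # solve the 2x2 system directly
        d_phi = (k_mm * u_phi - k_mp * u_mu) / det
        # Safeguard (not part of plain Fisher scoring): halve the step until the
        # parameters stay positive and the log-likelihood does not decrease.
        ll, step, accepted = loglik(data, mu, phi), 1.0, False
        while step > 1e-8:
            new_mu, new_phi = mu + step * d_mu, phi + step * d_phi
            if new_mu > floor and new_phi > floor and loglik(data, new_mu, new_phi) >= ll:
                mu, phi, accepted = new_mu, new_phi, True
                break
            step /= 2
        if not accepted:
            break
    return mu, phi, False

# Illustrative positive counts (made up for the example).
counts = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 9, 12]
mu_hat, phi_hat, converged = fisher_scoring(counts, mu=3.5, phi=1.0)
```

The third returned value reports whether the stopping rule on the squared score norm was met within `max_iter` iterations. With analytic expressions for the score and the expected information, the finite-difference and summation steps would be replaced by closed forms.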
Cite this article
Conceição, K.S., Andrade, M.G., Louzada, F. et al. Characterizations and generalizations of the negative binomial distribution. Comput Stat 37, 1255–1286 (2022). https://doi.org/10.1007/s00180-021-01150-y