Testing for the Poisson–Tweedie distribution
Introduction
Modeling count data is an important issue in many applied sciences such as medicine (see, for example, Joe and Zhu [10]), biology (see, for example, Esnaola et al. [5]) and economics (see, for example, Cui and Zheng [3]), among many others. The Poisson distribution plays an important role in this task. Nevertheless, observed count data often exhibit over-dispersion (variance larger than the mean), zero-inflation (more zeros than expected) and even heavy tails, and in these cases the Poisson distribution is not adequate for fitting the data. A number of distributions can model these features. Classical examples are the negative binomial (NB) distribution for over-dispersion and the zero-inflated Poisson distribution for zero-inflation. However, from a practical point of view, model selection becomes an issue. A solution is to try a family of distributions able to model a wide range of mean–variance relationships and tail heaviness. An example is the Poisson–Tweedie (PT) distribution (see El-Shaarawi et al. [4]), which includes some commonly used distributions such as the Poisson, NB and Poisson-inverse Gaussian, as well as less common ones such as the discrete stable, Poisson–Pascal and Neyman Type A.
A crucial aspect of data analysis is model validation. Since the PT distribution is defined by means of its probability generating function (PGF), the test in Rueda and O’Reilly [17] can be applied for testing goodness-of-fit (GOF) to this distribution. These authors proposed a Cramér–von Mises type GOF test for count distributions, which is based on comparing the empirical PGF (EPGF) of the data with the PGF in the null hypothesis. Although they gave equal weight to all differences, several authors have considered using a different weight (see, for example, the tests in Baringhaus and Henze [1], Gürtler and Henze [7], Jiménez-Gamero and Batsidis [9]). Because the PT distribution can be seen as a particular case of the generalized Poisson distribution, as defined in Meintanis [12], the test in that paper can also be used for testing GOF to the PT distribution. The application of these tests requires the choice of a weight function, which is rather arbitrary.
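To make the EPGF-based comparison concrete, the following is a minimal sketch, not the procedure of any of the cited papers. It assumes a Poisson null (chosen only because its PGF, exp(λ(t − 1)), has a simple closed form), estimates λ by the sample mean, and approximates the equally weighted squared distance between the EPGF and the fitted PGF on [0, 1] with a midpoint rule. The function names and the grid size are hypothetical.

```python
import math


def epgf(sample, t):
    """Empirical PGF: the average of t**x over the observed counts."""
    return sum(t ** x for x in sample) / len(sample)


def poisson_pgf(t, lam):
    """PGF of a Poisson(lam) variable, used here as an illustrative null model."""
    return math.exp(lam * (t - 1.0))


def cvm_pgf_statistic(sample, n_grid=1000):
    """n times the integral over [0, 1] of (EPGF - fitted PGF)^2, equal weights.

    The integral is approximated with a midpoint rule on n_grid cells
    (an assumption of this sketch, not a prescription from the article).
    """
    n = len(sample)
    lam_hat = sum(sample) / n  # sample mean as the Poisson parameter estimate
    h = 1.0 / n_grid
    total = 0.0
    for j in range(n_grid):
        t = (j + 0.5) * h
        diff = epgf(sample, t) - poisson_pgf(t, lam_hat)
        total += diff * diff
    return n * total * h


if __name__ == "__main__":
    counts = [0, 0, 1, 0, 3, 0, 0, 7, 1, 0]
    print(cvm_pgf_statistic(counts))
```

Replacing the equal weights by another weight function only changes the integrand, which is where the arbitrariness mentioned in the text enters.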
This paper proposes a GOF test for the PT distribution which is based on the following: since the PGF of the PT distribution is the unique PGF satisfying a certain differential equation, and the EPGF consistently estimates the PGF, the EPGF should approximately satisfy that equation. The proposed test statistic is a function of the coefficients of the polynomial that results when the PGF is replaced by the EPGF in the aforementioned differential equation. An advantage of the test proposed in this paper over those in the above paragraph is that its use does not entail the choice of any weight function.
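The idea of substituting the EPGF into a differential equation can be illustrated in the simplest special case. The differential equation for the PT distribution is not reproduced in this excerpt, so the sketch below uses the Poisson case instead, where the PGF g satisfies g'(t) = λ g(t). Writing the EPGF as a polynomial with coefficients equal to the relative frequencies p_k, the expression g_n'(t) − x̄ g_n(t) is a polynomial whose k-th coefficient is (k + 1) p_{k+1} − x̄ p_k. Taking n times the sum of squared coefficients as the statistic is an assumption made for illustration, and the function names are hypothetical.

```python
from collections import Counter


def de_coefficients(sample):
    """Coefficients of g_n'(t) - mean * g_n(t), where g_n is the EPGF.

    Illustrative Poisson case only: under a Poisson model every
    coefficient should be close to zero.
    """
    n = len(sample)
    mean = sum(sample) / n
    counts = Counter(sample)
    m = max(sample)
    # Relative frequencies p_0, ..., p_{m+1}; the last one is zero by construction.
    p = [counts.get(k, 0) / n for k in range(m + 2)]
    return [(k + 1) * p[k + 1] - mean * p[k] for k in range(m + 1)]


def coefficient_statistic(sample):
    """n times the sum of squared coefficients (an assumed functional form)."""
    n = len(sample)
    return n * sum(d * d for d in de_coefficients(sample))


if __name__ == "__main__":
    counts = [0, 0, 1, 0, 3, 0, 0, 7, 1, 0]
    print(de_coefficients(counts))
    print(coefficient_statistic(counts))
```

No weight function appears anywhere in this computation: the statistic depends only on the observed relative frequencies and the parameter estimate.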
The paper is organized as follows. To fix notation, Section 2 recalls the definition of the PT distribution and shows that the PT distribution is the unique discrete distribution whose PGF satisfies a certain differential equation. This result is used in Section 3 to propose a test statistic for testing GOF to the PT distribution. It will be seen that it can be considered a generalization of the one in Nakamura and Pérez-Abreu [13], which was designed for testing GOF to the Poisson distribution. It is shown that the test statistic converges to a non-negative quantity, which is equal to zero if and only if the null hypothesis is true. Thus, the null hypothesis should be rejected for large values of the proposed test statistic. Since its asymptotic null distribution depends on unknown quantities, a parametric bootstrap is studied to consistently approximate the null distribution. The quality of the bootstrap approximation for finite sample sizes was numerically assessed by means of a simulation experiment. Section 4 outlines the results obtained. This section also compares the power of the proposed test with that of others. As expected from the results in Janssen [8], which assert that the global power function of any nonparametric test is flat on balls of alternatives except for alternatives coming from a finite dimensional subspace, none of the considered tests is uniformly more powerful against all alternatives tried. Some applications to real data sets are also displayed in this section. Section 6 summarizes. All proofs are deferred to Appendix A. Appendix B deals with some applied issues such as the practical calculation of the test statistic.
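The parametric bootstrap mentioned above can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' implementation (which, according to the article, was written in R): the null model is taken to be Poisson rather than PT so that sampling needs only the standard library, the statistic is a hypothetical sum of squared coefficients obtained from the Poisson equation g'(t) = λ g(t), and all function names are made up for this example.

```python
import math
import random
from collections import Counter


def statistic(sample):
    """Illustrative statistic: n * sum of squared coefficients of
    g_n'(t) - mean * g_n(t), with g_n the EPGF (Poisson case, assumed form)."""
    n = len(sample)
    mean = sum(sample) / n  # the parameter is re-estimated from each sample
    counts = Counter(sample)
    m = max(sample)
    p = [counts.get(k, 0) / n for k in range(m + 2)]
    return n * sum(((k + 1) * p[k + 1] - mean * p[k]) ** 2 for k in range(m + 1))


def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small to moderate lam."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def bootstrap_p_value(sample, n_boot=500, seed=0):
    """Parametric bootstrap p-value: simulate from the fitted null model,
    recompute the statistic, and count how often it reaches the observed value."""
    rng = random.Random(seed)
    n = len(sample)
    lam_hat = sum(sample) / n
    observed = statistic(sample)
    exceed = 0
    for _ in range(n_boot):
        resample = [draw_poisson(lam_hat, rng) for _ in range(n)]
        if statistic(resample) >= observed:
            exceed += 1
    return exceed / n_boot


if __name__ == "__main__":
    data = [0] * 40 + [10] * 10  # strongly over-dispersed relative to Poisson
    print(bootstrap_p_value(data, n_boot=200))
```

Large values of the statistic count as evidence against the null hypothesis, so the p-value is the proportion of bootstrap statistics at least as large as the observed one.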
Section snippets
Preliminaries
This section recalls the definition of the PT distribution and gives a characterization of it, which will be used in the next section to propose a GOF test.
The PT distribution has been discovered independently by several authors with different parametrizations. Here we consider the definition given in El-Shaarawi et al. [4], where the authors also relate their definition to some previous ones.
Let . A random variable taking values in is said to belong to the PT distribution
The test statistic
Let be independent, identically distributed (IID) random observations from a population taking values in , with PGF . Let denote the EPGF of . Based on the sample, the objective is to test the composite null hypothesis against the alternative
As seen before, the PGF of the PT distribution is the only PGF satisfying the differential equation (1). By Proposition 1 in Novoa-Muñoz and Jiménez-Gamero [14], the PGF
Finite sample performance
The properties so far studied are asymptotic. To study the finite sample performance of the proposed test, we conducted some simulation experiments. In this section we briefly describe them and display a summary of the results obtained. Real data set applications are also displayed. All computations in this paper were performed by using programs written in the R language [16]. Some practical issues related to the calculation of the test statistic and the bootstrap approximation to their null
Boundary cases
As indicated in Section 2, three boundary cases were excluded from our development. The two non-trivial cases are the Poisson distribution and the family of discrete stable distributions. Many GOF tests have been developed to check whether the data can be assumed to come from a Poisson distribution (see Gürtler and Henze [7] for a review). In particular, one could use the test in Nakamura and Pérez-Abreu [13], which inspired us to propose the test in this paper. A GOF test for the family
Summary
The paper proposes a GOF test for the PT distribution. The tests in Meintanis [12] and in Rueda and O’Reilly [17] can also be used for testing GOF to this distribution. The three tests are consistent against fixed alternatives, and the practical calculation of the p-values requires in all cases a bootstrap approximation to the null distribution of the associated test statistics. An advantage of the test studied in this paper over those in [12, 17] is that the calculation of its test
Acknowledgments
The authors thank the anonymous reviewers for their constructive comments. M.V. Alba-Fernández has been partially supported by grant CTM2015–68276–R of the Spanish Ministry of Economy and Competitiveness. M.D. Jiménez-Gamero has been partially supported by grant MTM2017-89422-P of the Spanish Ministry of Economy, Industry and Competitiveness, the State Agency of Investigation, the European Regional Development Fund, and CRoNoS COST Action IC1408.
References (18)
- et al. A goodness of fit test for the Poisson distribution based on the empirical generating function. Statist. Probab. Lett. (1992)
- et al. Conditional maximum likelihood estimation for a class of observation-driven time series models for count data. Statist. Probab. Lett. (2017)
- et al. Recent and classical goodness-of-fit tests for the Poisson distribution. J. Statist. Plann. Inference (2000)
- et al. Central limit theorems revisited. Statist. Probab. Lett. (2000)
- et al. Modelling heavy-tailed count data using a generalised Poisson-inverse Gaussian family. Statist. Probab. Lett. (2009)
- et al. Penalized minimum disparity methods for multinomial models. Statist. Sinica (1998)
- et al. Modelling species abundance using the Poisson-Tweedie family. Environmetrics (2011)
- et al. A flexible count data model to fit the wide diversity of expression profiles arising from extensively replicated RNA-seq experiments. BMC Bioinf. (2013)
- et al. A warp-speed method for conducting Monte Carlo experiments involving bootstrap estimators. Econom. Theory (2013)