Original articles
Testing for the Poisson–Tweedie distribution

https://doi.org/10.1016/j.matcom.2018.08.001

Abstract

In practice, count data exhibit over-dispersion, zero-inflation and even heavy tails. The Poisson–Tweedie distribution is a flexible parametric family able to accommodate these features. This paper proposes and studies a computationally convenient goodness-of-fit test for this distribution, which is based on an empirical counterpart of a system of equations. The test is consistent against fixed alternatives. The null distribution of the test can be consistently approximated by a parametric bootstrap. The goodness of the bootstrap estimator and the power for finite sample sizes are numerically assessed. Comparisons with other tests are also included. Applications to two real data sets are displayed.

Introduction

Modeling count data is an important issue in many applied sciences such as medicine (see, for example, Joe and Zhu [10]), biology (see, for example, Esnaola et al. [5]) and economics (see, for example, Cui and Zheng [3]), among many others. The Poisson distribution plays an important role in this task. Nevertheless, observed count data often exhibit over-dispersion (variance larger than the mean), zero-inflation (more zeros than expected) and even heavy tails, and in these cases the Poisson distribution is not adequate for fitting the data. There are a number of distributions that can model these features. Classical examples are the negative binomial (NB) distribution for over-dispersion and the zero-inflated Poisson distribution for zero-inflation. However, from a practical point of view, model selection becomes an issue. A solution is to try a family of distributions able to model a wide range of mean–variance relationships and tail heaviness. An example is the Poisson–Tweedie (PT) distribution (see El-Shaarawi et al. [4]), which includes some commonly used distributions such as the Poisson, NB and Poisson-inverse Gaussian, as well as less used ones such as the discrete stable, Poisson–Pascal and Neyman Type A.
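As a rough illustration of the two features just described, the following Python sketch computes the sample variance-to-mean ratio and compares the observed share of zeros with the share that a Poisson distribution with the same mean would predict. The function name `dispersion_summary` and the toy data are illustrative assumptions and are not taken from the paper.

```python
import math
from statistics import mean, pvariance

def dispersion_summary(counts):
    """Return (variance/mean ratio, observed zero share, Poisson zero share).

    `counts` is a list of non-negative integers with a positive mean.
    """
    m = mean(counts)
    v = pvariance(counts)              # variance with divisor n
    ratio = v / m                      # values well above 1 suggest over-dispersion
    zeros_obs = counts.count(0) / len(counts)
    zeros_pois = math.exp(-m)          # P(X = 0) for a Poisson with mean m
    return ratio, zeros_obs, zeros_pois

# Hypothetical toy sample with many zeros and a few large counts.
sample = [0, 0, 0, 0, 1, 0, 2, 0, 7, 0, 0, 3, 0, 12, 0]
ratio, zeros_obs, zeros_pois = dispersion_summary(sample)
print(ratio, zeros_obs, zeros_pois)
```

A ratio well above 1 together with an observed zero share above the Poisson zero share is the kind of pattern that motivates more flexible families such as the PT distribution.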

A crucial aspect of data analysis is model validation. Since the PT distribution is defined by means of its probability generating function (PGF), the test in Rueda and O’Reilly [17] can be applied for testing goodness-of-fit (GOF) to this distribution. These authors proposed a Cramér–von Mises type GOF test for count distributions, which is based on comparing the empirical PGF (EPGF) of the data with the PGF in the null hypothesis. Although they gave equal weight to all differences, several authors have considered using a different weight (see, for example, the tests in Baringhaus and Henze [1], Gürtler and Henze [7], Jiménez-Gamero and Batsidis [9]). Because the PT distribution can be seen as a particular case of the generalized Poisson distribution, as defined in Meintanis [12], the test in that paper can also be used for testing GOF to the PT distribution. The application of these tests requires the choice of a weight function, which is rather arbitrary.

This paper proposes a GOF test for the PT distribution which is based on the following idea: since the PGF of the PT distribution is the unique PGF satisfying a certain differential equation, and the EPGF consistently estimates the PGF, the EPGF should approximately satisfy that equation. The proposed test statistic is a function of the coefficients of the polynomial that results when one replaces the PGF by the EPGF in the aforementioned differential equation. An advantage of the test proposed in this paper over those in the above paragraph is that its use does not entail the choice of any weight function.
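The differential equation for the PT family is not reproduced in this excerpt, so the sketch below illustrates the same idea only in the simplest special case. The Poisson PGF g(t) = exp{λ(t − 1)} satisfies g'(t) = λ g(t). Replacing g by the EPGF and λ by the sample mean gives a polynomial in t whose coefficient of t^k is (k + 1) p_n(k + 1) − X̄ p_n(k), where p_n(k) is the relative frequency of the value k in the sample. The function names and the sum-of-squares summary are illustrative assumptions; they are not the statistic proposed in the paper.

```python
from collections import Counter

def poisson_equation_coefficients(counts):
    """Coefficients of g_n'(t) - xbar * g_n(t), ordered by increasing power of t.

    g_n is the empirical PGF of `counts` (non-negative integers) and xbar is
    the sample mean. Poisson special case only; an illustrative assumption.
    """
    n = len(counts)
    xbar = sum(counts) / n
    freq = Counter(counts)

    def p(k):
        # Relative frequency of the value k in the sample.
        return freq.get(k, 0) / n

    kmax = max(counts)
    return [(k + 1) * p(k + 1) - xbar * p(k) for k in range(kmax + 1)]

def sum_of_squares(coeffs):
    """One simple way to summarize how far the coefficients are from zero."""
    return sum(c * c for c in coeffs)

coeffs = poisson_equation_coefficients([0, 1, 1, 2, 3])
print(coeffs, sum_of_squares(coeffs))
```

The coefficients always add up to zero, because evaluating the polynomial at t = 1 gives g_n'(1) − X̄ g_n(1) = X̄ − X̄. Under a Poisson model each coefficient should be close to zero for large samples, so a summary of their size measures departure from the model without any weight function.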

The paper is organized as follows. With the aim of stating some notation, Section 2 recalls the definition of the PT distribution and shows that the PT distribution is the unique discrete distribution whose PGF satisfies a certain differential equation. This result is used in Section 3 to propose a test statistic for testing GOF to the PT distribution. It will be seen that it can be considered as a generalization of the one in Nakamura and Pérez-Abreu [13], which was designed for testing GOF to the Poisson distribution. It is shown that this test statistic converges to a non-negative quantity, which is equal to zero if and only if the null hypothesis is true. Thus, the null hypothesis should be rejected for large values of the proposed test statistic. Since its asymptotic null distribution depends on unknown quantities, a parametric bootstrap is studied to consistently approximate the null distribution. The goodness of the bootstrap approximation for finite sample sizes was numerically assessed by means of a simulation experiment. Section 4 outlines the obtained results. This section also compares the power of the proposed test with that of others. As expected from the results in Janssen [8], which assert that the global power function of any nonparametric test is flat on balls of alternatives except for alternatives coming from a finite dimensional subspace, none of the considered tests is uniformly more powerful against all alternatives tried. Some applications to real data sets are also displayed in this section. Section 6 summarizes. All proofs are deferred to Appendix A. Appendix B deals with some applied issues such as the practical calculation of the test statistic.
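The parametric bootstrap mentioned above follows a generic recipe: estimate the parameter under the null hypothesis, compute the statistic on the data, simulate samples from the fitted null model, and report the share of simulated statistics at least as large as the observed one. The Python sketch below shows that recipe under loud assumptions: the null model is a plain Poisson rather than the PT family, and `dispersion_stat` is a placeholder statistic, not the one proposed in the paper. All function names are hypothetical.

```python
import math
import random

def rpois(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's multiplication method (small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def dispersion_stat(x):
    """Placeholder statistic |variance/mean - 1|; NOT the statistic of the paper."""
    n = len(x)
    m = sum(x) / n
    if m == 0:
        return 0.0
    v = sum((xi - m) ** 2 for xi in x) / n
    return abs(v / m - 1.0)

def bootstrap_pvalue(x, stat, n_boot=500, seed=0):
    """Parametric bootstrap p-value for an (assumed) Poisson null hypothesis."""
    rng = random.Random(seed)
    n = len(x)
    lam_hat = sum(x) / n               # parameter estimate under the null
    t_obs = stat(x)                    # statistic on the observed data
    exceed = 0
    for _ in range(n_boot):
        xb = [rpois(lam_hat, rng) for _ in range(n)]   # sample from the fitted null
        if stat(xb) >= t_obs:          # stat re-estimates the mean on each resample
            exceed += 1
    return exceed / n_boot

# Hypothetical, strongly over-dispersed sample.
data = [0] * 30 + [10] * 10
print(bootstrap_pvalue(data, dispersion_stat))
```

Re-estimating the parameter on every bootstrap sample, here done inside the statistic, is what lets the simulated distribution mimic the null distribution of a statistic that depends on estimated parameters.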

Section snippets

Preliminaries

This section recalls the definition of the PT distribution and gives a characterization of it, which will be used in the next section to propose a GOF test.

The PT distribution has been discovered independently by several authors with different parametrizations. Here we consider the definition given in El-Shaarawi et al. [4], where the authors also relate their definition to some previous ones.

Let $\mathbb{N}_0=\mathbb{N}\cup\{0\}=\{0,1,2,3,\ldots\}$. A random variable $X$ taking values in $\mathbb{N}_0$ is said to belong to the PT distribution

The test statistic

Let $X_1,\ldots,X_n$ be independent, identically distributed (IID) random observations from a population $X$ taking values in $\mathbb{N}_0$, with PGF $g(t)$. Let $g_n(t)=\frac{1}{n}\sum_{i=1}^{n}t^{X_i}$ denote the EPGF of $X_1,\ldots,X_n$. Based on the sample, the objective is to test the composite null hypothesis $H_0: X\sim PT(\theta)$ for some $\theta\in\Theta$, against the alternative $H_1: X\nsim PT(\theta)$, $\forall\theta\in\Theta$.
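The EPGF is the sample average of t raised to each observation, so it can be evaluated directly. The following is a minimal Python sketch; the function name `epgf` is an illustrative assumption.

```python
def epgf(sample, t):
    """Empirical PGF: the average of t**x over the observations x in `sample`.

    `sample` holds non-negative integers. Python evaluates 0**0 as 1, which
    matches the usual convention for generating functions.
    """
    return sum(t ** x for x in sample) / len(sample)

sample = [0, 1, 2]
print(epgf(sample, 0.5))   # (1 + 0.5 + 0.25) / 3
```

Two quick sanity checks follow from the definition: the EPGF equals 1 at t = 1, and at t = 0 it equals the share of zeros in the sample.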

As seen before, the PGF of the PT distribution is the only PGF satisfying the differential equation (1). By Proposition 1 in Novoa-Muñoz and Jiménez-Gamero [14], the PGF g(t)

Finite sample performance

The properties so far studied are asymptotic. To study the finite sample performance of the proposed test, we conducted some simulation experiments. In this section we briefly describe them and display a summary of the results obtained. Real data set applications are also displayed. All computations in this paper were performed by using programs written in the R language [16]. Some practical issues related to the calculation of the test statistic and the bootstrap approximation to their null

Boundary cases

As indicated in Section 2, three boundary cases were excluded from our development. The two non-trivial cases are the Poisson distribution and the family of discrete stable distributions. Many GOF tests have been developed in order to check if the data can be assumed to come from a Poisson distribution (see Gürtler and Henze [7] for a review). In particular, one could use the test in Nakamura and Pérez-Abreu [13], which inspired us to propose the test in this paper. A GOF test for the family

Summary

The paper proposes a GOF test for the PT distribution. The tests in Meintanis [12] and in Rueda and O’Reilly [17] can also be used for testing GOF to this distribution. The three tests are consistent against fixed alternatives, and the practical calculation of the p-values requires in all cases a bootstrap approximation to the null distribution of the associated test statistics. An advantage of the test studied in this paper over those in [12, 17] is that the calculation of its test

Acknowledgments

The authors thank the anonymous reviewers for their constructive comments. M.V. Alba-Fernández has been partially supported by grant CTM2015–68276–R of the Spanish Ministry of Economy and Competitiveness. M.D. Jiménez-Gamero has been partially supported by grant MTM2017-89422-P of the Spanish Ministry of Economy, Industry and Competitiveness, the State Agency of Investigation, the European Regional Development Fund, and CRoNoS COST Action IC1408.
