Compound Poisson INAR(1) processes: Stochastic properties and testing for overdispersion

https://doi.org/10.1016/j.csda.2014.03.005

Abstract

The compound Poisson INAR(1) model for time series of overdispersed counts is considered. For such CPINAR(1) processes, explicit results are derived for joint moments, for the k-step-ahead distribution as well as for the stationary distribution. It is shown that a CPINAR(1) process is strongly mixing with exponentially decreasing weights. This result is utilized to design a test for overdispersion in INAR(1) processes and to derive its asymptotic power function. An application of our results to a real-data example and a study of the finite-sample performance of the test are presented.

Introduction

The study of count data time series (i.e., series having a range contained in $\mathbb{N}_0=\{0,1,\dots\}$) has attracted a lot of attention in recent years. In particular, the integer-valued autoregressive model of order 1 (INAR(1) model) for stationary processes $(Y_t)_{t\in\mathbb{Z}}$ with an AR(1)-like dependence structure, as introduced by McKenzie (1985), has been studied intensively; see Section 2.2 for details. This model has been applied to several real count data time series $y_1,\dots,y_T$; see Freeland (1998), Freeland and McCabe (2004), Jung et al. (2005), Weiß (2008), as well as Section 5 below.
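To make the AR(1)-like structure concrete, the following Python sketch simulates a Poisson INAR(1) process. It assumes the standard binomial-thinning recursion $Y_t = \alpha\circ Y_{t-1} + \epsilon_t$ of McKenzie (1985), in which each of the $Y_{t-1}$ counts survives independently with probability $\alpha$, together with i.i.d. Poisson($\lambda$) innovations; the function names and parameter values are illustrative only. For this model the stationary marginal is Poisson($\lambda/(1-\alpha)$) and the lag-$h$ autocorrelation is $\alpha^h$, so the printed sample mean should be close to $\lambda/(1-\alpha)=4$, the variance-to-mean ratio close to 1, and the lag-1 and lag-2 autocorrelations close to 0.5 and 0.25.

```python
import numpy as np

def simulate_poisson_inar1(alpha, lam, T, rng):
    """Simulate Y_t = alpha o Y_{t-1} + eps_t (binomial thinning, Poisson(lam) innovations)."""
    eps = rng.poisson(lam, size=T)
    y = np.empty(T, dtype=np.int64)
    y[0] = rng.poisson(lam / (1 - alpha))       # start in the stationary Poisson(lam/(1-alpha)) law
    for t in range(1, T):
        # each of the y[t-1] counts survives independently with probability alpha
        y[t] = rng.binomial(y[t - 1], alpha) + eps[t]
    return y

def sample_acf(y, h):
    """Sample autocorrelation at lag h >= 1."""
    d = y - y.mean()
    return float(np.sum(d[h:] * d[:-h]) / np.sum(d * d))

y = simulate_poisson_inar1(alpha=0.5, lam=2.0, T=100_000, rng=np.random.default_rng(1))
print(y.mean(), y.var() / y.mean(), sample_acf(y, 1), sample_acf(y, 2))
```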

For such applications, the Poisson INAR(1) model, which has a Poisson marginal distribution, would be an especially attractive choice for describing the data. One important characteristic of the Poisson distribution is equidispersion, i.e., its mean and variance are equal. In contrast, many real-world data examples (see the above references) exhibit empirical overdispersion, i.e., the empirical variance is larger than the empirical mean. Typical reasons for observing overdispersion in practice are surveyed by Weiß (2009). But is the empirically observed degree of overdispersion really significant? And if so, how can we apply the INAR(1) model to overdispersed count data time series?

We first focus on the second question. Several modifications of the INAR(1) model have been suggested in the literature, usually by changing the thinning operation used to define the INAR(1) model (Weiß, 2008). The approach taken here instead focuses on the distribution of the innovations, which we take from the larger class of compound Poisson (CP) distributions; a brief survey of the CP distribution is provided in Section 2. We introduce the compound Poisson INAR(1) model (CPINAR(1) model), which constitutes a convenient way to account for overdispersion in an INAR(1) process and comprises a number of specialized INAR(1) models within one model. To make the CPINAR(1) model broadly applicable, we derive in Section 3 its k-step-ahead forecast distribution, its stationary marginal distribution, closed-form expressions for its joint moments up to order 4, and its mixing properties.
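A minimal simulation sketch of a CPINAR(1) process, assuming binomial thinning as in the Poisson INAR(1) case and, as a purely hypothetical choice of the compounding distribution $H$, geometric summands $X_i$ on $\{1,2,\dots\}$ with parameter $p$. For a CP innovation, mean and variance are $\lambda E[X]$ and $\lambda E[X^2]$, so its index of dispersion is $I_\epsilon = E[X^2]/E[X]$, which equals $(2-p)/p$ here. The reference value printed next to the empirical index comes from the stationary variance equation $\sigma_Y^2 = \alpha^2\sigma_Y^2 + \alpha(1-\alpha)\mu_Y + \sigma_\epsilon^2$ with $\mu_Y = \mu_\epsilon/(1-\alpha)$, which gives $I_Y = (\alpha + I_\epsilon)/(1+\alpha)$; this short derivation is supplied for illustration. In particular, any $I_\epsilon > 1$ carries over into an overdispersed marginal distribution.

```python
import numpy as np

P = 0.5  # hypothetical parameter of the geometric summands

def geometric_summands(rng, size):
    """Hypothetical H: P(X = j) = (1 - P)**(j - 1) * P for j = 1, 2, ..."""
    return rng.geometric(P, size=size)

def cp_innovations(lam, sample_x, T, rng):
    """T i.i.d. compound Poisson draws eps = X_1 + ... + X_N with N ~ Poisson(lam)."""
    n = rng.poisson(lam, size=T)
    x = sample_x(rng, int(n.sum()))            # all summands X_i at once
    owner = np.repeat(np.arange(T), n)         # time index each summand belongs to
    return np.bincount(owner, weights=x, minlength=T).astype(np.int64)

def simulate_cpinar1(alpha, lam, sample_x, T, rng, burn_in=1_000):
    """Y_t = alpha o Y_{t-1} + eps_t with binomial thinning and CP innovations."""
    eps = cp_innovations(lam, sample_x, T + burn_in, rng)
    y = np.zeros(T + burn_in, dtype=np.int64)
    for t in range(1, T + burn_in):
        y[t] = rng.binomial(y[t - 1], alpha) + eps[t]
    return y[burn_in:]                         # drop the start-up phase

alpha, lam = 0.5, 1.0
y = simulate_cpinar1(alpha, lam, geometric_summands, T=100_000, rng=np.random.default_rng(7))
i_eps = (2 - P) / P                            # E[X^2] / E[X] for geometric summands
print(y.var() / y.mean(), (alpha + i_eps) / (1 + alpha))
```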

We then turn our attention towards the first of the questions posed above and consider the (Poisson) index of dispersion, $I_Y$, which equals 1 in the case of the Poisson distribution. This index and its empirical counterpart are commonly defined as
$$I_Y \equiv \frac{\sigma_Y^2}{\mu_Y} \quad\text{and}\quad \hat I_Y \equiv \frac{S_Y^2}{\bar Y}, \tag{1}$$
respectively. Here, $\bar Y = \frac{1}{T}\sum_{t=1}^{T} Y_t$ and $S_Y^2 = \frac{1}{T}\sum_{t=1}^{T}(Y_t-\bar Y)^2 = \bigl(\frac{1}{T}\sum_{t=1}^{T} Y_t^2\bigr) - \bar Y^2$. Based on the results established for CPINAR(1) processes, especially concerning joint moments and mixing properties, we apply a central limit theorem, which, in turn, is used in Section 4 to derive closed-form expressions for the asymptotic distribution of the index of dispersion for CPINAR(1) processes. Utilizing these expressions, we develop a test based on the index of dispersion, and we investigate its power for uncovering overdispersion in INAR(1) processes. In Section 5, we apply our test to a real-data example. We conclude in Section 6 and outline issues for future research.
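The empirical index $\hat I_Y$ defined above uses the $1/T$ normalization of $S_Y^2$, not $1/(T-1)$. A minimal Python sketch with hypothetical function names; both algebraically equivalent forms of $S_Y^2$ are computed to show that they agree (`numpy.var` with its default `ddof=0` uses the same normalization).

```python
import numpy as np

def dispersion_index(y):
    """Empirical index of dispersion I_hat = S_Y^2 / Ybar with the 1/T variance."""
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    s2 = np.mean((y - ybar) ** 2)              # (1/T) * sum_t (Y_t - Ybar)^2
    return float(s2 / ybar)

def dispersion_index_alt(y):
    """Same quantity via S_Y^2 = (1/T) * sum_t Y_t^2 - Ybar^2."""
    y = np.asarray(y, dtype=float)
    return float((np.mean(y ** 2) - y.mean() ** 2) / y.mean())

print(dispersion_index([1, 2, 3, 4]))         # 0.5  (Ybar = 2.5, S_Y^2 = 1.25)
print(dispersion_index_alt([1, 2, 3, 4]))
```

Values above 1 indicate empirical overdispersion, values below 1 underdispersion.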

Section snippets

The compound Poisson distribution

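In the sequel, the moments about the origin of a random variable $\epsilon$ are abbreviated as $\mu_{\epsilon,k} \equiv E[\epsilon^k]$, with $\mu_\epsilon \equiv \mu_{\epsilon,1}$. The central moments are denoted as $\bar\mu_{\epsilon,k} \equiv E[(\epsilon-\mu_\epsilon)^k]$, with $\sigma_\epsilon^2 \equiv \bar\mu_{\epsilon,2}$. For the compound Poisson distribution, we adapt the notations and definitions from Chapter XII in Feller (1968).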
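To illustrate this notation on a concrete compound Poisson law, the sketch below computes the pmf of $\epsilon = X_1+\dots+X_N$ with $N\sim$ Poisson($\lambda$) and i.i.d. $X_i$ on $\{1,\dots,\nu\}$, and then evaluates $\mu_{\epsilon,k}$ and $\bar\mu_{\epsilon,k}$ numerically. The pmf is obtained with the Panjer recursion $g_0 = e^{-\lambda}$, $g_n = \frac{\lambda}{n}\sum_{j=1}^{\min(n,\nu)} j\,h_j\,g_{n-j}$, a standard technique chosen here for convenience rather than one taken from the paper; $h_j = P(X=j)$, the truncation point `n_max`, and the example choice of $X$ uniform on $\{1,2\}$ are assumptions. The output can be checked against the well-known compound Poisson moments $\mu_\epsilon = \lambda E[X]$ and $\sigma_\epsilon^2 = \lambda E[X^2]$.

```python
import math
import numpy as np

def cp_pmf(lam, h, n_max):
    """pmf on 0..n_max of eps = X_1 + ... + X_N, N ~ Poisson(lam), P(X = j) = h[j], h[0] = 0.

    Panjer recursion: g_0 = exp(-lam), g_n = (lam / n) * sum_{j=1}^{min(n, nu)} j h_j g_{n-j}.
    """
    nu = len(h) - 1                             # upper limit of the range of X
    g = np.zeros(n_max + 1)
    g[0] = math.exp(-lam)                       # P(N = 0), because X >= 1
    for n in range(1, n_max + 1):
        g[n] = lam / n * sum(j * h[j] * g[n - j] for j in range(1, min(n, nu) + 1))
    return g

def raw_moment(pmf, k):
    """mu_{eps,k} = E[eps^k] for a pmf on 0, 1, ..., len(pmf) - 1."""
    support = np.arange(len(pmf), dtype=float)
    return float(np.sum(support ** k * pmf))

def central_moment(pmf, k):
    """bar mu_{eps,k} = E[(eps - mu_eps)^k]."""
    support = np.arange(len(pmf), dtype=float)
    return float(np.sum((support - raw_moment(pmf, 1)) ** k * pmf))

lam, h = 1.0, [0.0, 0.5, 0.5]                  # hypothetical H: X uniform on {1, 2}, nu = 2
g = cp_pmf(lam, h, n_max=60)
print(raw_moment(g, 1), central_moment(g, 2))  # close to lam*E[X] = 1.5 and lam*E[X^2] = 2.5
```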

Definition 2.1.1 Compound Poisson Distribution

Let $X_1, X_2, \dots$ be i.i.d. random variables with the range being contained in $\mathbb{N}=\{1,2,\dots\}$; let $\nu$ denote the upper limit of the range (we allow the case $\nu=\infty$). Denote the probability generating function (pgf) of the X

Forecasting

We consider the k-step-ahead conditional distribution of a CPINAR(1) process for arbitrary $k\in\mathbb{N}$.

Theorem 3.1.1 k-Step-Ahead Forecasting

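Let $(Y_t)_{t\in\mathbb{Z}}$ be a CPINAR(1) process according to Definition 2.3.1. Then the conditional pgf of $Y_{t+k}$ given $Y_t$ satisfies
$$\mathrm{pgf}_{Y_{t+k}\mid Y_t}(z) = \bigl(1-\alpha^k+\alpha^k z\bigr)^{Y_t}\,\mathrm{pgf}_{\epsilon^{(k)}}(z),$$
where $\epsilon^{(k)}$ is a $\mathrm{CP}_\nu(\lambda^{(k)}, H^{(k)})$-distributed random variable with
$$\lambda^{(k)} = \lambda\sum_{i=1}^{k}\bigl(1-H(1-\alpha^{i-1})\bigr),\qquad \lambda^{(k)}\bigl(H^{(k)}(z)-1\bigr) = \lambda\sum_{i=1}^{k}\bigl(H(1-\alpha^{i-1}+\alpha^{i-1}z)-1\bigr). \tag{6}$$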
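The theorem can be checked numerically. In this sketch (all function names hypothetical), the conditional pmf of $Y_{t+k}$ given $Y_t$ is built directly from the representation $Y_{t+k} = \alpha^k\circ Y_t + \sum_{i=0}^{k-1}\alpha^i\circ\epsilon_{t+k-i}$, i.e., as a Binomial($Y_t,\alpha^k$) pmf convolved with the pmf of the thinned innovations. It assumes binomial thinning and a finite-range compounding distribution with $h_j = P(X=j)$; the innovation pmf is computed with the Panjer recursion, and all pmfs are truncated at `n_max`. Since $H^{(k)}(0)=0$, the theorem implies $P(\epsilon^{(k)}=0) = e^{-\lambda^{(k)}}$, so the two printed numbers should agree up to rounding.

```python
import math
import numpy as np

def cp_pmf(lam, h, n_max):
    """Compound Poisson pmf on 0..n_max via the Panjer recursion (h[0] = 0)."""
    g = np.zeros(n_max + 1)
    g[0] = math.exp(-lam)
    for n in range(1, n_max + 1):
        g[n] = lam / n * sum(j * h[j] * g[n - j] for j in range(1, min(n, len(h) - 1) + 1))
    return g

def thin_pmf(p, a):
    """pmf of a o Z (binomial thinning, survival probability a), given the pmf p of Z."""
    out = np.zeros(len(p))
    for n, pn in enumerate(p):
        for m in range(n + 1):
            out[m] += pn * math.comb(n, m) * a ** m * (1 - a) ** (n - m)
    return out

def forecast_pmf(y_t, alpha, lam, h, k, n_max=80):
    """pmf of Y_{t+k} given Y_t = y_t, plus the pmf of eps^(k) = sum_i alpha^i o eps_{t+k-i}."""
    eps = cp_pmf(lam, h, n_max)
    eps_k = np.array([1.0])
    for i in range(k):                         # convolve the k thinned innovations
        eps_k = np.convolve(eps_k, thin_pmf(eps, alpha ** i))[: n_max + 1]
    ak = alpha ** k
    survivors = np.array([math.comb(y_t, m) * ak ** m * (1 - ak) ** (y_t - m)
                          for m in range(y_t + 1)])   # Binomial(y_t, alpha^k)
    return np.convolve(survivors, eps_k)[: n_max + 1], eps_k

def lambda_k(alpha, lam, h, k):
    """lambda^(k) = lam * sum_{i=1}^k (1 - H(1 - alpha^(i-1))), H(z) = sum_j h_j z^j."""
    H = lambda z: sum(hj * z ** j for j, hj in enumerate(h))
    return lam * sum(1 - H(1 - alpha ** (i - 1)) for i in range(1, k + 1))

alpha, lam, h, k, y_t = 0.4, 1.0, [0.0, 0.5, 0.3, 0.2], 3, 5   # hypothetical values
pmf, eps_k = forecast_pmf(y_t, alpha, lam, h, k)
print(eps_k[0], math.exp(-lambda_k(alpha, lam, h, k)))          # P(eps^(k) = 0), two ways
```

The mean of the forecast pmf can likewise be compared with $\alpha^k Y_t + \mu_\epsilon(1-\alpha^k)/(1-\alpha)$.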

The proof of Theorem 3.1.1 is given in Appendix A.1. Note that the first equation in (6) is included in the

Testing for overdispersion in CPINAR(1) processes

The index of dispersion $\hat I_Y$ from (1) has been analyzed in detail for the case of i.i.d. counts (Rao and Chakravarti, 1956; Böhning, 1994). In the sequel, we analyze its asymptotic behavior for serially dependent counts stemming from a (rather general) INAR(1) process. With the help of this result, we are able to test the null hypothesis of a Poisson INAR(1) process and to analyze the power of this test if the data actually stem from a CPINAR(1) process.
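A rough sketch of how such a test could be carried out under the null hypothesis of a Poisson INAR(1) process. It assumes the normal approximation $\hat I_Y \approx N\bigl(1-\frac{1+\alpha}{T(1-\alpha)},\ \frac{2(1+\alpha^2)}{T(1-\alpha^2)}\bigr)$, which follows from a delta-method argument for the Poisson INAR(1) case and for $\alpha=0$ reduces to the familiar i.i.d. Poisson approximation $N(1-1/T,\ 2/T)$. The closed-form expressions of Section 4, which also cover CPINAR(1) alternatives for the power analysis, are not reproduced here. Plugging in the lag-1 sample autocorrelation for $\alpha$, clipping it to $[0, 0.99]$, and the one-sided 5% level are likewise assumptions of this sketch.

```python
import math
from statistics import NormalDist

import numpy as np

def dispersion_test(y, alpha=None, level=0.05):
    """One-sided test of H0 'Poisson INAR(1)' against overdispersion via I_hat = S_Y^2 / Ybar.

    Assumed null approximation: I_hat ~ N(1 - (1+a)/(T(1-a)), 2(1+a^2)/(T(1-a^2))).
    If alpha is None, the lag-1 sample autocorrelation is plugged in (an assumption).
    """
    y = np.asarray(y, dtype=float)
    T, ybar = len(y), float(y.mean())
    d = y - ybar
    i_hat = float(np.mean(d ** 2)) / ybar
    if alpha is None:
        alpha = float(np.sum(d[1:] * d[:-1]) / np.sum(d ** 2))
        alpha = min(max(alpha, 0.0), 0.99)     # keep the plug-in value inside [0, 1)
    mean0 = 1 - (1 + alpha) / (T * (1 - alpha))
    sd0 = math.sqrt(2 * (1 + alpha ** 2) / (T * (1 - alpha ** 2)))
    crit = mean0 + NormalDist().inv_cdf(1 - level) * sd0   # upper critical value
    return i_hat, crit, i_hat > crit

print(dispersion_test([0, 0, 0, 10] * 25, alpha=0.0))     # toy series, not real data
```

The function returns the empirical index, the critical value, and the test decision; for a strongly overdispersed toy series such as the one above, $H_0$ is rejected.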

Application: a time series of claims counts

Freeland (1998) presented a time series $y_1,\dots,y_T$ of monthly claims counts (Jan. 1987 to Dec. 1994, length $T=96$), which was further analyzed by Weiß (2009). The counts refer to workers in the heavy manufacturing industry collecting benefits due to a burn-related injury (Freeland, 1998, p. 22). The data are plotted in Fig. 2. They show an AR(1)-like autocorrelation structure such that the Poisson INAR(1) model appears to be a reasonable candidate model. However, empirical mean and variance are

Conclusions and future research

We studied the stochastic properties of compound Poisson INAR(1) processes and derived a test for diagnosing overdispersion. We showed that the CPINAR(1) model allows for the explicit calculation of the k-step-ahead probability distribution as well as the stationary distribution of the process. Using the newly derived formulas for joint moments in an INAR(1) process together with the mixing properties of a CPINAR(1) process, we were able to find the asymptotic distribution of the index of

Acknowledgments

The authors thank the editor, two associate editors and a referee for very useful comments on an earlier draft of this article. They are also grateful to Prof. Anthony Pakes, University of Western Australia, for a helpful discussion about his article Pakes (1971).

References (25)

  • R.K. Freeland et al. Forecasting discrete valued low count time series. Int. J. Forecast. (2004)
  • S. Aki. Discrete distributions of order k on a binary sequence. Ann. Inst. Statist. Math. (1985)
  • S. Aki et al. On discrete distributions of order k. Ann. Inst. Statist. Math. (1984)
  • D. Böhning. A note on a test for Poisson overdispersion. Biometrika (1994)
  • R.C. Bradley. Basic properties of strong mixing conditions. A survey and some open questions. Probab. Surv. (2005)
  • H.A. David. Bias of S² under dependence. Amer. Statist. (1985)
  • J.B. Douglas. Analysis with Standard Contagious Distributions (1980)
  • W. Feller. An Introduction to Probability Theory and Its Applications, Vol. I (1968)
  • R.K. Freeland. Statistical analysis of discrete time series with applications to the analysis of workers... (1998)
  • C.R. Heathcote. Corrections and comments of the paper "A branching process allowing immigration". J. R. Stat. Soc. Ser. B (1966)
  • I. Ibragimov. Some limit theorems for stationary processes. Theory Probab. Appl. (1962)
  • N.L. Johnson et al. Univariate Discrete Distributions (2005)
    1

    Tel.: +49 6221 54 8981.
