A model for integer-valued time series with conditional overdispersion

https://doi.org/10.1016/j.csda.2012.04.011

Abstract

In this paper, a new model, motivated by the weekly dengue cases in Singapore from 2001 to 2010, is proposed to handle conditional equidispersion, overdispersion and underdispersion in integer-valued pure time series. The INARCH model studied by earlier researchers is shown to be a special case. Conditions for the weak and strict stationarity of the model are also given. Some basic properties of the model are shown to parallel those of the classical autoregressive model. Three distribution-based methods and two non-distribution-based methods are presented for parameter estimation. These methods are compared in a simulation study for the conditionally overdispersed case with an integer-valued pure time series of order one. Finally, the model is applied to the motivating example.

Introduction

Time series of counts are widely observed in practice, and overdispersion is common among them. The integer-valued generalized autoregressive conditional heteroscedastic (INGARCH) model, also known as the autoregressive conditional Poisson (ACP) model, has been used to handle overdispersion. This model was first proposed by Heinen (2003), and its properties were subsequently studied by Ferland et al. (2006), Ghahramani and Thavaneswaran (2009), Weiß (2009, 2010) and Zhu and Wang (2010, 2011). The purely autoregressive INGARCH(0,p) model is also called the INARCH(p) model by Weiß (2009). The INARCH(p) model is defined as

$$\begin{cases} X_t \mid \mathcal{F}_{t-1} \sim \mathcal{P}(\lambda_t), \\ \lambda_t = \beta_0 + \sum_{i=1}^{p} \beta_i X_{t-i}, \end{cases} \qquad (1)$$

where $X_t$ is the observation at time $t$, $t \in \mathbb{Z}$, $\beta_0 > 0$, $\beta_i \ge 0$ for $i = 1, \ldots, p$, $\mathcal{F}_{t-1}$ represents the information on the process available at time $t-1$, namely $X_{t-1}, X_{t-2}, \ldots, X_0, X_{-1}, \ldots$, and $\mathcal{P}(\lambda)$ denotes a Poisson distribution with mean $\lambda$.
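As a concrete illustration of the recursion in model (1), the following minimal Python sketch simulates an INARCH(p) path. The function name `simulate_inarch`, the zero start-up values and the burn-in length are illustrative assumptions, not part of the original paper. The sketch also assumes $\beta_1 + \cdots + \beta_p < 1$, the usual condition under which the process has the finite stationary mean $\beta_0 / (1 - \sum_i \beta_i)$.

```python
import numpy as np

def simulate_inarch(beta0, betas, n, burn_in=500, seed=None):
    """Simulate n observations from a Poisson INARCH(p) model (hypothetical helper).

    X_t | F_{t-1} ~ Poisson(lambda_t), lambda_t = beta0 + sum_i betas[i-1] * X_{t-i}.
    """
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=float)
    p = len(betas)
    # The first p entries are zero start-up values; the burn_in draws after them
    # are discarded so that the retained path is approximately stationary.
    x = np.zeros(p + burn_in + n, dtype=np.int64)
    for t in range(p, len(x)):
        lags = x[t - p:t][::-1]        # X_{t-1}, X_{t-2}, ..., X_{t-p}
        lam = beta0 + betas @ lags     # conditional mean lambda_t
        x[t] = rng.poisson(lam)        # Poisson draw given F_{t-1}
    return x[p + burn_in:]

x = simulate_inarch(beta0=1.0, betas=[0.3, 0.2], n=20_000, seed=1)
print(x.mean())  # close to 1.0 / (1 - 0.3 - 0.2) = 2.0
```

Because each draw is Poisson, the conditional variance equals $\lambda_t$ at every step, which is exactly the restriction discussed next.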

Because the conditional distribution of $X_t$ given $\mathcal{F}_{t-1}$ in the INARCH(p) model is Poisson, model (1) cannot handle conditional overdispersion or underdispersion. To address this issue, some authors replaced the Poisson distribution with other distributions, such as the double Poisson (DP) distribution (Heinen, 2003), the negative binomial (NB) distribution (Zhu, 2011) and the generalized Poisson (GP) distribution (Zhu, 2012), with a characteristic parameter of the substituted distribution satisfying the second equation of model (1). For other purposes, further candidate distributions can also be considered, such as the zero-inflated Poisson distribution (Xie et al., 2001; Yang et al., 2011) for modeling an excessive number of zeros, and the weighted Poisson distribution (proposed by Castillo and Pérez-Casany (1998) and later generalized by Castillo and Pérez-Casany (2005)) for flexibly handling conditional overdispersion and underdispersion.
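To make the effect of swapping the conditional distribution concrete, the Python sketch below compares Poisson draws with negative binomial draws that have the same mean. The NB parameterization with a fixed size `r` (variance $\lambda + \lambda^2 / r$) and the helper name `nb_with_mean` are assumptions for illustration; they are not necessarily the parameterization used by Zhu (2011).

```python
import numpy as np

def nb_with_mean(rng, mean, r, size):
    """Negative binomial draws with a given mean and fixed size r (hypothetical helper).

    numpy's negative_binomial(n, p) counts failures before the n-th success and has
    mean n(1-p)/p and variance n(1-p)/p**2. With n = r and p = r / (r + mean) the
    mean is `mean` and the variance is mean + mean**2 / r, which exceeds the mean.
    """
    return rng.negative_binomial(r, r / (r + mean), size=size)

rng = np.random.default_rng(0)
lam = 4.0                                   # a fixed value of lambda_t
pois = rng.poisson(lam, size=200_000)       # Poisson: variance equals the mean
nb = nb_with_mean(rng, lam, r=2.0, size=200_000)
print(pois.mean(), pois.var())  # both close to 4
print(nb.mean(), nb.var())      # mean close to 4, variance close to 4 + 16 / 2 = 12
```

Both samples share the conditional mean $\lambda_t = 4$, so only the dispersion changes; in an INGARCH-type model the mean would still follow the second equation of model (1).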

In this article, instead of specifying the conditional distribution, we propose a more general model motivated by the analysis of weekly dengue cases in Singapore from 2001 to 2010. In this model, the conditional mean of $X_t$ given $\mathcal{F}_{t-1}$ is assumed to satisfy the second equation of model (1), and the conditional variance is assumed to be a constant multiple of the conditional mean. Overdispersion and underdispersion are then reflected by the value of this ratio. Model (1), the model proposed by Heinen (2003) and the model proposed by Zhu (2012) are all special cases of the new model. Some basic properties of the new model are shown to parallel those of the classical autoregressive (AR(p)) model, and conditions for the weak and strict stationarity of the new model are established (detailed proofs are given in Supplementary Material 1). Both distribution-based and non-distribution-based estimation methods are presented, and a simulation study compares them when overdispersion exists and $p = 1$.
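The new model fixes only the conditional mean $\lambda_t$ and the conditional variance, written here as $\alpha\lambda_t$ for the constant ratio $\alpha$, so simulating it requires choosing a conditional law. The Python sketch below makes one hypothetical choice that covers the overdispersed case $\alpha > 1$ with $p = 1$: a negative binomial with size $\lambda_t / (\alpha - 1)$ and success probability $1/\alpha$, whose mean is $\lambda_t$ and whose variance is $\alpha\lambda_t$. The helper names and this particular distribution are illustrative assumptions that the paper does not prescribe, and the underdispersed case $\alpha < 1$ would need a different law.

```python
import numpy as np

def draw_conditional(rng, lam, alpha, size=None):
    """Draw counts with mean lam and variance alpha * lam (hypothetical choice, alpha > 1).

    numpy's negative_binomial(n, p) has mean n(1-p)/p and variance n(1-p)/p**2.
    With n = lam / (alpha - 1) and p = 1 / alpha these equal lam and alpha * lam.
    """
    if alpha <= 1.0:
        raise ValueError("this sampler only covers conditional overdispersion (alpha > 1)")
    return rng.negative_binomial(lam / (alpha - 1.0), 1.0 / alpha, size=size)

def simulate_dinarch1(alpha, beta0, beta1, n, burn_in=500, seed=None):
    """Simulate n observations of an order-one path using draw_conditional."""
    rng = np.random.default_rng(seed)
    out = np.empty(n, dtype=np.int64)
    prev = 0                                   # zero start-up value X_0
    for t in range(burn_in + n):
        lam = beta0 + beta1 * prev             # lambda_t = beta0 + beta1 * X_{t-1}
        prev = int(draw_conditional(rng, lam, alpha))
        if t >= burn_in:
            out[t - burn_in] = prev
    return out

# Conditional moments for a fixed lambda_t = 5 and alpha = 2.5
draws = draw_conditional(np.random.default_rng(1), 5.0, 2.5, size=200_000)
print(draws.mean(), draws.var())   # close to 5 and 12.5
x = simulate_dinarch1(alpha=2.5, beta0=1.0, beta1=0.5, n=20_000, seed=2)
print(x.mean())                    # close to beta0 / (1 - beta1) = 2
```

Any other law with the same two conditional moments (for example a generalized Poisson with a suitably fixed dispersion parameter) would give a path with the same mean and autocorrelation structure, which is why the model itself leaves the distribution unspecified.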


A motivating example

Dengue is a serious threat to public health in tropical and subtropical areas of the world. It is transmitted by the bite of an Aedes mosquito that carries the virus from an infected person. Singapore has a long tradition of studying dengue and collecting data on it. Many researchers have contributed to modeling and analyzing the factors affecting the number of dengue cases in Singapore, such as Burattini et al. (2008), Hii et al. (2009) and Hsieh and Ma (2009). Hii et al. (2009) investigated the

Definition of the DINARCH(p) model

The DINARCH(p) model is defined as

$$\begin{cases} E[X_t \mid \mathcal{F}_{t-1}] = \lambda_t, \\ \operatorname{Var}[X_t \mid \mathcal{F}_{t-1}] = \alpha \lambda_t, \end{cases} \qquad (2)$$

where $\lambda_t$ satisfies the second equation of model (1), and $\alpha > 0$, which in general may depend on $\mathcal{F}_{t-1}$, is assumed constant in this article. Model (2) exhibits conditional overdispersion when $\alpha > 1$ and conditional underdispersion when $\alpha < 1$. When $\alpha = 1$, model (2) is identical to model (1). When $p = 1$, our model is also a non-Gaussian conditional linear AR(1) model of the type analyzed by Grunwald et al. (2000). If $\alpha$ is not constant, model (2) also includes the NB

Properties of the DINARCH(p) model

Because stationarity is a key property of a time series, this section studies it together with some related properties of the DINARCH(p) model.
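For the simplest case $p = 1$, the parallel with the classical AR(1) model can be seen from a short moment calculation. The derivation below is a sketch, not taken from the paper's proofs: it assumes the process is weakly stationary with finite second moments (which in particular requires $0 \le \beta_1 < 1$), and it uses only the two conditional moments of the DINARCH model, not any particular conditional distribution.

```latex
\begin{aligned}
\mu &= E[X_t] = E\{E[X_t \mid \mathcal{F}_{t-1}]\} = \beta_0 + \beta_1 \mu
  &&\Longrightarrow\quad \mu = \frac{\beta_0}{1 - \beta_1},\\
\operatorname{Var}(X_t) &= E\{\operatorname{Var}[X_t \mid \mathcal{F}_{t-1}]\}
  + \operatorname{Var}\{E[X_t \mid \mathcal{F}_{t-1}]\}
  = \alpha \mu + \beta_1^2 \operatorname{Var}(X_{t-1})
  &&\Longrightarrow\quad \operatorname{Var}(X_t) = \frac{\alpha \mu}{1 - \beta_1^2},\\
\gamma(h) &= \operatorname{Cov}(X_t, X_{t-h})
  = \operatorname{Cov}(\beta_0 + \beta_1 X_{t-1}, X_{t-h}) = \beta_1 \gamma(h-1), \quad h \ge 1
  &&\Longrightarrow\quad \rho(h) = \beta_1^{h}.
\end{aligned}
```

The autocorrelation function therefore decays geometrically exactly as for an AR(1) process, and the unconditional dispersion index $\operatorname{Var}(X_t)/\mu = \alpha / (1 - \beta_1^2)$ exceeds $\alpha$ whenever $\beta_1 > 0$.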

Parameter estimation

When the conditional distribution of $X_t \mid \mathcal{F}_{t-1}$ is known, the parameters can be estimated by conditional maximum likelihood estimation (MLE). If the conditional distribution is unknown, we adopt two other methods: conditional weighted least-squares estimation (WLSE) and Yule–Walker (YW) estimation, which is commonly used to estimate the parameters of ARMA models.
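For the non-distribution-based case with $p = 1$, the Python sketch below shows one way the two approaches can be carried out. The WLSE shown is a common two-step form: an ordinary least-squares pilot fit, then weights $1/\hat\lambda_t$ (since $\operatorname{Var}[X_t \mid \mathcal{F}_{t-1}] = \alpha\lambda_t$), with $\alpha$ estimated by the mean of the squared Pearson residuals. The YW-type estimator uses the stationary relations $\mu = \beta_0/(1-\beta_1)$, $\rho(1) = \beta_1$ and $\operatorname{Var}(X_t) = \alpha\mu/(1-\beta_1^2)$. Both are illustrative assumptions that may differ in detail from the estimators in the paper, and the negative binomial simulator used for the demonstration is likewise a hypothetical choice of conditional law.

```python
import numpy as np

def simulate_dinarch1(alpha, beta0, beta1, n, rng):
    """Hypothetical DINARCH(1) simulator: negative binomial conditional law
    with mean lambda_t and variance alpha * lambda_t (requires alpha > 1)."""
    x = np.empty(n + 1, dtype=np.int64)
    x[0] = 0                                           # start-up value X_0
    for t in range(1, n + 1):
        lam = beta0 + beta1 * x[t - 1]
        x[t] = rng.negative_binomial(lam / (alpha - 1.0), 1.0 / alpha)
    return x

def wlse_dinarch1(x):
    """Two-step conditional weighted least squares, conditioning on X_0."""
    y = x[1:].astype(float)                            # X_1, ..., X_n
    z = np.column_stack([np.ones(len(y)), x[:-1]])     # regressors [1, X_{t-1}]
    pilot, *_ = np.linalg.lstsq(z, y, rcond=None)      # step 1: unweighted fit
    lam = np.maximum(z @ pilot, 1e-8)                  # pilot lambda_t, kept positive
    w = 1.0 / np.sqrt(lam)                             # row scaling, i.e. weights 1 / lambda_t
    beta, *_ = np.linalg.lstsq(z * w[:, None], y * w, rcond=None)  # step 2: weighted fit
    lam = np.maximum(z @ beta, 1e-8)
    alpha = np.mean((y - lam) ** 2 / lam)              # E[(X_t - lambda_t)^2 / lambda_t] = alpha
    return alpha, beta[0], beta[1]

def yule_walker_dinarch1(x):
    """Moment (Yule-Walker type) estimates from the sample mean, variance and lag-1 ACF."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    c = x - mean
    gamma0 = c @ c / len(x)                            # sample variance
    gamma1 = c[1:] @ c[:-1] / len(x)                   # lag-1 sample autocovariance
    beta1 = gamma1 / gamma0                            # rho(1) = beta1
    beta0 = mean * (1.0 - beta1)                       # mu = beta0 / (1 - beta1)
    alpha = gamma0 * (1.0 - beta1 ** 2) / mean         # Var = alpha * mu / (1 - beta1^2)
    return alpha, beta0, beta1

x = simulate_dinarch1(alpha=2.0, beta0=1.0, beta1=0.4, n=20_000,
                      rng=np.random.default_rng(42))
print(wlse_dinarch1(x))         # roughly (2.0, 1.0, 0.4)
print(yule_walker_dinarch1(x))  # roughly (2.0, 1.0, 0.4)
```

Note that $\alpha$ cancels from the WLS weights, so the regression coefficients can be estimated without knowing the dispersion ratio; $\alpha$ is recovered afterwards from the residuals.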

In the following analysis, we assume that the first p observations are given, i.e., X0,X

Simulation

To study the performance of the parameter estimation methods in Section 5, a simulation study is carried out for a DINARCH(1) model with conditional overdispersion, i.e., with parameters $\alpha\ (>1)$, $\beta_0$ and $\beta_1$. The relative standard error (RSE), defined below, is used for comparison:

$$\mathrm{RSE}_i = \frac{\sqrt{\sum_{m=1}^{M} \left(\hat{\theta}_{i,m} - \theta_i^{(0)}\right)^2 / M}}{\theta_i^{(0)}},$$

where $i = 1, 2, 3$, $\theta_1 \equiv \alpha$, $\theta_2 \equiv \beta_0$ and $\theta_3 \equiv \beta_1$; $\theta_i^{(0)}$ is the true value of parameter $\theta_i$, $\hat{\theta}_{i,m}$ is its estimate from the $m$-th sample, and $M$ is the number of
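The RSE computation can be sketched in a few lines of Python. The sketch assumes that the RSE is the root-mean-squared error of the $M$ estimates divided by the true value, as the term "standard error" suggests; the function name and the estimates in the usage example are made up for illustration and are not results from the paper.

```python
import numpy as np

def relative_standard_error(estimates, true_values):
    """RSE_i = sqrt(sum_m (theta_hat_{i,m} - theta_i0)^2 / M) / theta_i0.

    estimates: array of shape (M, k), one row of parameter estimates per simulated sample.
    true_values: array of shape (k,), the true parameters theta_i^(0).
    """
    estimates = np.asarray(estimates, dtype=float)
    true_values = np.asarray(true_values, dtype=float)
    rmse = np.sqrt(np.mean((estimates - true_values) ** 2, axis=0))  # per-parameter RMSE
    return rmse / true_values

# Made-up estimates of (alpha, beta0, beta1) from M = 4 hypothetical samples
est = [[2.2, 1.1, 0.40],
       [1.8, 0.9, 0.42],
       [2.4, 1.2, 0.38],
       [1.6, 0.8, 0.40]]
rse = relative_standard_error(est, [2.0, 1.0, 0.4])
print(rse)
```

Dividing by the true value makes the error comparable across parameters of different scales, such as $\alpha$ and $\beta_1$.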

Conclusions

Motivated by the study of weekly dengue cases in Singapore from 2001 to 2010, this paper proposes a new model that generalizes the INARCH model. Conditions for the stationarity of the model are studied. Parameter estimation methods are also presented and compared in a simulation study for an integer-valued pure time series of order 1 with conditional overdispersion. Simulation results show that when both $\beta_1$ and $\alpha$ are large, model mis-selection affects the estimation of $\alpha$

Acknowledgments

We would like to express our sincere gratitude to the Associate Editor and the two referees for their constructive comments and suggestions, which have greatly helped us improve this paper. This research is supported by NUS research grant No. R-266-000-044-112, and the work was done while the first author was a research fellow at the National University of Singapore. The research is also partially supported by a grant from City University of Hong Kong (project No. 9380058).
