Bootstrap confidence regions in multinomial sampling

https://doi.org/10.1016/S0096-3003(03)00777-X

Abstract

Power divergences can be used to measure the distance between two probability vectors. In multinomial sampling, their arguments can be replaced by the empirical and theoretical proportions to obtain confidence regions for the parameters. In this paper the bootstrap versions of these confidence regions are constructed. Monte Carlo simulation experiments are carried out to calculate average coverage probabilities and to compare the behavior of the introduced procedures.

Introduction

Let $Y_1,Y_2,\ldots,Y_n$ be independent and identically distributed random variables with realizations on the sample space $\mathcal{X}=\{1,2,\ldots,k\}$. Consider the probability vector $p=(p_1,\ldots,p_k)$, with components $p_j=P(Y_i=j)>0$, and the random variables

$$X_j=\sum_{i=1}^{n} I_{\{j\}}(Y_i),\quad j=1,\ldots,k.$$

The statistic $(X_1,\ldots,X_k)$ is sufficient for $p$ in the statistical model under consideration and is multinomially distributed; that is,

$$P(X_1=x_1,\ldots,X_k=x_k)=\frac{n!}{x_1!\cdots x_k!}\,p_1^{x_1}\times\cdots\times p_k^{x_k},$$

for integers $x_1,\ldots,x_k\geq 0$ such that $x_1+\cdots+x_k=n$.
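As a small illustration of the two displays above, the following sketch tallies the counts $X_j$ from a sample and evaluates the multinomial probability of the observed counts. The function names and the data are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def category_counts(sample, k):
    """Return (X_1, ..., X_k), where X_j counts the observations equal to j."""
    tally = Counter(sample)
    return [tally.get(j, 0) for j in range(1, k + 1)]


def multinomial_pmf(x, p):
    """P(X_1 = x_1, ..., X_k = x_k) for n = sum(x) trials with cell probabilities p."""
    n = sum(x)
    coef = math.factorial(n)
    for xj in x:
        coef //= math.factorial(xj)  # each partial quotient is an integer
    prob = float(coef)
    for xj, pj in zip(x, p):
        prob *= pj ** xj
    return prob


# Hypothetical data: n = 6 observations on the sample space {1, 2, 3}.
y = [1, 3, 2, 3, 3, 1]
x = category_counts(y, k=3)
pmf_value = multinomial_pmf(x, [0.2, 0.3, 0.5])
print(x, pmf_value)
```

Only the counts enter the probability, which is the practical meaning of the sufficiency statement: the order of the observations plays no role.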

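For testing

$$H_0: p=p^0, \tag{1.1}$$

the classical goodness-of-fit Pearson's statistic is

$$X_n^2=n\sum_{i=1}^{k}\frac{(\hat{p}_i-p_i^0)^2}{p_i^0}, \tag{1.2}$$

where $p^0=(p_1^0,\ldots,p_k^0)$, $\hat{p}=(\hat{p}_1,\ldots,\hat{p}_k)$ and $\hat{p}_j=X_j/n$, $j=1,\ldots,k$. Other common tests are based on φ-divergences between the observed and theoretical proportions, $\hat{p}$ and $p^0$ respectively. This family of divergences was introduced by Csiszár [2] and Ali and Silvey [1], for convex functions $\varphi:(0,\infty)\to\mathbb{R}$, by the formula

$$D_\varphi(\hat{p},p^0)\equiv\sum_{j=1}^{k}p_j^0\,\varphi\!\left(\frac{\hat{p}_j}{p_j^0}\right);\quad \varphi\in\Phi,$$

where $\Phi$ is the class of all convex functions $\varphi(x)$, $x>0$, such that at $x=1$, $\varphi(1)=0$, $\varphi''(1)>0$, and at $x=0$, $0\,\varphi(0/0)=0$ and $0\,\varphi(p/0)=p\lim_{u\to\infty}\varphi(u)/u$. Their properties and statistical applications have been extensively studied in Liese and Vajda [5] and Vajda [8]. For every $\varphi\in\Phi$ that is differentiable at $x=1$, the function

$$\psi(x)\equiv\varphi(x)-\varphi'(1)(x-1)$$

also belongs to $\Phi$. Then $D_\psi(\hat{p},p^0)=D_\varphi(\hat{p},p^0)$, because $\sum_{j=1}^{k}p_j^0(\hat{p}_j/p_j^0-1)=0$, and $\psi$ has the additional property that $\psi'(1)=0$. Because the two divergence measures coincide, we may consider the set $\Phi$ to be equivalent to the set

$$\Phi^*\equiv\Phi\cap\{\varphi:\varphi'(1)=0\}.$$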
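The definitions above can be checked numerically. The sketch below, with hypothetical counts and a hypothetical null vector, evaluates a φ-divergence for a user-supplied convex function, confirms that recentring φ by $\varphi'(1)(x-1)$ leaves the divergence unchanged, and shows that the choice $\varphi(x)=\tfrac12(x-1)^2$ turns $2nD_\varphi$ into Pearson's statistic. It assumes all observed proportions are strictly positive, so the conventions at $x=0$ are not needed.

```python
import math


def phi_divergence(p_hat, p0, phi):
    """D_phi(p_hat, p0) = sum_j p0_j * phi(p_hat_j / p0_j); assumes every p0_j > 0."""
    return sum(q * phi(ph / q) for ph, q in zip(p_hat, p0))


def pearson_statistic(x, p0):
    """X_n^2 = n * sum_j (p_hat_j - p0_j)^2 / p0_j with p_hat_j = x_j / n."""
    n = sum(x)
    return n * sum((xj / n - q) ** 2 / q for xj, q in zip(x, p0))


# Hypothetical counts (n = 50) and null vector, chosen only for illustration.
x = [12, 18, 20]
p0 = [0.2, 0.3, 0.5]
n = sum(x)
p_hat = [xj / n for xj in x]

# A convex function with phi(1) = 0 and phi'(1) = 1, and its recentred version.
phi = lambda t: t * math.log(t)
psi = lambda t: phi(t) - 1.0 * (t - 1.0)  # psi(x) = phi(x) - phi'(1) (x - 1)

d_phi = phi_divergence(p_hat, p0, phi)
d_psi = phi_divergence(p_hat, p0, psi)

# Pearson's statistic as a scaled phi-divergence with phi(x) = (x - 1)^2 / 2.
x2 = pearson_statistic(x, p0)
x2_via_phi = 2 * n * phi_divergence(p_hat, p0, lambda t: 0.5 * (t - 1.0) ** 2)

print(d_phi, d_psi, x2, x2_via_phi)
```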

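In statistical inference, an important family of φ-divergences is the one introduced and studied by Cressie and Read [3]. The functions φ in the so-called power-divergence family are

$$\varphi_\lambda(x)=(\lambda(\lambda+1))^{-1}(x^{\lambda+1}-x);\quad \lambda\neq 0,\ \lambda\neq -1, \tag{1.4}$$

$$\varphi_0(x)=\lim_{\lambda\to 0}\varphi_\lambda(x),\qquad \varphi_{-1}(x)=\lim_{\lambda\to -1}\varphi_\lambda(x).$$

Observe that $\varphi_\lambda(x)$ and $\psi_\lambda(x)\equiv\varphi_\lambda(x)-(x-1)(\lambda+1)^{-1}$ define the same divergence measure. In the following, the power-divergence measures are denoted by

$$I_\lambda(\hat{p},p^0)\equiv D_{\varphi_\lambda}(\hat{p},p^0)=D_{\psi_\lambda}(\hat{p},p^0).$$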

Zografos et al. [11] established under (1.1) that

$$T_{\varphi,n}=\frac{2n}{\varphi''(1)}D_\varphi(\hat{p},p^0) \tag{1.5}$$

is asymptotically chi-square distributed with $k-1$ degrees of freedom. Pearson's statistic (1.2) coincides with $T_{\varphi,n}$ for $\varphi(x)=\tfrac{1}{2}(x-1)^2$. Also, since $\varphi_\lambda''(1)=1$,

$$T_{\varphi_\lambda,n}=2n\,I_\lambda(\hat{p},p^0) \tag{1.6}$$

is asymptotically chi-square distributed with $k-1$ degrees of freedom. This result was established by Cressie and Read [3].
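The asymptotic result can be turned into an approximate test. The sketch below computes $2nI_\lambda(\hat{p},p^0)$ and an approximate p-value from the chi-square distribution with $k-1$ degrees of freedom. The chi-square CDF is evaluated with the standard power series for the regularized lower incomplete gamma function so that only the standard library is needed. The counts, the null vector and all function names are hypothetical.

```python
import math


def power_divergence(p_hat, p0, lam):
    """I_lambda(p_hat, p0) = sum_j p0_j * psi_lambda(p_hat_j / p0_j), assuming p0_j > 0."""
    total = 0.0
    for ph, q in zip(p_hat, p0):
        t = ph / q
        if t == 0.0:
            if lam <= -1:
                return math.inf
            total += q / (lam + 1.0)
        elif lam == 0:
            total += q * (t * math.log(t) - t + 1.0)
        elif lam == -1:
            total += q * (-math.log(t) + t - 1.0)
        else:
            total += q * (t ** (lam + 1.0) - (lam + 1.0) * t + lam) / (lam * (lam + 1.0))
    return total


def chi2_cdf(x, df, max_terms=1000):
    """Chi-square CDF via the series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for m in range(1, max_terms):
        term *= z / (a + m)
        total += term
        if term < total * 1e-16:
            break
    return min(1.0, total * math.exp(-z + a * math.log(z) - math.lgamma(a)))


# Hypothetical counts (n = 50) and null vector, chosen only for illustration.
x = [12, 18, 20]
p0 = [0.2, 0.3, 0.5]
n, k = sum(x), len(x)
p_hat = [xj / n for xj in x]

t_stat = 2 * n * power_divergence(p_hat, p0, 1)  # lambda = 1 gives Pearson's statistic
p_value = 1.0 - chi2_cdf(t_stat, k - 1)
print(t_stat, p_value)
```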

Following Jhun and Jeong [4], in this paper we construct confidence regions for $p$ based on the asymptotic distribution of (1.5) as well as on bootstrap methods. Jhun and Jeong present a simulation study based on the Pearson statistic given in (1.2). Chronologically, Watson and Nguyen [9] and Watson [10] were the first authors to consider the problem of constructing confidence regions for $p=(p_1,p_2,p_3)$ in trinomial distributions. Their method was based on the asymptotic distribution of Pearson's statistic given in (1.2). Medak and Cressie [6] extended their results by using the power-divergence family of statistics given in (1.6).

In Section 2 confidence regions are introduced. In Section 3 two Monte Carlo simulation experiments are carried out to calculate average coverage probabilities and to make comparisons.

Simultaneous confidence regions

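Let $\{S(x_1,\ldots,x_k)\}$ be a family of subsets of the parameter space

$$\Delta_k=\Big\{p=(p_1,\ldots,p_k): p_j>0,\ \sum_{j=1}^{k}p_j=1\Big\},$$

where $x_1,\ldots,x_k\geq 0$ are integers such that $x_1+\cdots+x_k=n$. $\{S(x_1,\ldots,x_k)\}$ is said to be a family of confidence regions for $p$ at confidence level $1-\alpha$ if

$$P_p\big(S(X_1,\ldots,X_k)\ \text{contains}\ p\big)=1-\alpha\quad\text{for all } p\in\Delta_k.$$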
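The paper's regions (2.1) and (2.2) are not reproduced in this excerpt, so the following is only an assumed reading of how a bootstrap region of this kind might be built: the region collects every $p$ with $2nI_\lambda(\hat{p},p)$ no larger than a critical value, and the critical value is the empirical $(1-\alpha)$ quantile of $2nI_\lambda(\hat{p}^*,\hat{p})$ over resamples drawn from the multinomial distribution with parameter $\hat{p}$. The function names, the number of resamples and the requirement that all observed counts be positive are choices made for this sketch.

```python
import math
import random


def power_divergence(p_hat, p0, lam):
    """I_lambda(p_hat, p0) = sum_j p0_j * psi_lambda(p_hat_j / p0_j), assuming p0_j > 0."""
    total = 0.0
    for ph, q in zip(p_hat, p0):
        t = ph / q
        if t == 0.0:
            if lam <= -1:
                return math.inf
            total += q / (lam + 1.0)
        elif lam == 0:
            total += q * (t * math.log(t) - t + 1.0)
        elif lam == -1:
            total += q * (-math.log(t) + t - 1.0)
        else:
            total += q * (t ** (lam + 1.0) - (lam + 1.0) * t + lam) / (lam * (lam + 1.0))
    return total


def bootstrap_critical_value(x, lam, alpha, n_boot=2000, rng=None):
    """Empirical (1 - alpha) quantile of 2n * I_lambda(p_star, p_hat) over resamples."""
    rng = rng or random.Random(0)
    n, k = sum(x), len(x)
    if min(x) == 0:
        raise ValueError("this sketch assumes every observed count is positive")
    p_hat = [xj / n for xj in x]
    stats = []
    for _ in range(n_boot):
        counts = [0] * k
        for cell in rng.choices(range(k), weights=p_hat, k=n):
            counts[cell] += 1
        p_star = [c / n for c in counts]
        stats.append(2 * n * power_divergence(p_star, p_hat, lam))
    stats.sort()
    return stats[min(n_boot - 1, math.ceil((1 - alpha) * n_boot) - 1)]


def in_region(p, x, lam, crit):
    """True if p lies in the region {p : 2n * I_lambda(p_hat, p) <= crit}."""
    n = sum(x)
    p_hat = [xj / n for xj in x]
    return 2 * n * power_divergence(p_hat, p, lam) <= crit


# Hypothetical counts, chosen only for illustration.
x = [12, 18, 20]
crit = bootstrap_critical_value(x, lam=2 / 3, alpha=0.05, rng=random.Random(2024))
print(crit, in_region([0.2, 0.3, 0.5], x, 2 / 3, crit), in_region([0.6, 0.2, 0.2], x, 2 / 3, crit))
```

Replacing `crit` by the upper-α quantile of the chi-square distribution with $k-1$ degrees of freedom would give the corresponding asymptotic region under the same reading.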

Confidence regions for the proportions of a multinomial population are one of the basic tools in statistical inference for categorical data. Divergence measures also play an important role in this area (see, e.g. Read and Cressie [7]

Monte Carlo investigation

In this section two Monte Carlo simulations are carried out to study the performance of (2.1) and (2.2) for $\varphi_\lambda(x)$ given in (1.4) and $\lambda\in L=\{-2,-1,-1/2,0,2/3,1,4/3,2,3\}$. Some of these values of $\lambda$ correspond to well-known goodness-of-fit test statistics: Neyman modified $X^2$ ($\lambda=-2$), minimum discrimination information ($\lambda=-1$), Freeman–Tukey ($\lambda=-1/2$), log-likelihood ratio ($\lambda=0$), Cressie–Read ($\lambda=2/3$) and Pearson's $X^2$ ($\lambda=1$).
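The experimental design itself is not shown in this excerpt, so the sketch below is only a generic illustration of how a coverage probability can be estimated by Monte Carlo for each $\lambda\in L$. It assumes the asymptotic region $\{p: 2nI_\lambda(\hat{p},p)\leq c\}$, a trinomial population ($k=3$, for which the upper-α chi-square quantile with 2 degrees of freedom is exactly $-2\ln\alpha$), a hypothetical true vector, and 2000 replications to keep the run short.

```python
import math
import random

LAMBDAS = [-2, -1, -0.5, 0, 2 / 3, 1, 4 / 3, 2, 3]


def power_divergence(p_hat, p0, lam):
    """I_lambda(p_hat, p0) = sum_j p0_j * psi_lambda(p_hat_j / p0_j), assuming p0_j > 0."""
    total = 0.0
    for ph, q in zip(p_hat, p0):
        t = ph / q
        if t == 0.0:
            if lam <= -1:
                return math.inf
            total += q / (lam + 1.0)
        elif lam == 0:
            total += q * (t * math.log(t) - t + 1.0)
        elif lam == -1:
            total += q * (-math.log(t) + t - 1.0)
        else:
            total += q * (t ** (lam + 1.0) - (lam + 1.0) * t + lam) / (lam * (lam + 1.0))
    return total


def estimated_coverage(p, n, lam, alpha, n_rep, rng):
    """Fraction of simulated samples whose asymptotic region contains the true p."""
    k = len(p)
    if k != 3:
        raise ValueError("the closed-form critical value below is valid for k = 3 only")
    crit = -2.0 * math.log(alpha)  # exact chi-square upper-alpha quantile for 2 df
    hits = 0
    for _ in range(n_rep):
        counts = [0] * k
        for cell in rng.choices(range(k), weights=p, k=n):
            counts[cell] += 1
        p_hat = [c / n for c in counts]
        if 2 * n * power_divergence(p_hat, p, lam) <= crit:
            hits += 1
    return hits / n_rep


# Hypothetical true parameter and sample size, chosen only for illustration.
rng = random.Random(12345)
results = {lam: estimated_coverage([0.2, 0.3, 0.5], 50, lam, 0.05, 2000, rng)
           for lam in LAMBDAS}
for lam, cov in results.items():
    print(f"lambda = {lam:6.3f}  estimated coverage = {cov:.3f}")
```

Averaging such estimates over several true vectors $p$ would give an average coverage probability in the sense used in the abstract.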

In the first experiment 10 000 samples (replications) are drawn

References (11)

  • M. Jhun et al., Applications of bootstrap methods for categorical data analysis, Computational Statistics & Data Analysis (2000)
  • S.M. Ali et al., A general class of coefficient of divergence of one distribution from another, Journal of the Royal Statistical Society, Series B (1966)
  • I. Csiszár, Eine Informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten, Publications of the Mathematical Institute of Hungarian Academy of Sciences, Series A (1963)
  • N.A.C. Cressie et al., Multinomial goodness-of-fit tests, Journal of the Royal Statistical Society, Series B (1984)
  • F. Liese et al., Convex Statistical Distances (1987)
There are more references available in the full text version of this article.

This work was supported by the grants BMF2003-00892 and BMF 2003-04820.
