Testing hypothesis for a simple ordering in incomplete contingency tables

https://doi.org/10.1016/j.csda.2016.01.003

Abstract

Tests for ordered categorical variables are of considerable importance because such variables are frequently encountered in biomedical studies. This paper introduces a simple ordering test approach for two-way r×c contingency tables with incomplete counts by developing six test statistics, namely the likelihood ratio test statistic, score test statistic, global score test statistic, Hausman–Wald test statistic, Wald test statistic and distance-based test statistic. Bootstrap resampling methods are also presented. The performance of the proposed tests is evaluated with respect to their empirical type I error rates and empirical powers. The results show that the likelihood ratio test statistic based on the bootstrap resampling methods performs satisfactorily for small to large sample sizes. A real example from a wheeze study in six cities is used to illustrate the proposed methodologies.

Introduction

In biomedical studies, especially in clinical trials, incomplete ordered categorical data arise quite frequently. Incomplete ordered data can occur for various reasons. For example, Ware et al. (1984), Lipsitz and Fitzmaurice (1996) and Tang et al. (2007a, 2007b) considered a wheeze study from six cities. The data are summarized in Table 1. The columns of Table 1 correspond to the wheezing status (Y=1: no wheeze; Y=2: wheeze with cold; Y=3: wheeze apart from cold) of a child at age 10. The rows represent the smoking status of the child’s mother (X=1: none; X=2: moderate; X=3: heavy) during that time. Note that for some individuals the maternal smoking variable is missing, whereas for others the child’s wheezing status is missing. Thus, the resultant data include two parts: the complete observations and the incomplete observations. Following the arguments of Ware et al. (1984), Lipsitz and Fitzmaurice (1996) and Tang et al. (2007a, 2007b), we assume that the missing-data mechanism is missing at random (MAR; Rubin, 1976).

Under MAR, one is often interested in investigating whether there is a positive association between two ordered variables; that is, whether more severe effects on respiratory illness in children tend to occur as maternal smoking increases. For this purpose, we can consider testing ordering alternatives. Statistical inference for ordering alternatives has been widely studied in the literature. For example, Robertson and Wright (1981) considered the likelihood ratio statistic for testing the equality of two multinomial distributions against the simple stochastic ordering, and presented explicit expressions for the maximum likelihood estimates (MLEs). Lucas and Wright (1991) considered a similar problem and derived the asymptotic distributions for the simple stochastic ordering in discrete univariate and bivariate cases. For more than two multinomial populations, Wang (1996) developed a test for the equality of distributions against simple stochastic ordering in several populations. In that paper, the limiting distribution of the likelihood ratio test statistic was characterized through minimization problems and has no closed form. Dardanoni and Forcina (1998) considered the same hypothesis testing problem and gave the asymptotic distribution of the likelihood ratio statistic. Their results were cited by Silvapulle and Sen (2005). Feng and Wang (2007) also considered the same hypothesis test and obtained the asymptotic distribution of the likelihood ratio test statistic by an approach completely different from that of Dardanoni and Forcina (1998). In order to obtain the null asymptotic distribution of the likelihood ratio statistic, Feng and Wang (2007) transformed the simple stochastic ordering constraint into a polyhedral cone constraint, so that the restricted MLEs are characterized by a maximization problem with a concave objective function and a series of linear inequality constraints. The desired null asymptotic distribution was then obtained from the limit theory of the optimization problem. Klingenberg et al. (2009) presented an alternative bootstrap approach to test marginal homogeneity against stochastic ordering in two-sample multivariate ordinal data. Davidov and Peddada (2011) developed a general methodology for testing multivariate stochastic ordering.

However, none of the aforementioned works has been generalized to incomplete r×c tables with correlated ordinal data. Moreover, the score statistic and the Wald statistic are not discussed in the above-mentioned references. Note that the likelihood ratio, score and Wald test statistics are asymptotically equivalent. However, when the parameter space is constrained, the likelihood ratio, score and Wald test statistics for a simple ordering restriction with incomplete data have not yet been considered in the literature. Hence, the aim of this article is to consider the likelihood ratio test statistic, the score test statistic, the global score test statistic, the Hausman–Wald test statistic, the Wald test statistic and the distance-based test statistic, and to present bootstrap resampling methods for testing a simple ordering restriction in incomplete r×c tables.

The rest of this paper is organized as follows. In Section 2, we transform the problem of testing simple ordering into a polyhedral cone constrained problem. Section 3 presents the likelihood ratio test statistic, the score test statistic, the global score test statistic, the Hausman–Wald test statistic, the Wald test statistic and the distance-based test statistic, together with bootstrap resampling methods for testing a simple ordering in incomplete r×c tables. Simulation studies are conducted in Section 4 to investigate the performance of the various methods. A real example from the aforementioned wheeze study in six cities is used to illustrate the proposed methodologies in Section 5. Some concluding remarks are given in Section 6.


The formulation of the simple ordering test

Let X and Y be two correlated ordinal variables with joint distribution $\pi_{ij}=\Pr(X=i,Y=j)$ for $i=1,\ldots,r$ and $j=1,\ldots,c$. The observed counts and the corresponding cell probabilities for the $n=\sum_{i=1}^{r}\sum_{j=1}^{c}n_{ij}$ complete observations and the $m_x+m_y=\sum_{i=1}^{r}m_{ix}+\sum_{j=1}^{c}m_{yj}$ partially incomplete observations are listed in Table 2. In this paper, we use $T_n=\{(x_1,\ldots,x_n): x_k\ge 0,\ k=1,\ldots,n,\ \sum_{k=1}^{n}x_k=1\}$ to denote the n-dimensional hyperplane.
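For orientation, the observed-data log-likelihood for a table of this kind can be written in the standard form below. This is a sketch under stated assumptions rather than a display taken from the paper: it assumes the missingness is ignorable under MAR, that m_{ix} counts subjects with X=i observed and Y missing, and that m_{yj} counts subjects with Y=j observed and X missing. The marginal symbols \pi_{i+} and \pi_{+j} are notation introduced here for illustration.

$$
\ell(\pi)=\sum_{i=1}^{r}\sum_{j=1}^{c} n_{ij}\log\pi_{ij}
+\sum_{i=1}^{r} m_{ix}\log\pi_{i+}
+\sum_{j=1}^{c} m_{yj}\log\pi_{+j}+\text{const},
\qquad
\pi_{i+}=\sum_{j=1}^{c}\pi_{ij},\quad
\pi_{+j}=\sum_{i=1}^{r}\pi_{ij}.
$$

The complete observations contribute through the cell probabilities, while each partially observed subject contributes only through the margin of the variable that was recorded.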

Let $\pi=(\pi_1,\ldots,\pi_r)\in T_{rc}$ be the unknown parameter vector, where $\pi_i=(\pi_{i1},\ldots,\pi_{ic})$. Assume that $n_{ij}$

Six testing hypothesis approaches

In this section, we develop six test statistics to test H0 against H1 in (2.5) and provide a bootstrap method for calculating the corresponding p-values for these tests.
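The general shape of a bootstrap p-value calculation can be sketched as follows. This is a generic illustration, not the paper's algorithm: the function names `bootstrap_p_value`, `simulate_null_stat` and `toy_null_stat` are hypothetical, the add-one correction is a choice made here, and the toy chi-square draw merely stands in for "simulate an incomplete table under the fitted null model and recompute the test statistic". It assumes that large values of the statistic count as evidence against H0.

```python
import numpy as np

def bootstrap_p_value(observed_stat, simulate_null_stat, n_boot=999, seed=0):
    """Generic bootstrap p-value for a statistic where large values reject H0.

    simulate_null_stat(rng) must return one draw of the statistic computed
    on data generated under the (fitted) null model. Hypothetical interface.
    """
    rng = np.random.default_rng(seed)
    boot_stats = np.array([simulate_null_stat(rng) for _ in range(n_boot)])
    # Count bootstrap statistics at least as extreme as the observed one.
    exceed = int(np.sum(boot_stats >= observed_stat))
    # Add-one correction keeps the p-value strictly positive (assumption).
    return (1 + exceed) / (n_boot + 1)

def toy_null_stat(rng):
    # Toy stand-in for a null draw of a test statistic; NOT one of the
    # six statistics developed in the paper.
    return rng.chisquare(df=1)

p_value = bootstrap_p_value(observed_stat=3.0, simulate_null_stat=toy_null_stat)
print(f"bootstrap p-value: {p_value:.3f}")
```

In an actual application, `simulate_null_stat` would be replaced by a routine that resamples an incomplete r×c table under H0 and evaluates whichever of the six statistics is of interest.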

Monte Carlo simulation studies

In this section, to investigate the finite-sample performance of the proposed test statistics, we calculate the type I error rates and powers under various parameter settings via Monte Carlo simulation. For simplicity, we focus our discussion on the case where the number of ordinal categories is r=c=3.
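One Monte Carlo replicate of an incomplete 3×3 table could be generated along the following lines. This is a minimal sketch under simplifying assumptions that are not taken from the paper: missingness is completely at random (a special case of MAR), an incomplete subject loses X or Y with equal probability, and the cell probabilities, sample size and incomplete proportion shown are purely illustrative. The function name `simulate_incomplete_table` is hypothetical.

```python
import numpy as np

def simulate_incomplete_table(pi, total, p_incomplete, rng):
    """Draw one incomplete r x c table (hypothetical helper).

    Returns (n, m_x, m_y): complete counts, counts with only X observed,
    and counts with only Y observed.
    """
    pi = np.asarray(pi, dtype=float)
    r, c = pi.shape
    # Draw the full (X, Y) pair for every subject from the joint distribution.
    cells = rng.choice(r * c, size=total, p=pi.ravel())
    x, y = np.divmod(cells, c)
    # Assumed mechanism: each subject is incomplete with prob. p_incomplete,
    # losing Y or X with equal probability, independently of (X, Y).
    u = rng.random(total)
    miss_y = u < p_incomplete / 2
    miss_x = (u >= p_incomplete / 2) & (u < p_incomplete)
    complete = ~(miss_x | miss_y)
    n = np.zeros((r, c), dtype=int)
    np.add.at(n, (x[complete], y[complete]), 1)
    m_x = np.bincount(x[miss_y], minlength=r)  # X observed, Y missing
    m_y = np.bincount(y[miss_x], minlength=c)  # Y observed, X missing
    return n, m_x, m_y

rng = np.random.default_rng(2016)
pi_uniform = np.full((3, 3), 1.0 / 9.0)  # illustrative cell probabilities
n_sim, mx_sim, my_sim = simulate_incomplete_table(pi_uniform, 100, 0.3, rng)
print(n_sim, mx_sim, my_sim, sep="\n")
```

Repeating such draws and applying a test to each replicate gives the empirical rejection rate, which estimates the type I error rate when the generating probabilities satisfy H0 and the power otherwise.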

The total sample size N is set to be 10, 30, 50, 100 and 200. The proportion of the incomplete data is set to be 20%, 30%, 40% and 50%. The parameter vector of the missing-data mechanism (ϕ1,ϕ2,ϕ3) is

A real example

In this section, we revisit the study considered by Ware et al. (1984), Lipsitz and Fitzmaurice (1996) and Tang et al. (2007a). From Table 1, we have n11=287, n12=39, n13=38, n21=18, n22=6, n23=4, n31=91, n32=22, n33=23, m1x=279, m2x=27, m3x=201, my1=59, my2=18 and my3=26. We are primarily interested in testing the effects of maternal smoking on respiratory illness in children. The proposed test procedures are used to test H0 against H1 specified by (2.5). Values of the six test statistics
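As a starting point for working with these counts, the unrestricted maximum likelihood estimate of the cell probabilities can be obtained with a standard EM iteration. The sketch below is an illustration under assumptions, not the paper's procedure: it assumes ignorable missingness under MAR, takes m_{ix} as counts with X observed and Y missing and m_{yj} as counts with Y observed and X missing, and ignores the ordering restriction entirely. The function name `em_unrestricted` is hypothetical.

```python
import numpy as np

# Counts quoted above from Table 1 (rows: maternal smoking X; columns: wheeze Y).
n = np.array([[287, 39, 38],
              [18, 6, 4],
              [91, 22, 23]], dtype=float)
m_x = np.array([279, 27, 201], dtype=float)  # X observed, Y missing
m_y = np.array([59, 18, 26], dtype=float)    # Y observed, X missing

def em_unrestricted(n, m_x, m_y, tol=1e-10, max_iter=10000):
    """EM for the unrestricted MLE of cell probabilities (hypothetical helper)."""
    total = n.sum() + m_x.sum() + m_y.sum()
    pi = np.full(n.shape, 1.0 / n.size)  # start from the uniform table
    for _ in range(max_iter):
        # E-step: spread partially observed counts over the missing margin
        # using the current conditional distributions.
        y_given_x = pi / pi.sum(axis=1, keepdims=True)
        x_given_y = pi / pi.sum(axis=0, keepdims=True)
        expected = n + m_x[:, None] * y_given_x + m_y[None, :] * x_given_y
        # M-step: the multinomial MLE is the expected count over the total.
        new_pi = expected / total
        if np.max(np.abs(new_pi - pi)) < tol:
            return new_pi
        pi = new_pi
    return pi

pi_hat = em_unrestricted(n, m_x, m_y)
print(np.round(pi_hat, 4))
```

The counts sum to 528 complete and 610 partially incomplete observations, 1138 in total, so more than half of the subjects contribute only marginal information.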

A discussion

In this article, we considered the test for a simple ordering restriction in r×c contingency tables with incomplete observations. An example taken from the maternal smoking and childhood wheezing study is used to illustrate the motivation and applicability of the proposed methods. The likelihood ratio test statistic, the score test statistic, the global score test statistic, the Hausman–Wald test statistic, the Wald test statistic and the distance-based test statistic were proposed to test a simple

Acknowledgments

The authors would like to thank an Associate Editor and two referees for their helpful comments and suggestions, which resulted in a significant improvement of the paper. The research of Hui-Qiong Li was fully supported by the Natural Science Foundation of China (11201412, 11561075). Xuejun Jiang’s research was partially supported by a grant (NSFC 11101432) from the Natural Science Foundation of China.
