Abstract
For a two-way contingency table, odds ratios are commonly used to describe the relationships between the row and column variables. In the ordinary case the cells are mutually exclusive, that is, each subject must fit into one and only one cell. However, in many surveys respondents may select more than one outcome category, which is commonly referred to as multiple responses. We discuss model-based and Mantel–Haenszel estimators of an assumed common odds ratio for several \(2\times c\) tables, where the two rows refer to independent groups and the c columns to multiple responses, treating the multiple responses as an extension of the multinomial sampling model. We derive new dually consistent (co)variance estimators for the Mantel–Haenszel odds ratio estimators, evaluate their performance in a simulation study, and illustrate the estimators on a linguistic data set.
References
Agresti A (2013) Categorical data analysis, 3rd edn. Wiley series in probability and statistics. Wiley, New Jersey
Agresti A, Liu I (2001) Strategies for modeling a categorical variable allowing multiple category choices. Sociol Method Res 29(4):403–434
Agresti A, Liu IM (1999) Modeling a categorical variable allowing arbitrarily many category choices. Biometrics 55(3):936–943
Bergsma W, Croon M, Hagenaars J (2009) Marginal models for dependent, clustered, and longitudinal categorical data. Springer, New York
Bilder CR, Loughin TM (2001) On the first-order Rao–Scott correction of the Umesh–Loughin–Scherer statistic. Biometrics 57(4):1253–1255
Bilder CR, Loughin TM (2002) Testing for conditional multiple marginal independence. Biometrics 58(1):200–208
Bilder CR, Loughin TM (2004) Testing for marginal independence between two categorical variables with multiple responses. Biometrics 60(1):241–248
Bilder CR, Loughin TM (2009) Modeling multiple-response categorical data from complex surveys. Canad J Stat Rev Canadienne De Statistique 37(4):553–570
Bilder CR, Loughin TM, Nettleton D (2000) Multiple marginal independence testing for pick any/c variables. Commun Stat Simul Comput 29(4):1285–1316
Davison A, Hinkley D (1997) Bootstrap methods and their application. Cambridge series in statistical and probabilistic mathematics. Cambridge University Press, Cambridge
Decady YJ, Thomas DR (2000) A simple test of association for contingency tables with multiple column responses. Biometrics 56(3):893–896
Greenland S (1989) Generalized Mantel–Haenszel estimators for K 2\(\times \)J tables. Biometrics 45(1):183–191
Gu PY, Hu G, Zhang LJ (2005) Investigating language learner strategies among lower primary school pupils in Singapore. Lang Educ 19(4):281–303
Gu Y (2002) Gender, academic major, and vocabulary learning strategies of Chinese EFL learners. RELC J 33(1):35–54
Haber M (1985) Maximum-likelihood methods for linear and log-linear models in categorical-data. Comput Stat Data Anal 3(1):1–10
Keogh RH, Cox DR (2014) Case–control studies, vol 4. Cambridge University Press, Cambridge
Kleinbaum D, Kupper L, Morgenstern H (1982) Epidemiologic research: principles and quantitative methods. Lifetime Learning Publications, Belmont
Lang JB (1996) Maximum likelihood methods for a generalized class of log-linear models. Ann Stat 24(2):726–752
Lang JB, Agresti A (1994) Simultaneously modelling joint and marginal distributions of multivariate categorical responses. J Am Stat Assoc 89(426):625–632
Liang KY, Zeger SL (1986) Longitudinal data-analysis using generalized linear-models. Biometrika 73(1):13–22
Liu I (2003) Describing ordinal odds ratios for stratified r \(\times \) c tables. Biometrical J 45(6):730–750
Liu I, Suesse T (2008) The analysis of stratified multiple responses. Biometrical J 50(1):135–149
Liu IM, Agresti A (1996) Mantel–Haenszel-type inference for cumulative odds ratios with a stratified ordinal response. Biometrics 52(4):1223–1234
Loughin TM, Scherer P (1998) Testing for association in contingency tables with multiple column responses. Biometrics 54:630–637
Mantel N, Haenszel W (1959) Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 22(4):719–748
McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman and Hall, New York
Mickey RM, Elashoff RM (1985) A generalization of the Mantel–Haenszel estimator of partial association for 2\(\times \)J\(\times \)K-tables. Biometrics 41(3):623–635
Nurminen M (1981) Asymptotic efficiency of general non-iterative estimators of common relative risk. Biometrika 68(2):525–530
Sen PK, Singer JM (1993) Large sample methods in statistics: an introduction with applications. Chapman & Hall, New York
Suesse T, Liu I (2012) Mantel–Haenszel estimators of odds ratios for stratified dependent binomial data. Comput Stat Data Anal 56:2705–2717
Thomas DR, Decady YJ (2004) Testing for association using multiple response survey data: approximate procedures based on the Rao–Scott approach. Int J Test 4(1):43–59
Acknowledgements
We would like to thank the referees for their helpful comments that greatly improved the paper.
Appendices
Dual consistency of \({{\varPsi }}^*_{jh}\)
The estimator
which converges to
Under model (1\(^*\)), the term on the right-hand side becomes 1, and dual consistency applies because \(\theta _h= \exp (\beta _{ah}-\beta _{bh})\). Under model (1), however, this is not the case because of the \(\gamma _{ik}\) terms. This means that the estimator \({\hat{\theta }}_h\) converges to \(\theta _h \times c\), where c here denotes a constant that does not depend on h (not the number of columns).
Likewise, \({\hat{\theta }}_j\) converges under model (1) to \(\theta _j \times c\) with the same constant. Therefore \({{\varPsi }}^*_{jh}={\hat{\theta }}_j/{\hat{\theta }}_h\) converges to \(\frac{\theta _j \times c}{\theta _h \times c}={\varPsi }_{jh}\), because the constant cancels in the ratio, and dual consistency holds even under model (1).
Dual consistency of covariance estimator of two MH relative risk estimators
Showing that the estimator of \(\text {Cov}(L_j,L_h)\) is consistent is equivalent to showing that the estimator of \(\text {Cov}({\hat{\theta }}_j,{\hat{\theta }}_h)\) is consistent, by applying the delta method to the log function. Hence we need to show that
is consistent for \(\text {Cov}({\hat{\theta }}_j,{\hat{\theta }}_h)\).
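The delta-method step can be written out as a short worked equation. This is only a first-order sketch, and it assumes that \(L_j\) denotes \(\log {\hat{\theta }}_j\), which the surrounding text suggests but does not state here:

\[\text {Cov}^a(\log {\hat{\theta }}_j,\log {\hat{\theta }}_h)=\frac{\text {Cov}^a({\hat{\theta }}_j,{\hat{\theta }}_h)}{\theta _j\,\theta _h},\]

since the derivative of \(\log \theta \) is \(1/\theta \). The two covariances differ only by the fixed factor \(1/(\theta _j\theta _h)\), so, given consistent estimators of \(\theta _j\) and \(\theta _h\), a consistent estimator of one yields a consistent estimator of the other.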
We can show that
and
which can be estimated under both limiting models by
with \(d_{jh|ak}=(X_{j|ak}X_{h|ak}-X_{jh|ak})/n_{ak}'\) and \(n_{ak}'=n_{ak}-1\).
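As an illustration of the quantity \(d_{jh|ak}\), the following minimal Python sketch computes it for one row a of one stratum k. The data layout (one 0/1 tuple per subject, one entry per item) and the function names are assumptions made for this example only; \(X_{j|ak}\) is taken to be the number of subjects selecting item j and \(X_{jh|ak}\) the number selecting both j and h.

```python
from itertools import permutations


def d_jh(responses, j, h):
    """Compute d_{jh|ak} = (X_j * X_h - X_jh) / (n - 1) for one row of one stratum.

    `responses` is a list of 0/1 tuples, one per subject (hypothetical layout).
    """
    n = len(responses)
    if n < 2:
        raise ValueError("need at least two subjects, since n' = n - 1")
    x_j = sum(r[j] for r in responses)           # subjects selecting item j
    x_h = sum(r[h] for r in responses)           # subjects selecting item h
    x_jh = sum(r[j] * r[h] for r in responses)   # subjects selecting both items
    return (x_j * x_h - x_jh) / (n - 1)


def ordered_pairs(responses, j, h):
    """Brute-force count of ordered pairs of distinct subjects (s, t)
    in which s selects item j and t selects item h."""
    return sum(s[j] * t[h] for s, t in permutations(responses, 2))


# Four subjects, three items; purely illustrative data.
example = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]
print(d_jh(example, 0, 1), ordered_pairs(example, 0, 1) / (len(example) - 1))
```

The brute-force helper shows one way to read the numerator: \(X_{j|ak}X_{h|ak}-X_{jh|ak}\) counts ordered pairs of distinct subjects in which the first selects item j and the second selects item h.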
Dual consistency of ordinary MH estimator
1.1 Sparse-data limiting model
For the sparse-data limiting model, the number of observations per stratum is bounded (\(N_k=O(1)\)) and K approaches infinity.
From \(\pi _{j|1k}\pi _{h|2k}= {\varPsi }_{jh}\pi _{h|1k}\pi _{j|2k}\), which follows from the assumption of a common odds ratio, and Eq. (11), we derive
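Writing the estimator as the ratio \({\hat{{\varPsi }}}_{jh}=C_{jh}/C_{hj}\) with \(C_{jh}:=\sum _k c_{jh|k}\), we can write
\[{\hat{{\varPsi }}}_{jh}-{\varPsi }_{jh}=\frac{C_{jh}-{\varPsi }_{jh}C_{hj}}{C_{hj}}=\frac{{\varOmega }_{jh}}{C_{hj}},\]
with \(\omega _{jh|k}:=c_{jh|k}-{\varPsi }_{jh}c_{hj|k}\) and \({\varOmega }_{jh}:=\sum _k\omega _{jh|k}\).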
The term \(c_{jh|k}\) is a bounded random variable under the sparse-data limiting model (model II), so \(\omega _{jh|k}\) is bounded as well. Hence the variance of \({\varOmega }_{jh}\) is \(o(K^2)\), and Chebyshev’s weak law of large numbers gives \(({\varOmega }_{jh}-\mathbb {E}{\varOmega }_{jh})/K {\rightarrow }_p0\). Since \(\mathbb {E}\omega _{jh|k}=0\), this reduces to \({\varOmega }_{jh}/K {\rightarrow }_p0\); that is, the numerator of \({\hat{{\varPsi }}}_{jh}-{\varPsi }_{jh}\), scaled by 1/K, converges to zero in probability. Applying Chebyshev’s weak law of large numbers again to the denominator yields
\[C_{hj}/K\ {\rightarrow }_p\ \lim _{K\rightarrow \infty }\mathbb {E}C_{hj}/K.\]
This limit is finite and nonzero. Thus, we conclude \({\hat{{\varPsi }}}_{jh}-{\varPsi }_{jh}{\rightarrow }_p0\) by Slutsky’s theorem.
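The sparse-data argument can be checked numerically with a small simulation. The sketch below is not the authors' code and covers only the ordinary multinomial case (each subject picks exactly one column). It assumes the generalized Mantel–Haenszel form \(c_{jh|k}=X_{j|1k}X_{h|2k}/N_k\), since Eq. (11) is not reproduced here, and it uses hypothetical stratum effects `alpha` and log odds ratio parameters `beta`, so that the common odds ratio for columns j and h is \(\exp (\beta _j-\beta _h)\).

```python
import math
import random


def simulate_mh(K, beta, n_per_row=2, seed=0):
    """Accumulate C[j][h] = sum_k X_{j|1k} X_{h|2k} / N_k over K sparse strata."""
    rng = random.Random(seed)
    c = len(beta)
    C = [[0.0] * c for _ in range(c)]
    N_k = 2 * n_per_row  # bounded stratum size: the sparse-data setting
    for _ in range(K):
        # Stratum-specific nuisance terms (hypothetical choice of distribution).
        alpha = [rng.gauss(0.0, 0.5) for _ in range(c)]
        w1 = [math.exp(a + b) for a, b in zip(alpha, beta)]  # row 1 weights
        w2 = [math.exp(a) for a in alpha]                    # row 2 weights
        x1 = [0] * c
        x2 = [0] * c
        for col in rng.choices(range(c), weights=w1, k=n_per_row):
            x1[col] += 1
        for col in rng.choices(range(c), weights=w2, k=n_per_row):
            x2[col] += 1
        for j in range(c):
            for h in range(c):
                C[j][h] += x1[j] * x2[h] / N_k
    return C


def psi_hat(C, j, h):
    """Ratio estimator C_jh / C_hj of the common odds ratio Psi_jh."""
    return C[j][h] / C[h][j]


beta = [0.0, 0.5, 1.0]
for K in (200, 2000, 20000):
    C = simulate_mh(K, beta, seed=1)
    print(K, psi_hat(C, 1, 0), math.exp(beta[1] - beta[0]))
```

With only two subjects per row in every stratum, the estimate should settle near \(\exp (0.5)\) as K grows, which is the behaviour the weak law of large numbers argument describes.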
1.2 Large-stratum limiting model
Let us consider the case \(N\rightarrow \infty \) with \(N\alpha _{ik}=n_{ik}\) and \(0<\alpha _{ik}<1\), that is, as N approaches infinity the number of subjects \(n_{ik}\), for all rows i and strata k, also approaches infinity. Note \(N_k=n_{1k}+n_{2k}=N\sum _{i}\alpha _{ik}\).
We have
Therefore
Asymptotic covariances
1.1 Sparse-data limiting model
Let \(\text {Var}^a(\cdot )\) and \(\text {Cov}^a(\cdot )\) refer to the asymptotic variance and covariance. From above \({\hat{{\varPsi }}}_{jh}-{\varPsi }_{jh}=\frac{{\varOmega }_{jh} /K}{ C_{hj}/K}=\frac{\sum _k \omega _{jh|k} /K}{ C_{hj}/K}\).
First, by independence of the strata, \(\text {Cov}({\varOmega }_{jh},{\varOmega }_{ts})=\sum _{k=1}^K \text {Cov}(\omega _{jh|k},\omega _{ts|k})\). Note that \(\mathbb {E}|\omega _{jh|k}-\mathbb {E}\omega _{jh|k}|^3=\mathbb {E}|\omega _{jh|k}|^3=O(1)\), because \(c_{jh|k}\) is a bounded random variable under the sparse-data limiting model. By setting \(\delta =1\), we conclude from the Multivariate Central Limit Theorem (Sen and Singer 1993, p. 123) that \(K^{-1/2}({\varOmega }_{jh},{\varOmega }_{ts})=\sqrt{K}({\varOmega }_{jh}/K,{\varOmega }_{ts}/K)\) converges to a zero-mean multivariate normal distribution with covariance \(\lim _{K\rightarrow \infty }\frac{1}{K}\sum _{k=1}^K \text {Cov}(\omega _{jh|k},\omega _{ts|k})\), noting that \(\mathbb {E}\omega _{jh|k}=0\) and that \(\text {Cov}(\omega _{jh|k},\omega _{ts|k})\) exists. We conclude that the asymptotic covariance between \({\varOmega }_{jh}/K\) and \({\varOmega }_{ts}/K\) satisfies \(\lim _{K\rightarrow \infty } K \cdot \text {Cov}^a({\varOmega }_{jh}/K,{\varOmega }_{ts}/K)=\lim _{K\rightarrow \infty }\frac{1}{K}\sum _{k=1}^K \text {Cov}(\omega _{jh|k},\omega _{ts|k})\).
Therefore, by the delta method, Slutsky’s theorem, and Eq. (13), and using that the denominator terms \(\lim _{K}\mathbb {E}C_{hj}/K\) are finite and nonzero, we obtain
for arbitrary indices \(j,h,s,t \in \{1,\dots ,c \}\) with \(j\ne h\) and \(s\ne t\).
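As a sketch of the form this limit takes, the ratio representation \({\hat{{\varPsi }}}_{jh}-{\varPsi }_{jh}=({\varOmega }_{jh}/K)/(C_{hj}/K)\) can be combined with the limits of the denominators. The display below is derived from that representation under the stated assumptions and is not copied from the original equation:

\[\lim _{K\rightarrow \infty }K\cdot \text {Cov}^a({\hat{{\varPsi }}}_{jh},{\hat{{\varPsi }}}_{ts})=\frac{\lim _{K\rightarrow \infty }\frac{1}{K}\sum _{k=1}^K\text {Cov}(\omega _{jh|k},\omega _{ts|k})}{\left( \lim _{K\rightarrow \infty }\mathbb {E}C_{hj}/K\right) \left( \lim _{K\rightarrow \infty }\mathbb {E}C_{st}/K\right) }.\]

The numerator is the limiting covariance of \(\sqrt{K}({\varOmega }_{jh}/K,{\varOmega }_{ts}/K)\), and each denominator factor is the probability limit of the corresponding \(C_{hj}/K\) or \(C_{st}/K\), which Slutsky’s theorem allows to be treated as a constant.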
Now we obtain the following variance
and covariances
with
The subscript k is often suppressed for convenience only.
The (co)variance estimators were constructed in such a way that they converge exactly to the asymptotic (co)variance(s). We can also express \(U_{jhs}\) as \(U_{jhs}=U_{jhs}^{add}\) omitting \(U_{jhs}^{old}\) but only if \({\hat{v}}_{jhs|abk}^B\) is amended to \({\hat{v}}_{jhs|abk}^B = \frac{1}{N_k^2}X_{j|ak}\{ X_{h|bk}X_{s|bk}-X_{hs|bk} \}\). Then for the covariance estimators we have \(\sum _k {\hat{v}}_k/K \overset{K\rightarrow \infty }{\longrightarrow } \sum _k \mathbb {E}{\hat{v}}_k/K = \lim _K \sum _k v_k/K\) and \(\sum _k c_{jh|k}/K \overset{K\rightarrow \infty }{\longrightarrow }\sum _k \mathbb {E}c_{jh|k}/K\) by Chebyshev’s weak law of large numbers.
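To make the amended term concrete, the following minimal Python sketch evaluates \({\hat{v}}_{jhs|abk}^B = \frac{1}{N_k^2}X_{j|ak}\{ X_{h|bk}X_{s|bk}-X_{hs|bk} \}\) for a single stratum. The data layout (one 0/1 tuple per subject), the function name, and the reading of \(N_k\) as the total number of subjects in rows a and b of the stratum are assumptions made for this example.

```python
def v_b_amended(row_a, row_b, j, h, s):
    """Amended v^B_{jhs|abk} = X_{j|a} * (X_{h|b} * X_{s|b} - X_{hs|b}) / N_k**2.

    `row_a` and `row_b` are lists of 0/1 response tuples for the two rows
    of one stratum (hypothetical layout); N_k is taken as their total size.
    """
    n_k = len(row_a) + len(row_b)
    x_j_a = sum(r[j] for r in row_a)            # row a: subjects selecting item j
    x_h_b = sum(r[h] for r in row_b)            # row b: subjects selecting item h
    x_s_b = sum(r[s] for r in row_b)            # row b: subjects selecting item s
    x_hs_b = sum(r[h] * r[s] for r in row_b)    # row b: subjects selecting both
    return x_j_a * (x_h_b * x_s_b - x_hs_b) / n_k ** 2


# Illustrative stratum with two subjects in row a and three in row b.
row_a = [(1, 0, 1), (1, 1, 0)]
row_b = [(0, 1, 1), (1, 1, 0), (0, 1, 1)]
print(v_b_amended(row_a, row_b, 0, 1, 2))
```

Averaging such stratum terms over k gives the quantity \(\sum _k {\hat{v}}_k/K\) whose convergence is described above.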
1.2 Large-stratum limiting model
By the delta method, the large-stratum limiting variance is
and the limiting covariances are
The estimators were constructed such that
Suesse, T., Liu, I. Mantel–Haenszel estimators of a common odds ratio for multiple response data. Stat Methods Appl 28, 57–76 (2019). https://doi.org/10.1007/s10260-018-0429-z
DOI: https://doi.org/10.1007/s10260-018-0429-z