Abstract
A new method for estimating the intra-cluster correlation coefficient for clustered binary data in regression settings is introduced. The method builds on the violation of Bartlett’s second identity that arises when binomial distributions are used to model correlated binary data. It applies to any sensible link function connecting the success probability to the covariates. The procedure is easy to implement with any statistical software that provides the naïve and the sandwich covariance matrices of the regression parameter estimates. Simulations and real data analyses demonstrate the efficacy of the new procedure.
References
Birnbaum LS, Harris MW, Stocking LM, Clark AM, Morrissey RE (1989) Retinoic acid selectively enhances teratogenesis in C57BL/6N mice. Toxicol Appl Pharmacol 98:487–500
Birnbaum LS, Morrissey RE, Harris MW (1991) Teratogenic effects of 2,3,7,8-tetrabromodibenzo-p-dioxin and three polybrominated dibenzofurans in C57BL/6N mice. Toxicol Appl Pharmacol 107:141–192
Blizzard L, Hosmer DW (2006) Parameter estimation and goodness-of-fit in log binomial regression. Biom J 48:5–22
Crowder MJ (1978) Beta-binomial ANOVA for proportions. J R Stat Soc Ser C 27:34–37
Heindel JJ, Price CJ, Field EA, Marr MC, Myers CB, Morrissey RE, Schwetz BA (1992) Developmental toxicity of boric acid in mice and rats. Fundam Appl Toxicol 18:266–277
Lee Y (2004) Estimating intraclass correlation for binary data using extended quasi-likelihood. Stat Model 4:113–126
Liang KY, Zeger SL, Qaqish B (1992) Multivariate regression analyses for categorical data (with discussion). J R Stat Soc Ser B 54:3–40
McCullagh P (1983) Quasi-likelihood functions. Ann Stat 11:59–67
McCullagh P, Nelder JA (1989) Generalized linear models. Chapman and Hall, London
Presnell B, Boos DD (2004) The IOS test for model misspecification. J Am Stat Assoc 99:216–227
Qaqish BF (2003) A family of multivariate binary distributions for simulating correlated binary variables with specified marginal means and correlations. Biometrika 90:455–463
Ridout MS, Demétrio CGB, Firth D (1999) Estimating intraclass correlation for binary data. Biometrics 55:137–148
Slaton TL, Piegorsch WW, Durham SD (2000) Estimation and testing with overdispersed proportions using the beta-logistic regression model of Heckman and Willis. Biometrics 56:125–133
Stefanski LA, Boos DD (2002) The calculus of M-estimation. Am Stat 56:29–38
Weil CS (1970) Selection of the valid number of sampling units and a consideration of their combination in toxicological studies involving reproduction, teratogenesis or carcinogenesis. Food Cosmet Toxicol 8:177–182
Williams DA (1975) The analysis of binary responses from toxicological experiments involving reproduction and teratogenicity. Biometrics 31:949–952
Zou G, Donner A (2004) Confidence interval estimation of the intraclass correlation. Biometrics 60:807–811
Acknowledgments
This work is supported by grant NSC 100-2118-M-008-001-MY2 of the National Science Council, and National Central University-Land Seed Hospital Joint Research grant NCU-LSH-100-A-005, Taiwan, R.O.C.
Appendix
As remarked by Presnell and Boos (2004), the asymptotic variance of \(trace(\widehat{I}^{-1}\widehat{V})\) is equal to the bottom right element of \(C^{-1}D(C^{-1})^{t}/m\), where C and D are \(6\times 6\) matrices, with
and \(D\) consisting of the sub-matrices \(D_{ij}\), \(i,j=1,2,3\), given later.
Consider a simple regression model with \(\eta _i =\beta _0 + x_i \beta _1\). The log likelihood function is
For simplicity, we will use \(l_i\) to denote \(l_i(\beta )\).
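As a point of reference, and assuming the binomial working model described in the abstract, with \(y_i\) successes out of \(n_i\) trials in cluster \(i\) and \(m\) clusters in total, a log likelihood consistent with the score used in this appendix is, up to an additive constant that does not involve \(\beta\),

\[
l(\beta)=\sum_{i=1}^{m} l_i(\beta), \qquad l_i(\beta)=y_i\log p_i+(n_i-y_i)\log (1-p_i).
\]

Differentiating \(l_i\) along the linear predictor gives

\[
\frac{\partial l_i}{\partial \eta_i}
=\left(\frac{y_i}{p_i}-\frac{n_i-y_i}{1-p_i}\right)\frac{dp_i}{d\eta_i}
=\frac{y_i-n_i p_i}{p_i(1-p_i)}\,\frac{dp_i}{d\eta_i},
\]

and the chain rule with \(\partial \eta_i/\partial \beta=[1,x_i]^{t}\) then yields the score vector.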
We first let \(p_i^{\prime}\) and \(p_i^{\prime \prime}\) denote the first and second derivatives of \(p_i\) with respect to \(\eta_i\). Then \(l_i^{\prime}=\partial l_i/\partial \beta =w_i [1,x_i]^{t}\), where \(w_i =(y_i -n_i p_i)p_i^{\prime}/\left(p_i (1-p_i)\right)\), and
where
which is the derivative of \(w_i\) with respect to \(\eta_i\).
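As a worked step, and assuming that \(w_i^{\prime}\) is taken along the linear predictor \(\eta_i\) (the same convention under which \(p_i^{\prime}=p_i(1-p_i)\) for the logit link), the product and quotient rules applied to \(w_i=(y_i-n_ip_i)p_i^{\prime}/(p_i(1-p_i))\) give

\[
w_i^{\prime}
=-\frac{n_i\,p_i^{\prime 2}}{p_i(1-p_i)}
+(y_i-n_ip_i)\left[\frac{p_i^{\prime \prime}}{p_i(1-p_i)}
-\frac{p_i^{\prime 2}(1-2p_i)}{[p_i(1-p_i)]^{2}}\right].
\]

Because \(E(Y_i-n_ip_i)=0\), taking expectations leaves \(E(w_i^{\prime})=-n_ip_i^{\prime 2}/(p_i(1-p_i))\), which agrees with the weight appearing in the matrix \(I\). Multiplying this expression by \(w_i^{2}\) and taking expectations also reproduces the three terms of \(E(w_i^{2}w_i^{\prime})\) quoted later in this appendix.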
Note that if the logistic function is employed, then \(p_i =e^{\eta _i}/(1+e^{\eta _i})\) giving \({p}^{\prime }_i = p_i\left({1-p_i} \right)\) and \({p}^{\prime \prime }_i = p_i\left({1-p_i}\right)(1-2p_i)\).
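The two logistic identities above can be checked numerically. The following is a minimal Python sketch that compares the closed forms with central finite differences in \(\eta\); the function names and the step size `h` are illustrative choices, not part of the original derivation.

```python
import math

def logistic(eta):
    """Success probability p = e^eta / (1 + e^eta)."""
    return 1.0 / (1.0 + math.exp(-eta))

def logistic_derivatives(eta):
    """Closed forms: p, p' = p(1-p), p'' = p(1-p)(1-2p)."""
    p = logistic(eta)
    d1 = p * (1.0 - p)
    d2 = p * (1.0 - p) * (1.0 - 2.0 * p)
    return p, d1, d2

def central_differences(f, eta, h=1e-4):
    """Numerical first and second derivatives of f at eta."""
    d1 = (f(eta + h) - f(eta - h)) / (2.0 * h)
    d2 = (f(eta + h) - 2.0 * f(eta) + f(eta - h)) / (h * h)
    return d1, d2

for eta in (-2.0, 0.0, 0.7):
    _, d1, d2 = logistic_derivatives(eta)
    n1, n2 = central_differences(logistic, eta)
    # The two columns in each pair should agree to several decimal places.
    print(f"eta={eta:+.1f}  p'={d1:.6f} vs {n1:.6f}  p''={d2:.6f} vs {n2:.6f}")
```

For a different link, only `logistic` and the closed forms in `logistic_derivatives` would need to change; the finite-difference check stays the same.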
Let \(I^{-1}\) denote the inverse of the matrix \(I=-E(l^{\prime \prime })=\frac{1}{m}\sum _{i=1}^m {\frac{n_i {p}^{\prime 2}_i}{p_i (1-p_i)} \left[{{\begin{array}{cc} 1&{x_i} \\ {x_i}&{x_i^2} \\ \end{array}}} \right]}\). One can show that
where \(I_{ij}\) denotes the \((i,j)\)th element of \(I\), and
The sub-matrices \(D_{ij},i,j=1,2,3\) constituting \(D\) are, respectively,
where \(E({w}_i^{\prime 2})=\frac{n_i^{2}{p}_i^{\prime 4}}{[p_i (1-p_i)]^{2}}+Var(Y_i)\left[{\frac{{p}_i^{\prime \prime 2}}{[p_i (1-p_i)]^{2}}-\frac{2{p}^{\prime \prime }_i ({p}_i ^{\prime 2}-2p_i {p}_i^{\prime 2})}{[p_i (1-p_i)]^{3}}+\frac{({p}_i ^{\prime 2}-2p_i {p}_i^{\prime 2})^{2}}{[p_i (1-p_i)]^{4}}} \right]\),
where \(E(w_i^2 {w}^{\prime }_i)=\frac{E(Y_i -n_i p_i)^{3}{p}_i ^{\prime 2}{p}^{\prime \prime }_i}{[p_i (1-p_i)]^{3}}-\frac{Var(Y_i)n_i {p}_i^{\prime 4}}{[p_i (1-p_i)]^{3}}-\frac{E(Y_i -n_i p_i)^{3}{p}_i ^{\prime 4}(1-2p_i)}{[p_i (1-p_i)]^{4}}\), and
Note that \(D_{33}\) is a scalar.
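The abstract notes that the procedure needs only the naïve and the sandwich covariance matrices. A minimal Python sketch of those two matrices and of \(trace(\widehat{I}^{-1}\widehat{V})\) follows, assuming the logit link (so that \(w_i=y_i-n_ip_i\)), the simple model \(\eta_i=\beta_0+x_i\beta_1\), and made-up toy data. The function names are hypothetical and this is not the authors' implementation. When Bartlett's second identity holds, \(\widehat{I}\) and \(\widehat{V}\) estimate the same matrix, so the trace should be close to the number of regression parameters (two here); correlation within clusters pulls it away from that value. The sketch stops at the trace and does not attempt the correlation estimate itself.

```python
import math

def fit_logistic(x, n, y, iters=25):
    """Fisher scoring for the binomial working model, eta_i = b0 + b1 * x_i."""
    b0 = b1 = 0.0
    for _ in range(iters):
        s0 = s1 = a = b = c = 0.0
        for xi, ni, yi in zip(x, n, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = yi - ni * p            # score weight w_i under the logit link
            w = ni * p * (1.0 - p)     # information weight n_i p_i(1 - p_i)
            s0 += r
            s1 += r * xi
            a += w
            b += w * xi
            c += w * xi * xi
        det = a * c - b * b
        b0 += (c * s0 - b * s1) / det  # step = (information)^-1 * score
        b1 += (a * s1 - b * s0) / det
    return b0, b1

def inv2(A):
    """Inverse of a 2 x 2 matrix stored as nested lists."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

def matmul2(A, B):
    """Product of two 2 x 2 matrices."""
    return [[sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def naive_and_sandwich(x, n, y, beta):
    """Return the naive covariance, sandwich covariance, and trace(I^-1 V)."""
    m = len(x)
    b0, b1 = beta
    I = [[0.0, 0.0], [0.0, 0.0]]
    V = [[0.0, 0.0], [0.0, 0.0]]
    for xi, ni, yi in zip(x, n, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        z = (1.0, xi)
        for r in range(2):
            for c in range(2):
                I[r][c] += ni * p * (1.0 - p) * z[r] * z[c] / m   # -E(l'')
                V[r][c] += (yi - ni * p) ** 2 * z[r] * z[c] / m   # l' l'^t
    I_inv = inv2(I)
    I_inv_V = matmul2(I_inv, V)
    naive = [[v / m for v in row] for row in I_inv]
    sandwich = [[v / m for v in row] for row in matmul2(I_inv_V, I_inv)]
    trace_stat = I_inv_V[0][0] + I_inv_V[1][1]
    return naive, sandwich, trace_stat

# Made-up toy data: 9 clusters of size 10 at three covariate levels.
x = [0, 0, 0, 1, 1, 1, 2, 2, 2]
n = [10] * 9
y = [1, 5, 2, 3, 8, 4, 6, 10, 7]
beta_hat = fit_logistic(x, n, y)
naive, sandwich, trace_stat = naive_and_sandwich(x, n, y, beta_hat)
print("beta_hat:", beta_hat)
print("trace(I^-1 V):", trace_stat)
```

With output from a statistics package, the same trace can be recovered without refitting, since the product of the inverse naïve covariance matrix and the sandwich covariance matrix has the same trace as \(\widehat{I}^{-1}\widehat{V}\).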
Cite this article
Tsou, TS., Chen, WC. Estimation of intra-cluster correlation coefficient via the failure of Bartlett’s second identity. Comput Stat 28, 1681–1698 (2013). https://doi.org/10.1007/s00180-012-0372-7