Estimation of intra-cluster correlation coefficient via the failure of Bartlett’s second identity

Tsou, Tsung-Shan; Chen, Wan-Chen

doi:10.1007/s00180-012-0372-7

Original Paper
Published: 11 October 2012

Computational Statistics, Volume 28, pages 1681–1698 (2013)

Tsung-Shan Tsou^1,2 &
Wan-Chen Chen^3,4

Abstract

A new method for estimating the intra-cluster correlation coefficient for clustered binary data in regression settings is introduced. The method is founded on the violation of Bartlett’s second identity that occurs when binomial distributions are used to model correlated binary data. The methodology applies to any sensible link function connecting the success probability to the covariates. The procedure is easy to implement with any statistical software that provides the naïve and the sandwich covariance matrices for the regression parameter estimates. Simulations and real data analyses demonstrate the efficacy of the procedure.

References

Birnbaum LS, Harris MW, Stocking LM, Clark AM, Morrissey RE (1989) Retinoic acid selectively enhances teratogenesis in C57BL/6N mice. Toxicol Appl Pharmacol 98:487–500
Birnbaum LS, Morrissey RE, Harris MW (1991) Teratogenic effects of 2,3,7,8-tetrabromodibenzo-p-dioxin and three polybrominated dibenzofurans in C57BL/6N mice. Toxicol Appl Pharmacol 107:141–192
Blizzard L, Hosmer DW (2006) Parameter estimation and goodness-of-fit in log binomial regression. Biom J 48:5–22
Crowder MJ (1978) Beta-binomial ANOVA for proportions. J R Stat Soc Ser C 27:34–37
Heindel JJ, Price CJ, Field EA, Marr MC, Myers CB, Morrissey RE, Schwetz BA (1992) Developmental toxicity of boric acid in mice and rats. Fundam Appl Toxicol 18:266–277
Lee Y (2004) Estimating intraclass correlation for binary data using extended quasi-likelihood. Stat Model 4:113–126
Liang KY, Zeger SL, Qaqish B (1992) Multivariate regression analyses for categorical data (with discussion). J R Stat Soc Ser B 54:3–40
McCullagh P (1983) Quasi-likelihood functions. Ann Stat 11:59–67
McCullagh P, Nelder JA (1989) Generalized linear models. Chapman and Hall, London
Presnell B, Boos DD (2004) The IOS test for model misspecification. J Am Stat Assoc 99:216–227
Qaqish BF (2003) A family of multivariate binary distributions for simulating correlated binary variables with specified marginal means and correlations. Biometrika 90:455–463
Ridout MS, Demétrio CGB, Firth D (1999) Estimating intraclass correlation for binary data. Biometrics 55:137–148
Slaton TL, Piegorsch WW, Durham SD (2000) Estimation and testing with overdispersed proportions using the beta-logistic regression model of Heckman and Willis. Biometrics 56:125–133
Stefanski LA, Boos DD (2002) The calculus of M-estimation. Am Stat 56:29–38
Weil CS (1970) Selection of the valid number of sampling units and a consideration of their combination in toxicological studies involving reproduction, teratogenesis or carcinogenesis. Food Cosmet Toxicol 8:177–182
Williams DA (1975) The analysis of binary responses from toxicological experiments involving reproduction and teratogenicity. Biometrics 31:949–952
Zou G, Donner A (2004) Confidence interval estimation of the intraclass correlation. Biometrics 60:807–811

Acknowledgments

This work is supported by grant NSC 100-2118-M-008-001-MY2 of the National Science Council, and National Central University-Land Seed Hospital Joint Research grant NCU-LSH-100-A-005, Taiwan, R.O.C.

Author information

Authors and Affiliations

Institute of Statistics, Institute of Systems Biology and Bioinformatics, Center for Biotechnology and Biomedical Engineering, National Central University, Jhongli, Taiwan
Tsung-Shan Tsou
Cathay Medical Research Institute, Cathay General Hospital, Taipei, Taiwan
Tsung-Shan Tsou
Institute of Statistics, National Central University, Jhongli, Taiwan
Wan-Chen Chen
Department of General Education, Army Academy, Jhongli, Taiwan
Wan-Chen Chen

Corresponding author

Correspondence to Tsung-Shan Tsou.

Appendix

As remarked by Presnell and Boos (2004), the asymptotic variance of $trace(\widehat{I}^{-1}\widehat{V})$ is equal to the bottom-right element of $C^{-1}D(C^{-1})^{t}/m$, where $C$ and $D$ are $6\times 6$ matrices, with

$$\begin{aligned} C=\left({{\begin{array}{ccc} {-I}&\quad 0&\quad 0 \\ {E(\partial vech({l}^{\prime \prime })/\partial \beta ^{t})}&\quad {-H_q (I\otimes I)G_q }&\quad 0 \\ {2E({l}^{\prime t}I^{-1}{l}^{\prime \prime })}&\quad {(vech[2E({l}^{\prime }{l}^{\prime t})-diag\{E({l}^{\prime }{l}^{\prime t})\}])^{t}}&\quad {-1} \\ \end{array} }} \right) \end{aligned}$$

and $D$ consists of the sub-matrices $D_{ij}$, $i,j=1,2,3$, given below.

Consider a simple regression model with $\eta _i =\beta _0 + x_i \beta _1$. The log likelihood function is

$$\begin{aligned} l(\beta )=\sum _{i=1}^m {l_i (\beta )=} \sum _{i=1}^m {\left[{y_i \log \left({\frac{p_i}{1-p_i}}\right)+n_i \log \left({1-p_i} \right)}\right]} +\sum _{i=1}^m {\log } \left({{\begin{array}{c} {n_i } \\ {y_i } \\ \end{array}}} \right). \end{aligned}$$
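As a minimal sketch, the log likelihood above can be evaluated directly from the cluster sizes $n_i$, the success counts $y_i$ and the covariate $x_i$. The logit link and the names `logistic` and `binomial_loglik` are assumptions made for illustration only; the formula itself holds for any link.

```python
import math

def logistic(eta):
    """Inverse logit link (an assumed choice of link): p = e^eta / (1 + e^eta)."""
    return 1.0 / (1.0 + math.exp(-eta))

def binomial_loglik(beta0, beta1, x, y, n):
    """Sum of l_i(beta) over clusters, including the log binomial coefficient."""
    total = 0.0
    for xi, yi, ni in zip(x, y, n):
        p = logistic(beta0 + beta1 * xi)          # p_i from eta_i = beta0 + x_i * beta1
        total += yi * math.log(p / (1.0 - p))     # y_i * log-odds term
        total += ni * math.log(1.0 - p)           # n_i * log(1 - p_i) term
        total += math.log(math.comb(ni, yi))      # constant term, free of beta
    return total
```

The last term does not depend on $\beta$, so it can be dropped when only the score and its derivatives are needed.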

For simplicity, we will use $l_i$ to denote $l_i(\beta )$.

We first let ${p}^{\prime }_i$ and $p_i^{\prime \prime }$ denote the first and second derivatives of $p_i$ with respect to the linear predictor $\eta _i$. Then $l_i^{\prime }=\partial l_i /\partial \beta =w_i [1,x_i]^{t}$, where $w_i =(y_i -n_i p_i){p}^{\prime }_i /\left({p_i (1-p_i)} \right)$ and

$$\begin{aligned} l_i^{\prime \prime } =\partial ^{2}l_i / \partial \beta \partial \beta ^{t}= w_i^{\prime } \left[{{\begin{array}{cc} 1&{x_i} \\ {x_i}&{x_i^2} \\ \end{array}}} \right], \end{aligned}$$

where

$$\begin{aligned} w_i^{\prime } = \frac{{p}^{\prime \prime }_i (y_i -n_i p_i)-n_i {p}_i ^{\prime 2}}{p_i (1-p_i)}-\frac{{p}_i^{\prime 2}(y_i -n_i p_i )(1 -2p_i)}{[p_i (1-p_i)]^{2}}, \end{aligned}$$

which is the derivative of $w_i$ with respect to $\eta _i$.

Note that if the logistic function is employed, then $p_i =e^{\eta _i}/(1+e^{\eta _i})$ giving ${p}^{\prime }_i = p_i\left({1-p_i} \right)$ and ${p}^{\prime \prime }_i = p_i\left({1-p_i}\right)(1-2p_i)$.
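The derivative formulas above can be checked numerically. The following sketch assumes the logit link and compares the closed forms for ${p}^{\prime }_i$, ${p}^{\prime \prime }_i$ and $w_i^{\prime }$ with central finite differences taken along the linear predictor $\eta _i$. All function names are hypothetical helpers, not part of the paper.

```python
import math

def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def dp(eta):
    """First derivative p' = p (1 - p) under the logit link."""
    p = logistic(eta)
    return p * (1.0 - p)

def d2p(eta):
    """Second derivative p'' = p (1 - p) (1 - 2p) under the logit link."""
    p = logistic(eta)
    return p * (1.0 - p) * (1.0 - 2.0 * p)

def w(eta, y, n):
    """Score weight w = (y - n p) p' / (p (1 - p))."""
    p = logistic(eta)
    return (y - n * p) * dp(eta) / (p * (1.0 - p))

def w_prime(eta, y, n):
    """Closed form for w' as displayed in the appendix."""
    p = logistic(eta)
    pq = p * (1.0 - p)
    first = (d2p(eta) * (y - n * p) - n * dp(eta) ** 2) / pq
    second = dp(eta) ** 2 * (y - n * p) * (1.0 - 2.0 * p) / pq ** 2
    return first - second

def central_diff(f, t, h=1e-5):
    """Central finite-difference approximation to f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)
```

Under the logit link the expressions simplify to $w_i = y_i - n_i p_i$ and $w_i^{\prime } = -n_i p_i (1-p_i)$, which gives a second way to check the output of `w_prime`.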

Let $I^{-1}$ denote the inverse of the matrix $I=-E(l^{\prime \prime })=\frac{1}{m}\sum _{i=1}^m {\frac{n_i {p}^{\prime 2}_i}{p_i (1-p_i)} \left[{{\begin{array}{cc} 1&{x_i} \\ {x_i}&{x_i^2} \\ \end{array}}} \right]}$. One can show that

$$\begin{aligned} E\left[{\frac{\partial }{\partial \beta ^{t}}vech(l^{\prime \prime })} \right]&= \frac{1}{m}\sum _{i=1}^m {\left({\frac{2n_i {p}_i^{\prime 3}(1-2p_i)}{[p_i (1-p_i)]^{2}}-\frac{3n_i {p}^{\prime }_i {p}^{\prime \prime }_i}{p_i (1-p_i)}} \right) \left[{{\begin{array}{cc} 1&{x_i} \\ {x_i}&{x_i^2} \\ {x_i^2}&{x_i^3} \\ \end{array}}} \right]},\\ -H_q (I\otimes I)G_q&= -\left[{{\begin{array}{cccc} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&1 \\ \end{array}}}\right] \left[{{\begin{array}{cc} {{\begin{array}{cc} {I_{11}^{2}}&{I_{11} I_{12}} \\ {I_{11} I_{12}}&{I_{11} I_{22}} \\ \end{array}}}&{{\begin{array}{cc} {I_{11} I_{12}}&{I_{12}^{2}} \\ {I_{12}^{2}}&{I_{12} I_{22}} \\ \end{array}}} \\ {{\begin{array}{cc} {I_{11} I_{12}}&{I_{12}^{2}} \\ {I_{12}^{2}}&{I_{12} I_{22}} \\ \end{array}}}&{{\begin{array}{cc} {I_{11} I_{22}}&{I_{12} I_{22}} \\ {I_{12} I_{22}}&{I_{22}^{2}} \\ \end{array}}} \\ \end{array}}}\right] \left[{{\begin{array}{ccc} 1&0&0 \\ 0&1&0 \\ 0&1&0 \\ 0&0&1 \\ \end{array}}} \right], \end{aligned}$$

where $I_{ij}$ denotes the $(i,j)$th element of $I$, and

$$\begin{aligned} E(l^{\prime t}I^{-1}l^{\prime \prime })&= \frac{1}{m}\sum _{i=1}^m \frac{Var(Y_i)\left[{{p}^{\prime }_i {p}^{\prime \prime }_i p_i (1-p_i)-{p}_i ^{\prime 3}(1-2p_i)} \right]}{[p_i (1-p_i)]^{3}}\\&\quad \left\{ {\left[{1,x_i}\right]I^{-1} \left[{{\begin{array}{cc} 1&{x_i} \\ {x_i}&{x_i^2} \\ \end{array}}}\right]}\right\} . \end{aligned}$$
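The matrix $I$, its inverse and the product $H_q (I\otimes I)G_q$ that enter the expressions above are small enough to compute by hand in the two-parameter model. The sketch below does so in plain Python, assuming the logit link; the helper names (`expected_information`, `inverse_2x2`, `kron`, `matmul`, `h_kron_g`) are illustrative and not taken from the paper.

```python
import math

# Elimination matrix: vech(A) = H_Q vec(A) for a 2 x 2 matrix A.
H_Q = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
# Duplication matrix: vec(A) = G_Q vech(A) for a symmetric 2 x 2 matrix A.
G_Q = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]

def expected_information(beta0, beta1, x, n):
    """I = (1/m) sum_i n_i p_i'^2 / (p_i (1 - p_i)) [[1, x_i], [x_i, x_i^2]]."""
    m = len(x)
    i11 = i12 = i22 = 0.0
    for xi, ni in zip(x, n):
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * xi)))
        dp = p * (1.0 - p)                      # p' under the assumed logit link
        weight = ni * dp ** 2 / (p * (1.0 - p))
        i11 += weight
        i12 += weight * xi
        i22 += weight * xi ** 2
    return [[i11 / m, i12 / m], [i12 / m, i22 / m]]

def inverse_2x2(a):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[aij * bkl for aij in row_a for bkl in row_b]
            for row_a in a for row_b in b]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def h_kron_g(info):
    """The 3 x 3 product H_q (I kron I) G_q; negate it for the block in C."""
    return matmul(matmul(H_Q, kron(info, info)), G_Q)
```

For example, `h_kron_g(expected_information(b0, b1, x, n))` returns the $3\times 3$ product for given parameter values `b0`, `b1` and data `x`, `n`.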

The sub-matrices $D_{ij},i,j=1,2,3$ constituting $D$ are, respectively,

$$\begin{aligned} D_{11}&= E(l^{\prime }l^{\prime t})=\frac{1}{m}\sum _{i=1}^m {\frac{Var(Y_i){p}_i^{\prime 2}}{[p_i (1-p_i)]^{2}} \left[{{\begin{array}{cc} 1&{x_i} \\ {x_i}&{x_i^2} \\ \end{array}}} \right]},\\ D_{12}&= D_{21}^{t}=E\{{l}^{\prime }(vech\,{l}^{\prime \prime })^{t}\} =\frac{1}{m}\sum _{i=1}^m \frac{Var(Y_i)[{p}^{\prime }_i {p}^{\prime \prime }_i p_i(1-p_i)-{p}_i^{\prime 3}(1-2p_i)]}{[p_i(1-p_i)]^{3}}\\&\qquad \qquad \qquad \qquad \qquad \qquad \left[{{\begin{array}{ccc} 1&{x_i}&{x_i^2} \\ {x_i}&{x_i^2}&{x_i^3} \\ \end{array}}} \right],\\ D_{13}&= D_{31}^{t} = E(l^{\prime }l^{\prime t}I^{-1}l^{\prime }) = \frac{1}{m}\sum _{i=1}^{m} \frac{E(Y_i -n_i p_i)^{3}{p}_i^{\prime 3}}{[p_i (1-p_i)]^{3}} \\&\qquad \qquad \qquad \qquad \qquad \qquad \times \left\{ {\left[{{\begin{array}{cc} 1&x_i \\ x_i&x_i^2 \\ \end{array}}}\right] I^{-1} \left[{{\begin{array}{c} 1 \\ x_i \\ \end{array}}}\right]}\right\} ,\\ D_{22}&= E\{(vech\,l^{\prime \prime })(vech\,l^{\prime \prime })^{t}\}-vech(I)(vech(I))^{t} \\&= \frac{1}{m}\sum _{i=1}^m {E({w}_i^{\prime 2}) \left[{{\begin{array}{ccc} 1&{x_i}&{x_i^2} \\ {x_i}&{x_i^2}&{x_i^3} \\ {x_i^2}&{x_i^3}&{x_i^4} \\ \end{array}}}\right]- \left[{{\begin{array}{ccc} {I_{11}^{2}}&{I_{11} I_{12}}&{I_{11} I_{22}} \\ {I_{11} I_{12}}&{I_{12}^{2}}&{I_{12} I_{22}} \\ {I_{11} I_{22}}&{I_{12} I_{22}}&{I_{22}^{2}} \\ \end{array}}} \right]}, \end{aligned}$$

where $E({w}_i^{\prime 2})=Var(Y_i)\left[{\frac{{p}_i^{\prime \prime 2} +n_i^{2}{p}_i^{\prime 4}}{[p_i (1-p_i)]^{2}}-\frac{2{p}^{\prime \prime }_i ({p}_i ^{\prime 2}-2p_i {p}_i^{\prime 2})}{[p_i (1-p_i)]^{3}}+\frac{({p}_i ^{\prime 2}-2p_i {p}_i^{\prime 2})^{2}}{[p_i (1-p_i)]^{4}}} \right]$,

$$\begin{aligned} D_{23}&= D_{32} ^{t}=E\{l^{\prime t}I^{-1}l^{\prime }(vech(l^{\prime \prime }))\}+trace(I^{-1}V)(vech(I))\\&= \frac{1}{m}\sum _{i=1}^m {E(w_i^2 {w}^{\prime }_i)\left\{ {\left[{1,x_i} \right]I^{-1} \left[{{\begin{array}{c} 1 \\ x_i \\ \end{array}}}\right] \left[{{\begin{array}{c} 1 \\ x_i\\ x_i^2\\ \end{array}}}\right]}\right\} +trace(I^{-1}V) \left[{{\begin{array}{c} {I_{11}} \\ {I_{12}} \\ {I_{22}} \\ \end{array}}} \right]}, \\ \end{aligned}$$

where $E(w_i^2 {w}^{\prime }_i)=\frac{E(Y_i -n_i p_i)^{3}{p}_i ^{\prime 2}{p}^{\prime \prime }_i}{[p_i (1-p_i)]^{3}}-\frac{Var(Y_i)n_i {p}_i^{\prime 4}}{[p_i (1-p_i)]^{3}}-\frac{E(Y_i -n_i p_i)^{3}{p}_i ^{\prime 4}(1-2p_i)}{[p_i (1-p_i)]^{4}}$, and

$$\begin{aligned} D_{33}&= E(l^{\prime t}I^{-1}l^{\prime })^{2}-[trace(I^{-1}V)]^{2} =\frac{1}{m}\sum _{i=1}^m \frac{E(Y_i -n_i p_i )^{4}{p}^{\prime 4}_i}{[p_i (1-p_i)]^{4}}\\&\left\{ {\left[{1,x_i} \right]I^{-1} \left[{{\begin{array}{c} 1 \\ {x_i} \\ \end{array}}}\right]}\right\} ^{2} -[trace(I^{-1}V)]^{2}. \end{aligned}$$

Note that $D_{33}$ is a scalar.
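To make the quantity $trace(\widehat{I}^{-1}\widehat{V})$ concrete, the following sketch fits the two-parameter logistic model by Newton-Raphson and evaluates the trace from the empirical averages of $-l_i^{\prime \prime }$ and $l_i^{\prime }l_i^{\prime t}$. It is an illustration under stated assumptions, not the authors' estimator of the intra-cluster correlation: the logit link, the beta-binomial mechanism used to generate correlated responses, the true coefficients 0.2 and 0.5, and all function names are choices made here for the example.

```python
import math
import random

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logit(x, y, n, iters=25):
    """Newton-Raphson for binomial logistic regression, eta_i = b0 + b1 * x_i."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, y, n):
            p = expit(b0 + b1 * xi)
            r = yi - ni * p              # w_i under the logit link
            a = ni * p * (1.0 - p)       # -w_i' under the logit link
            g0 += r
            g1 += r * xi
            h00 += a
            h01 += a * xi
            h11 += a * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def trace_stat(x, y, n):
    """trace(I_hat^{-1} V_hat), I_hat = mean(-l_i''), V_hat = mean(l_i' l_i'^t)."""
    b0, b1 = fit_logit(x, y, n)
    m = len(x)
    i00 = i01 = i11 = v00 = v01 = v11 = 0.0
    for xi, yi, ni in zip(x, y, n):
        p = expit(b0 + b1 * xi)
        r2 = (yi - ni * p) ** 2
        a = ni * p * (1.0 - p)
        i00 += a / m
        i01 += a * xi / m
        i11 += a * xi * xi / m
        v00 += r2 / m
        v01 += r2 * xi / m
        v11 += r2 * xi * xi / m
    det = i00 * i11 - i01 * i01
    # Trace of the product of the inverse of a symmetric 2 x 2 matrix with V_hat.
    return (i11 * v00 - 2.0 * i01 * v01 + i00 * v11) / det

def simulate(m, n_per, rho, rng):
    """m clusters of size n_per; rho = 0 is binomial, rho > 0 is beta-binomial."""
    x, y, n = [], [], []
    for _ in range(m):
        xi = rng.uniform(-1.0, 1.0)
        p = expit(0.2 + 0.5 * xi)        # assumed true coefficients
        if rho > 0.0:
            s = (1.0 - rho) / rho        # alpha + beta giving correlation rho
            p = rng.betavariate(p * s, (1.0 - p) * s)
        yi = sum(rng.random() < p for _ in range(n_per))
        x.append(xi)
        y.append(yi)
        n.append(n_per)
    return x, y, n

if __name__ == "__main__":
    rng = random.Random(2012)
    for rho in (0.0, 0.3):
        print(rho, trace_stat(*simulate(1000, 8, rho, rng)))
```

Under these assumptions, independent binary responses should give a value close to the number of regression parameters (two here), while positive intra-cluster correlation should push the trace well above two. That gap reflects the failure of Bartlett’s second identity under the working binomial model.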

About this article

Cite this article

Tsou, TS., Chen, WC. Estimation of intra-cluster correlation coefficient via the failure of Bartlett’s second identity. Comput Stat 28, 1681–1698 (2013). https://doi.org/10.1007/s00180-012-0372-7

Received: 07 January 2011
Accepted: 22 September 2012
Published: 11 October 2012
Issue Date: August 2013
DOI: https://doi.org/10.1007/s00180-012-0372-7
