Efficient tests for one sample correlated binary data with applications

Shan, Guogen; Ma, Changxing

doi:10.1007/s10260-013-0251-6


Published: 17 December 2013

Volume 23, pages 175–188, (2014)

Statistical Methods & Applications



Abstract

Four testing procedures are considered for testing the response rate of one sample correlated binary data with a cluster size of one or two, a setting that often occurs in otolaryngologic and ophthalmologic studies. Although an asymptotic approach is often used for statistical inference, it is criticized for unsatisfactory type I error control in small sample settings. An alternative to the asymptotic approach is an unconditional approach. The first unconditional approach is the one based on estimation, also known as the parametric bootstrap (Lee and Young in Stat Probab Lett 71(2):143–153, 2005). The other two unconditional approaches considered in this article are an approach based on maximization (Basu in J Am Stat Assoc 72(358):355–366, 1977), and an approach based on estimation and maximization (Lloyd in Biometrics 64(3):716–723, 2008a). These two unconditional approaches guarantee the test size and are generally more reliable than the asymptotic approach. We compare these four approaches in conjunction with a test proposed by Lee and Dubin (Stat Med 13(12):1241–1252, 1994) and a likelihood ratio test derived in this article, with regard to type I error rate and power for small to medium sample sizes. An example from an otolaryngologic study is provided to illustrate the various testing procedures. The unconditional approach based on estimation and maximization using the test in Lee and Dubin (Stat Med 13(12):1241–1252, 1994) is preferable because of its power advantage.


References

Agresti A (2002) Categorical data analysis. Wiley series in probability and statistics, 2nd edn. Wiley, Hoboken
Barnard GA (1945) A new test for $2 \times 2$ tables. Nature 156:177
Basu D (1977) On the elimination of nuisance parameters. J Am Stat Assoc 72(358):355–366
Cochran WG (1977) Sampling techniques, 3rd edn. Wiley, New York
Corcoran C, Ryan L, Senchaudhuri P, Mehta C, Patel N, Molenberghs G (2001) An exact trend test for correlated binary data. Biometrics 57(3):941–948
Donner A, Klar N (1993) Confidence interval construction for effect measures arising from cluster randomization trials. J Clin Epidemiol 46(2):123–131
Evans RJ, Forcina A (2013) Two algorithms for fitting constrained marginal models. Comput Stat Data Anal 66:1–7
Fisher RA (1970) Statistical methods for research workers, 14th edn. Hafner Press, New York
Jung SH, Ahn C (2000) Estimation of response probability in correlated binary data: a new approach. Drug Inf J 34:599–604
Kang SH, Park SM (2000) Exact likelihood ratio test of independence of binary responses within clusters. Comput Stat Data Anal 33:15–23
Kang S-H, Chung S-J, Ahn CW (2005) Exact tests for one sample correlated binary data. Biom J 47(2):188–193
Lee EW, Dubin N (1994) Estimation and sample size considerations for clustered binary responses. Stat Med 13(12):1241–1252
Lee S, Young G (2005) Parametric bootstrapping with nuisance parameters. Stat Probab Lett 71(2):143–153
Lloyd CJ (2008a) A new exact and more powerful unconditional test of no treatment effect from binary matched pairs. Biometrics 64(3):716–723
Lloyd CJ (2008b) Exact p-values for discrete models obtained by estimation and maximization. Aust N Z J Stat 50(4):329–345
Lloyd CJ, Moldovan MV (2008) A more powerful exact test of noninferiority from binary matched-pairs data. Stat Med 27(18):3540–3549
Mak TK (1988) Analysing intraclass correlation for dichotomous variables. J R Stat Soc Ser C (Appl Stat) 37(3):344–352
Mandel EM, Bluestone CD, Rockette HE, Blatter MM, Reisinger KS, Wucher FP, Harper J (1982) Duration of effusion after antibiotic treatment for acute otitis media: comparison of cefaclor and amoxicillin. Pediatr Infect Dis 1(5):310–316
Qu Y, Piedmonte M, Williams G (1994) Small sample validity of latent variable models for correlated binary data. Commun Stat Simul Comput 23(1):243–269
Rosner B (1982) Statistical methods in ophthalmology: an adjustment for the intraclass correlation between eyes. Biometrics 38(1):105–114
Shan G (2013a) A note on exact conditional and unconditional tests for Hardy-Weinberg equilibrium. Hum Hered 76(1):10–17
Shan G (2013b) Exact unconditional testing procedures for comparing two independent Poisson rates. J Stat Comput Simul. doi:10.1080/00949655.2013.855776
Shan G (2013c) More efficient unconditional tests for exchangeable binary data with equal cluster sizes. Stat Probab Lett 83(2):644–649
Shan G, Ma C (2012) Unconditional tests for comparing two ordered multinomials. Stat Methods Med Res. doi:10.1177/0962280212450957
Shan G, Ma C (2013) Exact methods for testing the equality of proportions for binary clustered data from otolaryngologic studies. Stat Biopharm Res. doi:10.1080/19466315.2013.861767
Shan G, Ma C, Hutson AD, Wilding GE (2012) An efficient and exact approach for detecting trends with binary endpoints. Stat Med 31(2):155–164
Shan G, Ma C, Hutson AD, Wilding GE (2013) Some tests for detecting trends based on the modified Baumgartner-Weiß-Schindler statistics. Comput Stat Data Anal 57(1):246–261
Tang N-S, Tang M-L, Qiu S-F (2008) Testing the equality of proportions for correlated otolaryngologic data. Comput Stat Data Anal 52(7):3719–3729


Acknowledgments

The authors would like to thank the Editor and two referees for their valuable comments and suggestions that improved this article significantly.

Author information

Authors and Affiliations

Department of Environmental and Occupational Health, Epidemiology and Biostatistics Program, School of Community Health Sciences, University of Nevada Las Vegas, Las Vegas, NV, 89154, USA
Guogen Shan
Department of Biostatistics, University at Buffalo, 3435 Main Street, Buffalo, NY, 14214, USA
Changxing Ma


Corresponding author

Correspondence to Guogen Shan.

Appendix

Likelihood ratio test statistic $\mathbf{T}_\mathbf{LR}$. The log-likelihood is expressed as

$$\begin{aligned} l(\pi , \rho \mid \mathbf{N}_1,\mathbf{N}_2)&= \log \left( \frac{(n-n_1)!}{N_{22}!\, N_{21}!\, N_{20}!}\right) +N_{11}\log (\pi )+N_{10}\log (1-\pi )\\&\quad +N_{22}\log [\pi ^2+\rho \pi (1-\pi )]+N_{21}\log [2(1-\rho )(1-\pi )]\\&\quad +N_{20}\log [(1-\pi )^2+\rho \pi (1-\pi )]. \end{aligned}$$

Differentiating $l(\pi , \rho )$ with respect to $(\pi , \rho )$ yields the score function

$$\begin{aligned} \frac{\partial l}{\partial \pi }&= \frac{N_{11}+2N_{22}}{\pi } - \frac{N_{10}+N_{21}+2N_{20}}{1-\pi } - \frac{N_{22} \rho }{\pi (\pi +\rho -\pi \rho )}\\&+\frac{N_{20}\rho }{(1-\pi )(1-\pi +\pi \rho )},\\ \frac{\partial l}{\partial \rho }&= -\frac{N_{21}}{1-\rho }+\frac{N_{22}(1-\pi )}{\pi +\rho -\pi \rho } + \frac{N_{20}\pi }{1-\pi +\pi \rho }. \end{aligned}$$
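As a sanity check on the differentiation, the following minimal Python sketch transcribes the log-likelihood (dropping the factorial term, which does not involve $\pi$ or $\rho$) and the score function exactly as printed above, and compares the score with a central finite-difference gradient. The function names, the argument order and the step size `h` are illustrative assumptions and are not part of the article.

```python
import math

def loglik(pi, rho, n11, n10, n22, n21, n20):
    """Log-likelihood as printed above, without the parameter-free factorial term."""
    return (n11 * math.log(pi) + n10 * math.log(1 - pi)
            + n22 * math.log(pi ** 2 + rho * pi * (1 - pi))
            + n21 * math.log(2 * (1 - rho) * (1 - pi))
            + n20 * math.log((1 - pi) ** 2 + rho * pi * (1 - pi)))

def score(pi, rho, n11, n10, n22, n21, n20):
    """Score function (d l / d pi, d l / d rho) as printed above."""
    d_pi = ((n11 + 2 * n22) / pi
            - (n10 + n21 + 2 * n20) / (1 - pi)
            - n22 * rho / (pi * (pi + rho - pi * rho))
            + n20 * rho / ((1 - pi) * (1 - pi + pi * rho)))
    d_rho = (-n21 / (1 - rho)
             + n22 * (1 - pi) / (pi + rho - pi * rho)
             + n20 * pi / (1 - pi + pi * rho))
    return d_pi, d_rho

def numeric_score(pi, rho, counts, h=1e-6):
    """Central finite-difference approximation of the gradient of loglik."""
    d_pi = (loglik(pi + h, rho, *counts) - loglik(pi - h, rho, *counts)) / (2 * h)
    d_rho = (loglik(pi, rho + h, *counts) - loglik(pi, rho - h, *counts)) / (2 * h)
    return d_pi, d_rho

# Hypothetical counts (N11, N10, N22, N21, N20), chosen only for illustration.
counts = (5, 7, 4, 6, 8)
analytic = score(0.4, 0.2, *counts)
numeric = numeric_score(0.4, 0.2, counts)
```

With these illustrative counts, `analytic` and `numeric` should agree to several decimal places at any interior point of the parameter space.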

The unrestricted MLE of $(\pi , \rho )$, denoted by $(\hat{\pi }, \hat{\rho })$, is the solution to the following equations, which can be obtained by the Fisher scoring method:

$$\begin{aligned} \frac{\partial l}{\partial \pi } = 0 \quad \hbox { and }\quad \frac{\partial l}{\partial \rho } = 0. \end{aligned}$$

After lengthy algebra, $\hat{\rho }$ can be derived as a root of the third-order polynomial

$$\begin{aligned} a \rho ^3 + b\rho ^2 + c\rho +d = 0, \end{aligned}$$

where

$$\begin{aligned} a&= N_{10}(N_{11}-N_{21})n_2,\\ b&= N_{10}^2N_{21}\!+\!N_{10}^2N_{22} +N_{10}N_{11}N_{21} +N_{10}N_{20}N_{21}+2N_{10}N_{20}N_{22} +3N_{10}N_{21}N_{22}\\&+2N_{10}N_{22}^2+N_{11}^2N_{20}+N_{11}^2N_{21}+2N_{11}N_{20}^2 +N_{11}N_{20}N_{21}+2N_{11}N_{20}N_{22}\\&-N_{11}N_{21}^2+N_{11}N_{21}N_{22}-2N_{20}^2N_{21}-2N_{20}N_{21}^2 -2N_{20}N_{21}N_{22}-N_{21}^2N_{22},\\ c&= N_{10}^2N_{21}-N_{10}N_{11}N_{20}+N_{10}N_{11}N_{21} -N_{10}N_{11}N_{22}+4N_{10}N_{20}N_{21}\\&+2N_{10}N_{20}N_{22}+N_{10}N_{21}^2+2N_{10}N_{21}N_{22} -2N_{10}N_{22}^2+N_{11}^2N_{21}-2N_{11}N_{20}^2\\&+N_{11}N_{20}N_{21}+2N_{11}N_{20}N_{22}+3N_{11}N_{21}N_{22} +4N_{20}^2N_{21}+4N_{20}^2N_{22}\\&+2N_{20}N_{21}^2+6N_{20}N_{21}N_{22}+4N_{20}N_{22}^2 +2N_{21}N_{22}^2,\\ d&= -N_{10}^2N_{22}+N_{10}N_{11}N_{21} -4N_{10}N_{20}N_{22}-N_{11}^2N_{20}+2N_{11}N_{20}N_{21}\\&-4N_{11}N_{20}N_{22}+N_{11}N_{21}^2-4N_{20}^2N_{22} -4N_{20}N_{22}^2+N_{21}^2N_{22}, \end{aligned}$$

and

$$\begin{aligned} \hat{\pi }=\frac{(N_{21}+2N_{22}-n_2\hat{\rho })\pm \sqrt{(\hat{\rho }n_2)^2+2\hat{\rho }n_2N_{21}-4N_{20}N_{22}+N_{21}^2}}{2n_2(1-\hat{\rho })}. \end{aligned}$$

We then compute the log-likelihood at each solution that lies in the parameter space, and the solution with the largest log-likelihood is taken as the MLE. Another method (Evans and Forcina 2013) may also be used to derive the LR test.
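The selection step can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the real roots of the cubic in $\rho$ have already been computed and are passed in as `rho_roots`, that the log-likelihood is supplied by the caller as a function `loglik(pi, rho)`, and that the parameter space is the set where $0<\pi <1$, $\rho <1$ and both pair probabilities $\pi ^2+\rho \pi (1-\pi )$ and $(1-\pi )^2+\rho \pi (1-\pi )$ are positive. All function names are hypothetical.

```python
import math

def score_rho(pi, rho, n20, n21, n22):
    """Partial derivative of the log-likelihood with respect to rho (from the score function)."""
    return (-n21 / (1 - rho)
            + n22 * (1 - pi) / (pi + rho - pi * rho)
            + n20 * pi / (1 - pi + pi * rho))

def pi_candidates(rho, n20, n21, n22):
    """Both roots of the closed-form expression for pi-hat at a given rho-hat."""
    n2 = n20 + n21 + n22
    disc = (rho * n2) ** 2 + 2 * rho * n2 * n21 - 4 * n20 * n22 + n21 ** 2
    if disc < 0 or rho == 1:
        return []  # no real solution, or the formula is undefined
    base = n21 + 2 * n22 - n2 * rho
    denom = 2 * n2 * (1 - rho)
    return [(base + sign * math.sqrt(disc)) / denom for sign in (1, -1)]

def in_parameter_space(pi, rho):
    """Assumed parameter space: valid pi, rho < 1, positive pair probabilities."""
    return (0 < pi < 1 and rho < 1
            and pi + rho * (1 - pi) > 0
            and 1 - pi + rho * pi > 0)

def select_mle(rho_roots, n20, n21, n22, loglik):
    """Return (log-likelihood, pi, rho) for the admissible solution with the largest log-likelihood."""
    best = None
    for rho in rho_roots:
        for pi in pi_candidates(rho, n20, n21, n22):
            if in_parameter_space(pi, rho):
                value = loglik(pi, rho)
                if best is None or value > best[0]:
                    best = (value, pi, rho)
    return best
```

Each value returned by `pi_candidates` makes `score_rho` vanish at the given `rho`, which offers a quick check that the closed-form expression for $\hat{\pi }$ has been transcribed correctly.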

Under the null hypothesis $H_0: \pi =\pi _0$, the MLE of $\rho $ is given by

$$\begin{aligned} \hat{\rho }_{H_0}=-\frac{N_{21}+N_{22}-\pi _0(N_{20}+2N_{21}+3N_{22})+2n_2\pi _0^2\pm \sqrt{f}}{2\pi _0(1-\pi _0)n_2}, \end{aligned}$$

where $f= (4N_{21}n_2+(N_{20}-N_{22})^2)\pi _0^2 + 2(N_{20}(N_{21}+2N_{22})-n_2(2N_{21}+N_{22}))\pi _0+(N_{21}+N_{22})^2$. Only solutions that lie in the parameter space are kept. When both values of $\hat{\rho }_{H_0}$ lie in the parameter space, the one with the larger null log-likelihood is taken as the MLE.
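The closed-form null estimate can be sketched in Python as below. This is a minimal illustration under stated assumptions: the parameter space is taken to be $\rho <1$ with both pair probabilities $\pi _0^2+\rho \pi _0(1-\pi _0)$ and $(1-\pi _0)^2+\rho \pi _0(1-\pi _0)$ positive, and the two roots are compared using only the terms of the null log-likelihood that depend on $\rho $ (terms constant in $\rho $ do not change the ranking). The function names are hypothetical.

```python
import math

def rho_null_candidates(pi0, n20, n21, n22):
    """Both roots of the closed-form expression for rho-hat under H0: pi = pi0."""
    n2 = n20 + n21 + n22
    lead = (n21 + n22) - pi0 * (n20 + 2 * n21 + 3 * n22) + 2 * n2 * pi0 ** 2
    f = ((4 * n21 * n2 + (n20 - n22) ** 2) * pi0 ** 2
         + 2 * (n20 * (n21 + 2 * n22) - n2 * (2 * n21 + n22)) * pi0
         + (n21 + n22) ** 2)
    if f < 0:
        return []  # no real root
    denom = 2 * pi0 * (1 - pi0) * n2
    return [-(lead + sign * math.sqrt(f)) / denom for sign in (1, -1)]

def null_loglik_kernel(rho, pi0, n20, n21, n22):
    """Terms of the null log-likelihood that depend on rho."""
    cross = rho * pi0 * (1 - pi0)
    return (n22 * math.log(pi0 ** 2 + cross)
            + n21 * math.log(1 - rho)
            + n20 * math.log((1 - pi0) ** 2 + cross))

def rho_null_mle(pi0, n20, n21, n22):
    """Pick the admissible root with the larger null log-likelihood, or None."""
    def admissible(rho):
        # Assumed parameter space: rho < 1 and positive pair probabilities.
        cross = rho * pi0 * (1 - pi0)
        return rho < 1 and pi0 ** 2 + cross > 0 and (1 - pi0) ** 2 + cross > 0
    valid = [r for r in rho_null_candidates(pi0, n20, n21, n22) if admissible(r)]
    if not valid:
        return None
    return max(valid, key=lambda r: null_loglik_kernel(r, pi0, n20, n21, n22))
```

For example, with $\pi _0=0.5$ and hypothetical pair counts $(N_{20},N_{21},N_{22})=(1,2,1)$, the two roots are $-1$ and $0$. The root $-1$ makes a pair probability zero and is discarded under the assumed parameter space, so the sketch returns $\hat{\rho }_{H_0}=0$.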


About this article

Cite this article

Shan, G., Ma, C. Efficient tests for one sample correlated binary data with applications. Stat Methods Appl 23, 175–188 (2014). https://doi.org/10.1007/s10260-013-0251-6


Accepted: 06 December 2013
Published: 17 December 2013
Issue Date: June 2014
DOI: https://doi.org/10.1007/s10260-013-0251-6
