Abstract
This article studies the robustness of confidence interval construction for the intraclass kappa statistic based on a dichotomous response when the assumption of marginal homogeneity across two raters is violated. Two methods of construction are considered: the goodness-of-fit approach and the modified Wald method. Both were evaluated by exact calculation of the confidence interval coverage they produce. Under mild departures from marginal homogeneity (differences in rater success rates of less than 10%), the goodness-of-fit approach can be recommended. Moreover, under these same conditions, Cohen's kappa tends to be less biased as a point estimator than the intraclass kappa statistic.
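To make the two point estimators concrete, the following is a minimal sketch of how they might be computed from a \(2\times 2\) table of counts. The function names and the cell labels `n11`, `n12`, `n21`, `n22` are illustrative assumptions, not notation from the article. The formulas are the commonly used textbook forms: the intraclass kappa estimator pools both raters into a single success rate, while Cohen's kappa uses each rater's own marginal rate. The sketch does not implement the goodness-of-fit or modified Wald intervals.

```python
def cohen_kappa(n11, n12, n21, n22):
    """Cohen's kappa for two raters and a dichotomous response.

    n11: both raters say "success"; n22: both say "failure";
    n12, n21: the two kinds of disagreement (hypothetical labels).
    """
    n = n11 + n12 + n21 + n22
    p1 = (n11 + n12) / n          # success rate of rater 1
    p2 = (n11 + n21) / n          # success rate of rater 2
    observed = (n11 + n22) / n    # observed agreement
    expected = p1 * p2 + (1 - p1) * (1 - p2)  # chance agreement, separate margins
    return (observed - expected) / (1 - expected)


def intraclass_kappa(n11, n12, n21, n22):
    """Intraclass kappa estimator, assuming a common success rate for both raters."""
    n = n11 + n12 + n21 + n22
    p = (2 * n11 + n12 + n21) / (2 * n)  # pooled success rate
    observed = (n11 + n22) / n
    expected = p * p + (1 - p) * (1 - p)  # chance agreement, pooled margin
    return (observed - expected) / (1 - expected)


# Marginal homogeneity holds in the sample (n12 == n21): the estimators coincide.
homogeneous = (40, 10, 10, 40)
# Rater success rates differ by 10 percentage points (0.55 vs 0.45).
heterogeneous = (40, 15, 5, 40)

for table in (homogeneous, heterogeneous):
    print(table, cohen_kappa(*table), intraclass_kappa(*table))
```

When the off-diagonal counts are equal, the two raters' sample success rates match and the two estimators give the same value. They diverge once the marginal rates differ, which is the situation the article examines.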
Cite this article
Parpia, S., Koval, J.J. & Donner, A. Evaluation of confidence intervals for the kappa statistic when the assumption of marginal homogeneity is violated. Comput Stat 28, 2709–2718 (2013). https://doi.org/10.1007/s00180-013-0424-7
DOI: https://doi.org/10.1007/s00180-013-0424-7