Abstract
The problem of analysing interrater agreement and reliability is known both in human decision making and in machine interaction. Several measures have been developed over the last 100 years for this purpose, with Cohen’s kappa coefficient being the most popular. On methodological grounds, the validity of kappa-type measures for interrater agreement has been discussed in a variety of papers. However, a global comparison of the properties of these measures is still lacking. In our approach, we constructed an integral measure to evaluate the differences between two reliability measures for dichotomous ratings. Additionally, we studied bifurcation properties of the difference of these measures to quantify areas of minimal difference. From a methodological point of view, our integral measure can also be used to construct other measures of interrater agreement.
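To make concrete what a kappa-type measure for dichotomous ratings computes, the following minimal sketch evaluates two such coefficients on a 2x2 agreement table and reports their difference. The abstract does not name the two measures that are compared, so pairing Cohen’s kappa with Scott’s pi is an assumption made purely for illustration, as are the function names and the cell labels `a`, `b`, `c`, `d`. The sketch does not reproduce the integral measure or the bifurcation analysis of the chapter.

```python
def _chance_corrected(p_observed, p_expected):
    """Generic kappa-type form: (p_o - p_e) / (1 - p_e)."""
    if p_expected == 1.0:
        # All ratings fall into one category; the coefficient is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 table (illustrative cell labels).

    a: both raters say yes      b: rater 1 yes, rater 2 no
    c: rater 1 no, rater 2 yes  d: both raters say no
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    p1 = (a + b) / n  # share of "yes" ratings by rater 1
    p2 = (a + c) / n  # share of "yes" ratings by rater 2
    # Chance agreement uses each rater's own marginal distribution.
    p_expected = p1 * p2 + (1 - p1) * (1 - p2)
    return _chance_corrected(p_observed, p_expected)


def scott_pi(a, b, c, d):
    """Scott's pi for the same table layout."""
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement uses the marginal pooled over both raters.
    p = (2 * a + b + c) / (2 * n)
    p_expected = p * p + (1 - p) * (1 - p)
    return _chance_corrected(p_observed, p_expected)


if __name__ == "__main__":
    table = (40, 10, 20, 30)  # hypothetical counts, not data from the chapter
    k = cohen_kappa(*table)
    s = scott_pi(*table)
    print(f"kappa = {k:.4f}, pi = {s:.4f}, difference = {k - s:.4f}")
```

When both raters have identical marginal distributions the two coefficients coincide; they differ only when the raters' "yes" rates differ.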
Copyright information
© 2006 Springer Berlin · Heidelberg
About this paper
Cite this paper
Ostermann, T., Schuster, R. (2006). On the Comparability of Reliability Measures: Bifurcation Analysis of Two Measures in the Case of Dichotomous Ratings. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds) From Data and Information Analysis to Knowledge Engineering. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31314-1_23
DOI: https://doi.org/10.1007/3-540-31314-1_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31313-7
Online ISBN: 978-3-540-31314-4
eBook Packages: Mathematics and Statistics (R0)