On the Comparability of Reliability Measures: Bifurcation Analysis of Two Measures in the Case of Dichotomous Ratings

Ostermann, Thomas; Schuster, Reinhard

doi:10.1007/3-540-31314-1_23

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

The problem of analysing interrater agreement and interrater reliability arises both in human decision making and in machine interaction. Several measures have been developed for this purpose over the last 100 years, with Cohen’s kappa coefficient being the most popular. Because of methodological concerns, the validity of kappa-type measures of interrater agreement has been discussed in a variety of papers. However, a global comparison of the properties of these measures is still lacking. In our approach, we constructed an integral measure to evaluate the differences between two reliability measures for dichotomous ratings. Additionally, we studied bifurcation properties of the difference between these measures to quantify areas of minimal difference. From a methodological point of view, our integral measure can also be used to construct other measures of interrater agreement.
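To make the idea of comparing two agreement measures on dichotomous ratings concrete, the following is a minimal sketch only. The abstract does not name the second measure or define the integral measure, so the choice of Scott's pi as the comparison partner and the grid average `mean_abs_difference` are illustrative assumptions, not the authors' construction. The cell names `a`, `b`, `c`, `d` (both raters positive, only rater 1 positive, only rater 2 positive, both negative) are likewise assumed here.

```python
from itertools import product
from math import isnan


def _chance_corrected(p_o, p_e):
    """Return (p_o - p_e) / (1 - p_e), or NaN when chance agreement is 1."""
    if 1.0 - p_e < 1e-12:
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 table; chance term uses each rater's own marginal."""
    n = a + b + c + d
    p1, p2 = (a + b) / n, (a + c) / n  # positive rates of rater 1 and rater 2
    return _chance_corrected((a + d) / n, p1 * p2 + (1 - p1) * (1 - p2))


def scott_pi(a, b, c, d):
    """Scott's pi for a 2x2 table; chance term uses the pooled marginal."""
    n = a + b + c + d
    p = (2 * a + b + c) / (2 * n)  # positive rate averaged over both raters
    return _chance_corrected((a + d) / n, p * p + (1 - p) ** 2)


def mean_abs_difference(steps=20):
    """Hypothetical stand-in for an integral measure (an assumption, not the
    authors' definition): average |kappa - pi| over all 2x2 tables whose cell
    counts are non-negative integers summing to `steps`."""
    total, count = 0.0, 0
    for a, b, c in product(range(steps + 1), repeat=3):
        d = steps - a - b - c
        if d < 0:
            continue
        k, s = cohen_kappa(a, b, c, d), scott_pi(a, b, c, d)
        if isnan(k) or isnan(s):  # skip degenerate tables (all in one cell)
            continue
        total += abs(k - s)
        count += 1
    return total / count
```

For a table with unequal marginals, such as `cohen_kappa(40, 10, 20, 30)` and `scott_pi(40, 10, 20, 30)`, the two values differ slightly; when both raters have the same marginal distribution (for example `b == c`), the two chance terms coincide and so do the measures.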

Author information

Authors and Affiliations

Department of Medical Theory and Complementary Medicine, University of Witten/Herdecke, Gerhard-Kienle-Weg 4, 58313, Herdecke, Germany
Thomas Ostermann
Institute of Mathematics, University of Luebeck, Wallstr.40, 23560, Luebeck, Germany
Reinhard Schuster
North German Biometrical Centre, Medical Advisory Board of the Statutory Health Insurance, Katharinenstr 11a, 23554, Luebeck, Germany
Reinhard Schuster

Editor information

Editors and Affiliations

Institut für Technische und Betriebliche Informationssysteme, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Myra Spiliopoulou
Institut für Wissens- und Sprachverarbeitung, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany
Rudolf Kruse, Christian Borgelt & Andreas Nürnberger
Institut für Entscheidungstheorie und Unternehmensforschung, Universität Karlsruhe (TH), 76128, Karlsruhe
Wolfgang Gaul

About this paper

Cite this paper

Ostermann, T., Schuster, R. (2006). On the Comparability of Reliability Measures: Bifurcation Analysis of Two Measures in the Case of Dichotomous Ratings. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds) From Data and Information Analysis to Knowledge Engineering. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31314-1_23

DOI: https://doi.org/10.1007/3-540-31314-1_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31313-7
Online ISBN: 978-3-540-31314-4
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)
