
On the Comparability of Reliability Measures: Bifurcation Analysis of Two Measures in the Case of Dichotomous Ratings

  • Conference paper
From Data and Information Analysis to Knowledge Engineering

Abstract

The problem of analysing interrater agreement and interrater reliability arises both in human decision making and in human-machine interaction. Numerous measures have been developed for this purpose over the last 100 years, with Cohen's Kappa coefficient being the most popular. On methodological grounds, the validity of kappa-type measures of interrater agreement has been questioned in a variety of papers; a global comparison of the properties of these measures, however, is still lacking. In our approach, we construct an integral measure to evaluate the differences between two reliability measures for dichotomous ratings. In addition, we study the bifurcation properties of the difference of these measures in order to quantify areas of minimal differences. From a methodological point of view, our integral measure can also be used to construct other measures of interrater agreement.
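
The chapter's exact construction is not reproduced on this page, so the following Python sketch is purely illustrative of the setup the abstract describes: it takes Cohen's kappa and Scott's pi (Scott 1954, cited below) as the two dichotomous-rating measures and Monte Carlo-integrates their squared difference over the 2x2 probability simplex. The choice of Scott's pi as the second measure, the squared-difference integrand, and the uniform weighting are all assumptions, not the authors' construction.

```python
# Illustrative sketch only (not the chapter's exact construction): compare
# Cohen's kappa with Scott's pi for dichotomous ratings and Monte Carlo-
# integrate their squared difference over the 2x2 probability simplex.
import numpy as np

def cohen_kappa(p11, p10, p01, p00):
    """Cohen's kappa for a 2x2 table of joint rating proportions."""
    po = p11 + p00                        # observed agreement
    pa, pb = p11 + p10, p11 + p01         # each rater's marginal "positive" rate
    pe = pa * pb + (1 - pa) * (1 - pb)    # chance agreement, separate marginals
    return (po - pe) / (1 - pe)

def scott_pi(p11, p10, p01, p00):
    """Scott's pi: like kappa, but chance agreement uses the pooled marginal."""
    po = p11 + p00
    pbar = (2 * p11 + p10 + p01) / 2      # pooled "positive" rate of both raters
    pe = pbar**2 + (1 - pbar)**2
    return (po - pe) / (1 - pe)

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(4), size=100_000)   # uniform samples on the simplex
k = cohen_kappa(P[:, 0], P[:, 1], P[:, 2], P[:, 3])
s = scott_pi(P[:, 0], P[:, 1], P[:, 2], P[:, 3])
print("mean squared difference over the simplex:", np.mean((k - s) ** 2))

# The difference vanishes when the raters' marginals coincide (p10 == p01):
print(cohen_kappa(0.4, 0.1, 0.1, 0.4), scott_pi(0.4, 0.1, 0.1, 0.4))  # equal
```

In this toy setting the two measures coincide exactly on the locus p10 = p01 (equal rater marginals), since Cohen's chance term pa*pb differs from Scott's pbar^2 by ((pa - pb)/2)^2; such zero-difference loci are the kind of "areas of minimal differences" the abstract refers to.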


References

  • AICKIN, M. (1990): Maximum likelihood estimation of agreement in the constant predictive probability model, and its relation to Cohen's kappa. Biometrics 46(2), 293–302.

  • COHEN, J. (1960): A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37–46.

  • DJOKI, S., SUCCI, G., PEDRYCZ, W. and MINTCHEV, M. (2001): Meta Analysis - A Method of Combining Empirical Results and its Application in Object-Oriented Software Systems. In: Y. Wang, S. Patel and R. Johnston (Eds.): Proceedings of the OOIS'01. Springer, Berlin, 103–112.

  • ESCUDERO, G., MARQUEZ, L. and RIGAU, G. (2000): A Comparison between Supervised Learning Algorithms for Word Sense Disambiguation. In: Proceedings of CoNLL-2000 and LLL-2000. Lisbon, Portugal, 31–36.

  • FEINSTEIN, A.R. and CICCHETTI, D.V. (1990): High agreement but low Kappa: I. The problem of two paradoxes. Journal of Clinical Epidemiology 43, 543–549.

  • GREINER, B., DOHLE, J., SCHULZE, W., OSTERMANN, T. and HAMEL, J. (1999): The visual assignment of pedographic examination results to anatomical reference areas of the forefoot. Foot and Ankle Surgery 5, 219–226.

  • KLAUER, K.C. (1996): Urteilerübereinstimmung bei dichotomen Kategoriensystemen. Diagnostica 42, 101–118.

  • MAYER, H., NONN, C., OSTERBRINK, J. and EVERS, G.C. (2004): Qualitätskriterien von Assessmentinstrumenten - Cohen's Kappa als Maß der Interrater-Reliabilität (Teil 1). Pflege 17, 36–46.

  • MILLER, J. (1999): Can Results from Software Engineering Experiments be Safely Combined? IEEE Metrics 1999, 152–158.

  • MILLER, J. (2000): Applying meta-analytical procedures to software engineering experiments. Journal of Systems and Software 54, 29–39.

  • OSTERMANN, T., BEER, A.-M. and MATTHIESSEN, P.F. (2001): Evaluation stationärer naturheilkundlicher Behandlung - Konzeption und erste Ergebnisse des Blankensteiner Modells. Qualitätsmanagement in Klinik und Praxis 9(4), 104–111.

  • OSTERMANN, T. and SCHUSTER, R. (2005): On the comparability and construction of reliability measures for dichotomous ratings - a unified algebraic approach. Methodology, submitted.

  • OSTERMANN, T., VERMAASEN, W. and MATTHIESSEN, P.F. (2005): Evaluation des Auswahlverfahrens von Medizinstudenten an der Universität Witten/Herdecke - Teil I: Inter-Rater-Reliabilität des Interviewverfahrens. GMS Z Med Ausbild 22(1), Doc13.

  • ROSENTHAL, R. (1991): Meta-analytic procedures for social research. Sage, Beverly Hills.

  • SCHUSTER, R. (1995): Grundkurs Biomathematik. Teubner-Verlag, Stuttgart.

  • SCOTT, W.A. (1954): Reliability of Content Analysis: The Case of Nominal Scale Coding. Public Opinion Quarterly 19, 321–325.

  • SQUIRE, D.M. and PUN, T. (1998): Assessing Agreement Between Human and Machine Clusterings of Image Databases. Pattern Recognition 31(12), 1905–1919.

  • YULE, G.U. (1911): On the methods of measuring the association between two attributes. Journal of the Royal Statistical Society 75, 579–652.



Copyright information

© 2006 Springer Berlin Heidelberg

About this paper

Cite this paper

Ostermann, T., Schuster, R. (2006). On the Comparability of Reliability Measures: Bifurcation Analysis of Two Measures in the Case of Dichotomous Ratings. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds) From Data and Information Analysis to Knowledge Engineering. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31314-1_23
