Abstract
Qualitative coding of large datasets is a valuable tool for qualitative researchers, but existing inter-rater reliability (IRR) metrics have not evolved to fit current coding approaches and impose a variety of restrictions. In this paper, we propose Generalized Cohen's kappa (GCK), a novel IRR metric that can be applied in a wide range of qualitative coding situations, including a variable number of coders, multiple texts, and non-mutually exclusive categories. We show that under the preconditions for Cohen's kappa, GCK performs very similarly, demonstrating that the two metrics are interchangeable in that setting. We then extend GCK to the situations above and show that it remains stable under different permutations.
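For context, the classic two-coder, mutually-exclusive-category setting that GCK generalizes is scored with Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each coder's marginal label frequencies. Below is a minimal Python sketch of that standard statistic; the function name and example labels are illustrative, and the GCK formula itself is defined in the full paper, not here.

from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Standard two-coder Cohen's kappa (Cohen, 1960).

    codes_a, codes_b: equal-length lists of category labels, one per item,
    assuming mutually exclusive categories -- the classic precondition
    that Generalized Cohen's kappa relaxes.
    """
    n = len(codes_a)
    # Observed agreement: fraction of items both coders labeled identically.
    p_o = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: sum over categories of the product of each coder's
    # marginal frequency for that category.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Example: two coders labeling six text segments.
print(cohens_kappa(["pos", "neg", "pos", "neu", "pos", "neg"],
                   ["pos", "neg", "neu", "neu", "pos", "pos"]))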