Abstract
Assessment of annotation reliability is typically undertaken as a quality assurance measure in order to provide a sound fulcrum for establishing the answers to research questions that require the annotated data. We argue that the assessment of inter-rater reliability can provide a source of information more directly related to the background research. The discussion is anchored in the analysis of conversational dominance in the MULTISIMO corpus. Other research has explored factors in dialogue (e.g. big-five personality traits and conversational style of participants) as predictors of independently perceived dominance. Rather than assessing the contributions of experimental factors to perceived dominance as a unitary aggregated response variable following verification of an acceptable level of inter-rater reliability, we use the variability in inter-annotator agreement as a response variable. We argue for the general applicability of this approach in exploring research hypotheses that focus on qualities assessed with multiple annotations.
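As a rough illustration only (the chapter does not specify an implementation), the Python sketch below shows one way the idea could be operationalised: per-item disagreement among Likert ratings is computed (here as the mean absolute deviation from the item median) and then treated as the response variable in a regression on experimental factors. All data values, variable names (annotator_1, extraversion, speaking_time), and the particular choice of dispersion measure and regression model are assumptions made for exposition, not the authors' method.

```python
# Hypothetical sketch (not from the chapter): use per-item inter-annotator
# disagreement on Likert ratings as a response variable, rather than an
# aggregated dominance score.
import pandas as pd
import statsmodels.api as sm

# Rows = annotated items (e.g. a participant in a session), columns = annotators,
# values = Likert dominance ratings; illustrative data only.
ratings = pd.DataFrame({
    "annotator_1": [5, 3, 4, 2, 5],
    "annotator_2": [4, 3, 5, 2, 3],
    "annotator_3": [5, 1, 4, 3, 4],
})

# Per-item disagreement: mean absolute deviation of each item's ratings
# from that item's median (one of several possible dispersion measures).
item_median = ratings.median(axis=1)
disagreement = ratings.sub(item_median, axis=0).abs().mean(axis=1)

# Hypothetical predictors for the same items (e.g. personality and behaviour measures).
predictors = pd.DataFrame({
    "extraversion": [0.8, 0.2, 0.6, 0.1, 0.9],
    "speaking_time": [120, 45, 90, 30, 150],
})

# Ordinary least squares with disagreement (not the mean rating) as the response.
X = sm.add_constant(predictors)
model = sm.OLS(disagreement, X).fit()
print(model.summary())
```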
Notes
1. In complete compliance with the terms of consent provided by the participants, 18 of these dialogues are represented in the publicly available version of the corpus.
2. This survey was conducted independently, and rankings are reported in a database related to the game, http://familyfeudfriends.arjdesigns.com//, last accessed 11.05.2018.
3. The correctness of the answers and their rankings is determined by responses to an independent survey of a sample of 100 people.
4. The first two rows of this table are provided for the sake of completeness: it does not appear rational to propose a Likert scale with only one point, and if the experimental question required only two points, it seems unlikely that one would approach the binary judgement using a Likert scale.
Acknowledgements
The research leading to these results has received funding from (a) the ADAPT Centre for Digital Content Technology, funded under the SFI Research Centres Programme (Grant 13/RC/2106) and co-funded under the European Regional Development Fund, and (b) the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 701621 (MULTISIMO).
Cite this chapter
Vogel, C., Koutsombogera, M., Costello, R. (2020). Analyzing Likert Scale Inter-annotator Disagreement. In: Esposito, A., Faundez-Zanuy, M., Morabito, F., Pasero, E. (eds) Neural Approaches to Dynamics of Signal Exchanges. Smart Innovation, Systems and Technologies, vol 151. Springer, Singapore. https://doi.org/10.1007/978-981-13-8950-4_34