Abstract
Supervised machine learning, in particular in Natural Language Processing, is based on the creation of high-quality gold standard datasets for training and benchmarking. The de-facto standard annotation methodologies work well for traditionally relevant tasks in Computational Linguistics. However, critical issues are surfacing when applying old techniques to the study of highly subjective phenomena such as irony and sarcasm, or abusive and offensive language. This paper calls for a paradigm shift, away from monolithic, majority-aggregated gold standards, and towards an inclusive framework that preserves the personal opinions and culturally-driven perspectives of the annotators. New training sets and supervised machine learning techniques will have to be adapted in order to create fair, inclusive, and ultimately more informed models of subjective semantic and pragmatic phenomena. The arguments are backed by a synthetic experiment showing the lack of correlation between the difficulty of an annotation task, its degree of subjectivity, and the quality of the predictions of a supervised classifier trained on the resulting data. A further experiment on real data highlights the beneficial impact of the proposed methodologies in terms of explainability of perspective-aware hate speech detection.
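The contrast the abstract draws between a majority-aggregated gold standard and a representation that preserves each annotator's judgment can be illustrated with a minimal sketch. The data, item names, and function names below are hypothetical, invented purely for illustration. They are not taken from the paper's experiments and are not the author's method.

```python
from collections import Counter

# Hypothetical toy annotations: item id -> labels from three annotators
# (1 = hateful, 0 = not hateful). Values are invented for illustration only.
annotations = {
    "msg_a": [1, 1, 1],
    "msg_b": [1, 0, 0],
    "msg_c": [0, 1, 1],
}


def majority_label(labels):
    """Collapse one item's annotations into a single label by majority vote."""
    return Counter(labels).most_common(1)[0][0]


def label_distribution(labels):
    """Preserve disagreement: relative frequency of each label for one item."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in sorted(counts.items())}


# Aggregated view: one label per item, disagreement is discarded.
gold = {item: majority_label(labels) for item, labels in annotations.items()}

# Non-aggregated view: the minority judgment stays visible.
soft = {item: label_distribution(labels) for item, labels in annotations.items()}

for item in annotations:
    print(item, "majority:", gold[item], "distribution:", soft[item])
```

In this toy setup, `msg_b` and `msg_c` each receive a single majority label even though one of the three annotators disagreed, while the distribution keeps that minority judgment available to downstream training.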
Notes
- 1. The metaphor refers to the ideal spectrum often used in linguistics, where phenomena of natural language are organized on a scale roughly covering, in order: morphology, syntax, semantics, pragmatics.
- 2.
- 3.
- 4. Whether this price is fair has been debated for some years now [7].
- 5.
- 6.
- 7.
- 8. The ideas presented in this position paper are also collected and organized around the online initiative The Non-aggregation Manifesto: https://valeriobasile.github.io/manifesto/.
- 9.
References
Akhtar, S., Basile, V., Patti, V.: A new measure of polarization in the annotation of hate speech. In: Alviano, M., Greco, G., Scarcello, F. (eds.) AI*IA 2019. LNCS (LNAI), vol. 11946, pp. 588–603. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-35166-3_41
Akhtar, S., Basile, V., Patti, V.: Modeling annotator perspective and polarized opinions to improve hate speech detection. In: Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, vol. 8, no. 1, pp. 151–154, October 2020. https://ojs.aaai.org/index.php/HCOMP/article/view/7473
Aroyo, L., Welty, C.: Truth is a lie: crowd truth and the seven myths of human annotation. AI Mag. 36(1), 15–24 (2015)
Checco, A., Roitero, K., Maddalena, E., Mizzaro, S., Demartini, G.: Let’s agree to disagree: fixing agreement measures for crowdsourcing. In: Proceedings of the Fifth AAAI Conference on Human Computation and Crowdsourcing, HCOMP 2017, 23–26 October 2017, Québec City, Québec, Canada, pp. 11–20. AAAI Press (2017)
Davidson, T., Warmsley, D., Macy, M., Weber, I.: Automated hate speech detection and the problem of offensive language. In: Proceedings of the 11th International AAAI Conference on Web and Social Media, ICWSM 2017, pp. 512–515 (2017)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1 (Long and Short Papers), Minneapolis, MN, pp. 4171–4186. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/N19-1423. https://www.aclweb.org/anthology/N19-1423
Felstiner, A.: Working the crowd: employment and labor law in the crowdsourcing industry. Berkeley J. Employ. Lab. Law 32, 143 (2011)
Hovy, D., Berg-Kirkpatrick, T., Vaswani, A., Hovy, E.: Learning whom to trust with MACE. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, Georgia, pp. 1120–1130. Association for Computational Linguistics (2013). https://www.aclweb.org/anthology/N13-1132
Klenner, M., Göhring, A., Amsler, M.: Harmonization sometimes harms. In: Ebling, S., Tuggener, D., Hürlimann, M., Cieliebak, M., Volk, M. (eds.) Proceedings of the 5th Swiss Text Analytics Conference and the 16th Conference on Natural Language Processing, SwissText/KONVENS 2020, Zurich, Switzerland, 23–25 June 2020 [Online Only]. CEUR Workshop Proceedings, vol. 2624. CEUR-WS.org (2020). http://ceur-ws.org/Vol-2624/paper10.pdf
Powers, D.M.W.: The problem with kappa. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, pp. 345–355. Association for Computational Linguistics, April 2012. https://www.aclweb.org/anthology/E12-1035
Uma, A., Fornaciari, T., Hovy, D., Paun, S., Plank, B., Poesio, M.: A case for soft loss functions. In: Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, vol. 8, no. 1, pp. 173–177, October 2020. https://ojs.aaai.org/index.php/HCOMP/article/view/7478
Waseem, Z.: Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In: Proceedings of the First Workshop on NLP and Computational Social Science, Austin, Texas, pp. 138–142. Association for Computational Linguistics, November 2016. https://doi.org/10.18653/v1/W16-5618. https://www.aclweb.org/anthology/W16-5618
Acknowledgments
The author would like to express his gratitude to the anonymous reviewers of AIxIA, whose comments greatly contributed to improving this work for the final version. The author would also like to thank Thomas Davidson and his colleagues for kindly and promptly providing the non-aggregated version of their corpus. This work is partially funded by the project “Be Positive!” (under the 2019 “Google.org Impact Challenge on Safety” call; see Note 9).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Basile, V. (2021). It’s the End of the Gold Standard as We Know It. In: Baldoni, M., Bandini, S. (eds) AIxIA 2020 – Advances in Artificial Intelligence. AIxIA 2020. Lecture Notes in Computer Science, vol 12414. Springer, Cham. https://doi.org/10.1007/978-3-030-77091-4_26
DOI: https://doi.org/10.1007/978-3-030-77091-4_26
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77090-7
Online ISBN: 978-3-030-77091-4
eBook Packages: Computer Science (R0)