It’s the End of the Gold Standard as We Know It

Basile, Valerio

doi:10.1007/978-3-030-77091-4_26

Valerio Basile ORCID: orcid.org/0000-0001-8110-6832

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12414)

Included in the following conference series:

International Conference of the Italian Association for Artificial Intelligence

Abstract

Supervised machine learning, in particular in Natural Language Processing, is based on the creation of high-quality gold standard datasets for training and benchmarking. The de facto standard annotation methodologies work well for traditionally relevant tasks in Computational Linguistics. However, critical issues are surfacing when applying old techniques to the study of highly subjective phenomena such as irony and sarcasm, or abusive and offensive language. This paper calls for a paradigm shift, away from monolithic, majority-aggregated gold standards, and towards an inclusive framework that preserves the personal opinions and culturally driven perspectives of the annotators. New training sets and supervised machine learning techniques will have to be adapted in order to create fair, inclusive, and ultimately more informed models of subjective semantic and pragmatic phenomena. The arguments are backed by a synthetic experiment showing the lack of correlation between the difficulty of an annotation task, its degree of subjectivity, and the quality of the predictions of a supervised classifier trained on the resulting data. A further experiment on real data highlights the beneficial impact of the proposed methodologies in terms of explainability of perspective-aware hate speech detection.
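
To make the contrast in the abstract concrete, the following minimal sketch compares a majority-aggregated gold label with a representation that keeps annotator disagreement. It is an illustration under stated assumptions, not the paper's experimental setup: the toy annotations, the label names, and the helper functions `majority_label` and `label_distribution` are all hypothetical, and a per-item label distribution is only one possible way of preserving annotator perspectives.

```python
from collections import Counter

# Hypothetical toy data: item id -> labels given by three annotators.
annotations = {
    "msg1": ["hate", "hate", "hate"],
    "msg2": ["hate", "not_hate", "not_hate"],
    "msg3": ["hate", "hate", "not_hate"],
}


def majority_label(labels):
    """Collapse all annotations of one item into a single 'gold' label."""
    return Counter(labels).most_common(1)[0][0]


def label_distribution(labels):
    """Keep the disagreement: relative frequency of each label for one item."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Majority-aggregated gold standard: one label per item, disagreement discarded.
gold = {item: majority_label(labels) for item, labels in annotations.items()}

# Non-aggregated alternative (assumed form): one distribution per item.
soft = {item: label_distribution(labels) for item, labels in annotations.items()}

for item in annotations:
    print(item, gold[item], soft[item])
```

In this toy example, `msg1` and `msg3` receive the same aggregated label even though only `msg1` was annotated unanimously; the distribution keeps that difference visible.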

Notes

1. The metaphor refers to the ideal spectrum often used in linguistics, where phenomena of natural language are organized on a scale roughly covering, in order: morphology, syntax, semantics, pragmatics.
2. https://www.mturk.com/.
3. https://appen.com/.
4. Whether this price is fair has been debated for some years now [7].
5. https://en.wikipedia.org/wiki/Gold_standard.
6. https://github.com/CyberZHG/keras-bert.
7. https://sites.google.com/view/semeval2021-task12/home.
8. The ideas presented in this position paper are also collected and organized around the online initiative The Non-aggregation Manifesto: https://valeriobasile.github.io/manifesto/.
9. https://impactchallenge.withgoogle.com/safety2019.

References

Akhtar, S., Basile, V., Patti, V.: A new measure of polarization in the annotation of hate speech. In: Alviano, M., Greco, G., Scarcello, F. (eds.) AI*IA 2019. LNCS (LNAI), vol. 11946, pp. 588–603. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-35166-3_41
Akhtar, S., Basile, V., Patti, V.: Modeling annotator perspective and polarized opinions to improve hate speech detection. In: Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, vol. 8, no. 1, pp. 151–154, October 2020. https://ojs.aaai.org/index.php/HCOMP/article/view/7473
Aroyo, L., Welty, C.: Truth is a lie: crowd truth and the seven myths of human annotation. AI Mag. 36(1), 15–24 (2015)
Checco, A., Roitero, K., Maddalena, E., Mizzaro, S., Demartini, G.: Let’s agree to disagree: fixing agreement measures for crowdsourcing. In: Proceedings of the Fifth AAAI Conference on Human Computation and Crowdsourcing, HCOMP 2017, 23–26 October 2017, Québec City, Québec, Canada, pp. 11–20. AAAI Press (2017)
Davidson, T., Warmsley, D., Macy, M., Weber, I.: Automated hate speech detection and the problem of offensive language. In: Proceedings of the 11th International AAAI Conference on Web and Social Media, ICWSM 2017, pp. 512–515 (2017)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1 (Long and Short Papers), Minneapolis, MN, pp. 4171–4186. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/N19-1423. https://www.aclweb.org/anthology/N19-1423
Felstiner, A.: Working the crowd: employment and labor law in the crowdsourcing industry. Berkeley J. Employ. Lab. Law 32, 143 (2011)
Hovy, D., Berg-Kirkpatrick, T., Vaswani, A., Hovy, E.: Learning whom to trust with MACE. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, Georgia, pp. 1120–1130. Association for Computational Linguistics (2013). https://www.aclweb.org/anthology/N13-1132
Klenner, M., Göhring, A., Amsler, M.: Harmonization sometimes harms. In: Ebling, S., Tuggener, D., Hürlimann, M., Cieliebak, M., Volk, M. (eds.) Proceedings of the 5th Swiss Text Analytics Conference and the 16th Conference on Natural Language Processing, SwissText/KONVENS 2020, Zurich, Switzerland, 23–25 June 2020 [Online Only]. CEUR Workshop Proceedings, vol. 2624. CEUR-WS.org (2020). http://ceur-ws.org/Vol-2624/paper10.pdf
Powers, D.M.W.: The problem with kappa. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, pp. 345–355. Association for Computational Linguistics, April 2012. https://www.aclweb.org/anthology/E12-1035
Uma, A., Fornaciari, T., Hovy, D., Paun, S., Plank, B., Poesio, M.: A case for soft loss functions. In: Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, vol. 8, no. 1, pp. 173–177, October 2020. https://ojs.aaai.org/index.php/HCOMP/article/view/7478
Waseem, Z.: Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In: Proceedings of the First Workshop on NLP and Computational Social Science, Austin, Texas, pp. 138–142. Association for Computational Linguistics, November 2016. https://doi.org/10.18653/v1/W16-5618. https://www.aclweb.org/anthology/W16-5618

Acknowledgments

The author would like to express his gratitude to the anonymous reviewers of AIxIA, whose comments greatly contributed to improving this work for the final version. The author would also like to thank Thomas Davidson and his colleagues for kindly and promptly providing the non-aggregated version of their corpus. This work is partially funded by the project “Be Positive!” (under the 2019 “Google.org Impact Challenge on Safety” call; see Note 9).

Author information

Authors and Affiliations

University of Turin, Corso Svizzera 185, Turin, Italy
Valerio Basile

Corresponding author

Correspondence to Valerio Basile.

Editor information

Editors and Affiliations

Università degli Studi di Torino, Turin, Italy
Matteo Baldoni
Department of Informatics, Systems and C, University of Milano-Bicocca, Milan, Italy
Stefania Bandini

About this paper

Cite this paper

Basile, V. (2021). It’s the End of the Gold Standard as We Know It. In: Baldoni, M., Bandini, S. (eds) AIxIA 2020 – Advances in Artificial Intelligence. AIxIA 2020. Lecture Notes in Computer Science, vol 12414. Springer, Cham. https://doi.org/10.1007/978-3-030-77091-4_26

DOI: https://doi.org/10.1007/978-3-030-77091-4_26
Published: 22 May 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77090-7
Online ISBN: 978-3-030-77091-4
eBook Packages: Computer Science, Computer Science (R0)
