
It’s the End of the Gold Standard as We Know It

Leveraging Non-aggregated Data for Better Evaluation and Explanation of Subjective Tasks

  • Conference paper
AIxIA 2020 – Advances in Artificial Intelligence (AIxIA 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12414)

Abstract

Supervised machine learning, in particular in Natural Language Processing, is based on the creation of high-quality gold standard datasets for training and benchmarking. The de-facto standard annotation methodologies work well for traditionally relevant tasks in Computational Linguistics. However, critical issues are surfacing when applying old techniques to the study of highly subjective phenomena such as irony and sarcasm, or abusive and offensive language. This paper calls for a paradigm shift, away from monolithic, majority-aggregated gold standards, and towards an inclusive framework that preserves the personal opinions and culturally-driven perspectives of the annotators. New training sets and supervised machine learning techniques will have to be adapted in order to create fair, inclusive, and ultimately more informed models of subjective semantic and pragmatic phenomena. The arguments are backed by a synthetic experiment showing the lack of correlation between the difficulty of an annotation task, its degree of subjectivity, and the quality of the predictions of a supervised classifier trained on the resulting data. A further experiment on real data highlights the beneficial impact of the proposed methodologies in terms of explainability of perspective-aware hate speech detection.
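The contrast between a majority-aggregated gold standard and non-aggregated data can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's own code or data: the toy annotations, the label names, and the helper functions `majority_label` and `label_distribution` are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: each item was labelled by three annotators.
annotations = {
    "msg1": ["offensive", "offensive", "offensive"],
    "msg2": ["offensive", "not_offensive", "offensive"],
    "msg3": ["not_offensive", "offensive", "not_offensive"],
}


def majority_label(labels):
    """Collapse all annotations of one item into a single gold label.

    Ties are not handled specially; with three annotators and two
    labels, as in the toy data, a tie cannot occur.
    """
    return Counter(labels).most_common(1)[0][0]


def label_distribution(labels):
    """Keep the disagreement: relative frequency of each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Majority-aggregated view: one label per item, disagreement discarded.
gold = {item: majority_label(labels) for item, labels in annotations.items()}

# Non-aggregated view: every annotator's judgement still contributes.
soft = {item: label_distribution(labels) for item, labels in annotations.items()}

for item in annotations:
    print(item, gold[item], soft[item])
```

In the aggregated view, `msg1` and `msg2` become indistinguishable, although only the former was labelled unanimously. The per-item distributions retain that difference.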


Notes

  1. The metaphor refers to the ideal spectrum often used in linguistics, where phenomena of natural language are organized on a scale roughly covering, in order: morphology, syntax, semantics, pragmatics.

  2. https://www.mturk.com/.

  3. https://appen.com/.

  4. Whether this price is fair has been debated for some years now [7].

  5. https://en.wikipedia.org/wiki/Gold_standard.

  6. https://github.com/CyberZHG/keras-bert.

  7. https://sites.google.com/view/semeval2021-task12/home.

  8. The ideas presented in this position paper are also collected and organized around the online initiative The Non-aggregation Manifesto: https://valeriobasile.github.io/manifesto/.

  9. https://impactchallenge.withgoogle.com/safety2019.

References

  1. Akhtar, S., Basile, V., Patti, V.: A new measure of polarization in the annotation of hate speech. In: Alviano, M., Greco, G., Scarcello, F. (eds.) AI*IA 2019. LNCS (LNAI), vol. 11946, pp. 588–603. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-35166-3_41

  2. Akhtar, S., Basile, V., Patti, V.: Modeling annotator perspective and polarized opinions to improve hate speech detection. In: Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, vol. 8, no. 1, pp. 151–154, October 2020. https://ojs.aaai.org/index.php/HCOMP/article/view/7473

  3. Aroyo, L., Welty, C.: Truth is a lie: crowd truth and the seven myths of human annotation. AI Mag. 36(1), 15–24 (2015)

  4. Checco, A., Roitero, K., Maddalena, E., Mizzaro, S., Demartini, G.: Let’s agree to disagree: fixing agreement measures for crowdsourcing. In: Proceedings of the Fifth AAAI Conference on Human Computation and Crowdsourcing, HCOMP 2017, 23–26 October 2017, Québec City, Québec, Canada, pp. 11–20. AAAI Press (2017)

  5. Davidson, T., Warmsley, D., Macy, M., Weber, I.: Automated hate speech detection and the problem of offensive language. In: Proceedings of the 11th International AAAI Conference on Web and Social Media, ICWSM 2017, pp. 512–515 (2017)

  6. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1 (Long and Short Papers), Minneapolis, MN, pp. 4171–4186. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/N19-1423. https://www.aclweb.org/anthology/N19-1423

  7. Felstiner, A.: Working the crowd: employment and labor law in the crowdsourcing industry. Berkeley J. Employ. Lab. Law 32, 143 (2011)

  8. Hovy, D., Berg-Kirkpatrick, T., Vaswani, A., Hovy, E.: Learning whom to trust with MACE. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, Georgia, pp. 1120–1130. Association for Computational Linguistics (2013). https://www.aclweb.org/anthology/N13-1132

  9. Klenner, M., Göhring, A., Amsler, M.: Harmonization sometimes harms. In: Ebling, S., Tuggener, D., Hürlimann, M., Cieliebak, M., Volk, M. (eds.) Proceedings of the 5th Swiss Text Analytics Conference and the 16th Conference on Natural Language Processing, SwissText/KONVENS 2020, Zurich, Switzerland, 23–25 June 2020 [Online Only]. CEUR Workshop Proceedings, vol. 2624. CEUR-WS.org (2020). http://ceur-ws.org/Vol-2624/paper10.pdf

  10. Powers, D.M.W.: The problem with kappa. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, pp. 345–355. Association for Computational Linguistics, April 2012. https://www.aclweb.org/anthology/E12-1035

  11. Uma, A., Fornaciari, T., Hovy, D., Paun, S., Plank, B., Poesio, M.: A case for soft loss functions. In: Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, vol. 8, no. 1, pp. 173–177, October 2020. https://ojs.aaai.org/index.php/HCOMP/article/view/7478

  12. Waseem, Z.: Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In: Proceedings of the First Workshop on NLP and Computational Social Science, Austin, Texas, pp. 138–142. Association for Computational Linguistics, November 2016. https://doi.org/10.18653/v1/W16-5618. https://www.aclweb.org/anthology/W16-5618

Acknowledgments

The author would like to express his gratitude to the anonymous reviewers of AIxIA, whose comments greatly contributed to improving this work for the final version. The author would also like to thank Thomas Davidson and his colleagues for kindly and promptly providing the non-aggregated version of their corpus. This work is partially funded by the project “Be Positive!” (under the 2019 “Google.org Impact Challenge on Safety” call; see Note 9).

Author information

Correspondence to Valerio Basile.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Basile, V. (2021). It’s the End of the Gold Standard as We Know It. In: Baldoni, M., Bandini, S. (eds) AIxIA 2020 – Advances in Artificial Intelligence. AIxIA 2020. Lecture Notes in Computer Science, vol 12414. Springer, Cham. https://doi.org/10.1007/978-3-030-77091-4_26

  • DOI: https://doi.org/10.1007/978-3-030-77091-4_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77090-7

  • Online ISBN: 978-3-030-77091-4

  • eBook Packages: Computer Science, Computer Science (R0)
