Abstract
Machine Learning (ML) algorithms are frequently trained on human-labeled data. Although often treated as a “gold standard,” human labeling is anything but error free. Decisions made in the design of labeling tasks can distort the resulting labeled data and thereby affect predictions. Building on insights from survey methodology, a field that studies how instrument design shapes survey data and estimates, we examine how the structure of a hate speech labeling task affects which labels are assigned. We also examine how task ordering influences the perception of hate speech, and what role annotators' background characteristics play in the classifications they provide. The study demonstrates the importance of applying design thinking at the earliest steps of ML product development. Design principles such as rapid prototyping and critical assessment of user interfaces are not only important in interactions with end users of artificial intelligence (AI)-driven products, but are also crucial early in development, prior to training AI algorithms.
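The task-ordering effect described above can be studied with a simple randomized design: assign annotators to different item orderings and compare the rates at which the same target item is labeled as hate speech. The sketch below is illustrative only — the condition names, effect sizes, and sample size are hypothetical assumptions, not results or methods from the paper — and uses a two-proportion z-test on simulated labels.

```python
import math
import random

random.seed(42)

# Hypothetical setup: every annotator labels the same target post, but half
# see it after a batch of clearly hateful examples ("hateful-first") and half
# after neutral examples ("neutral-first"). A contrast effect would make the
# same post look less hateful when it follows extreme examples.
def simulate_annotator(ordering: str) -> int:
    # Assumed label probabilities, chosen only for illustration.
    p_hate = 0.30 if ordering == "hateful-first" else 0.45
    return 1 if random.random() < p_hate else 0

n = 500  # annotators per condition (hypothetical)
labels_a = [simulate_annotator("hateful-first") for _ in range(n)]
labels_b = [simulate_annotator("neutral-first") for _ in range(n)]

# Two-proportion z-test: does task ordering shift the share of "hate" labels?
p1 = sum(labels_a) / n
p2 = sum(labels_b) / n
p_pool = (sum(labels_a) + sum(labels_b)) / (2 * n)
se = math.sqrt(p_pool * (1 - p_pool) * (2 / n))
z = (p1 - p2) / se
print(f"hateful-first rate={p1:.2f}, neutral-first rate={p2:.2f}, z={z:.2f}")
```

In a real labeling study the simulated labels would be replaced by annotators' actual classifications, and annotator background characteristics could be added as covariates in a regression instead of the simple z-test shown here.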
Cite this paper
Beck, J., Eckman, S., Chew, R., Kreuter, F. (2022). Improving Labeling Through Social Science Insights: Results and Research Agenda. In: Chen, J.Y.C., Fragomeni, G., Degen, H., Ntoa, S. (eds) HCI International 2022 – Late Breaking Papers: Interacting with eXtended Reality and Artificial Intelligence. HCII 2022. Lecture Notes in Computer Science, vol 13518. Springer, Cham. https://doi.org/10.1007/978-3-031-21707-4_19