Abstract
Machine Learning (ML) algorithms are frequently trained on human-labeled data. Although often treated as a “gold standard,” human labeling is anything but error free. Decisions made in the design of labeling tasks can distort the resulting labeled data and thereby affect predictions. Building on insights from survey methodology, a field that studies how instrument design shapes survey data and estimates, we examine how the structure of a hate speech labeling task affects which labels are assigned. We also examine how task ordering influences the perception of hate speech, and what role annotators' background characteristics play in the classifications they provide. The study demonstrates the importance of applying design thinking at the earliest steps of ML product development. Design principles such as rapid prototyping and critical assessment of user interfaces are not only important in interactions with end users of artificial intelligence (AI)-driven products, but are also crucial early in development, prior to training AI algorithms.
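The task-ordering effect described above can be studied with a simple randomized design: assign annotators to different item orderings and compare the rates at which the same target item is labeled as hate speech. The sketch below is illustrative only — the condition names, effect sizes, and sample size are hypothetical assumptions, not results or methods from the paper — and uses a two-proportion z-test on simulated labels.

```python
import math
import random

random.seed(42)

# Hypothetical setup: every annotator labels the same target post, but half
# see it after a batch of clearly hateful examples ("hateful-first") and half
# after neutral examples ("neutral-first"). A contrast effect would make the
# same post look less hateful when it follows extreme examples.
def simulate_annotator(ordering: str) -> int:
    # Assumed label probabilities, chosen only for illustration.
    p_hate = 0.30 if ordering == "hateful-first" else 0.45
    return 1 if random.random() < p_hate else 0

n = 500  # annotators per condition (hypothetical)
labels_a = [simulate_annotator("hateful-first") for _ in range(n)]
labels_b = [simulate_annotator("neutral-first") for _ in range(n)]

# Two-proportion z-test: does task ordering shift the share of "hate" labels?
p1 = sum(labels_a) / n
p2 = sum(labels_b) / n
p_pool = (sum(labels_a) + sum(labels_b)) / (2 * n)
se = math.sqrt(p_pool * (1 - p_pool) * (2 / n))
z = (p1 - p2) / se
print(f"hateful-first rate={p1:.2f}, neutral-first rate={p2:.2f}, z={z:.2f}")
```

In a real labeling study the simulated labels would be replaced by annotators' actual classifications, and annotator background characteristics could be added as covariates in a regression instead of the simple z-test shown here.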
Cite this paper
Beck, J., Eckman, S., Chew, R., Kreuter, F. (2022). Improving Labeling Through Social Science Insights: Results and Research Agenda. In: Chen, J.Y.C., Fragomeni, G., Degen, H., Ntoa, S. (eds) HCI International 2022 – Late Breaking Papers: Interacting with eXtended Reality and Artificial Intelligence. HCII 2022. Lecture Notes in Computer Science, vol 13518. Springer, Cham. https://doi.org/10.1007/978-3-031-21707-4_19