Improving Labeling Through Social Science Insights: Results and Research Agenda

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13518)

Abstract

Machine learning (ML) algorithms are frequently trained on human-labeled data. Although often treated as a “gold standard,” human labeling is far from error-free. Decisions in the design of labeling tasks can distort the resulting labeled data and affect downstream predictions. Building on insights from survey methodology, a field that studies how instrument design affects survey data and estimates, we examine how the structure of a hate speech labeling task affects which labels are assigned. We also examine what effect task ordering has on the perception of hate speech, and what role annotators’ background characteristics play in the classifications they provide. The study demonstrates the importance of applying design thinking at the earliest stages of ML product development. Design principles such as rapid prototyping and critical assessment of user interfaces are important not only in interactions with end users of artificial intelligence (AI)-driven products; they are crucial early in development, before AI algorithms are trained.
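
To make the kind of design effect studied here concrete, the sketch below shows how one might test whether two labeling-task designs yield different label distributions, using a chi-square test of independence. This is a minimal illustration, not the paper's analysis: the condition descriptions, counts, and the use of scipy are our own assumptions.

# Minimal sketch (not the paper's analysis): testing whether two
# hypothetical labeling-task designs produce different label distributions.
# All counts and condition descriptions are invented for illustration.
from scipy.stats import chi2_contingency

# Rows: task design; columns: counts of ("hate speech", "not hate speech")
# labels collected under each design.
labels_by_condition = [
    [312, 688],  # Condition A: e.g., one item shown per screen
    [255, 745],  # Condition B: e.g., several items shown in a grid
]

chi2, p, dof, expected = chi2_contingency(labels_by_condition)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
# A small p-value would suggest that the task design itself shifts which
# labels annotators assign -- an instrument effect of the kind survey
# methodology has long documented.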

Author information

Corresponding author

Correspondence to Jacob Beck.

Appendix

Appendix 1. Fig. 6: Screenshots of Conditions A, C, and E

Appendix 2. Fig. 7: Balance plots for demographic covariates across Conditions A–F

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Beck, J., Eckman, S., Chew, R., Kreuter, F. (2022). Improving Labeling Through Social Science Insights: Results and Research Agenda. In: Chen, J.Y.C., Fragomeni, G., Degen, H., Ntoa, S. (eds) HCI International 2022 – Late Breaking Papers: Interacting with eXtended Reality and Artificial Intelligence. HCII 2022. Lecture Notes in Computer Science, vol 13518. Springer, Cham. https://doi.org/10.1007/978-3-031-21707-4_19

  • DOI: https://doi.org/10.1007/978-3-031-21707-4_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21706-7

  • Online ISBN: 978-3-031-21707-4

  • eBook Packages: Computer Science (R0)
