DOI: 10.1145/3159652.3159654

Cognitive Biases in Crowdsourcing

Published: 02 February 2018

Abstract

Crowdsourcing has become a popular paradigm in data curation, annotation, and evaluation for many artificial intelligence and information retrieval applications. Considerable effort has gone into devising effective quality control mechanisms that identify or discourage cheat submissions in an attempt to improve the quality of noisy crowd judgments. Besides purposeful cheating, there is another source of noise that is often alluded to but insufficiently studied: cognitive biases.
This paper investigates the prevalence and effect size of a range of common cognitive biases on a standard relevance judgment task. Our experiments, based on three sizable publicly available document collections, show significant detrimental effects on annotation quality, system ranking, and the performance of derived rankers when task design does not account for such biases.
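
The paper's findings are empirical; purely as an illustrative sketch of the kind of effect described above (and not the authors' actual methodology, data, or code), the Python snippet below simulates majority-voted crowd labels with and without a bandwagon-style bias and compares the system rankings the two label sets induce. All constants, system names, the majority-vote aggregation, the precision@20 metric, and the Kendall's tau comparison are assumptions chosen for illustration.

```python
# Illustrative sketch only (not from the paper): simulate how a bandwagon-style
# bias in crowd relevance labels can distort system evaluation. All constants,
# names, and modelling choices below are assumptions made for illustration.
import random
from itertools import combinations

random.seed(0)

N_DOCS = 200          # hypothetical pool of judged documents
N_WORKERS = 5         # votes collected per document
ERROR_RATE = 0.15     # chance an unbiased worker flips the true label
BIAS_STRENGTH = 0.35  # chance a biased worker copies the majority seen so far

# Hidden ground-truth binary relevance for each document.
truth = [random.random() < 0.4 for _ in range(N_DOCS)]

def crowd_label(true_rel, biased):
    """Collect N_WORKERS votes and aggregate them by majority."""
    votes = []
    for _ in range(N_WORKERS):
        if biased and votes and random.random() < BIAS_STRENGTH:
            # Bandwagon behaviour: adopt the current majority instead of judging.
            vote = sum(votes) * 2 >= len(votes)
        else:
            vote = true_rel if random.random() > ERROR_RATE else not true_rel
        votes.append(vote)
    return sum(votes) * 2 > len(votes)

def system_scores(quality):
    """Hypothetical retrieval system: scores correlate with true relevance."""
    return [(1.0 if rel else 0.0) * quality + random.random() * (1 - quality)
            for rel in truth]

systems = {"sysA": system_scores(0.8),
           "sysB": system_scores(0.6),
           "sysC": system_scores(0.4)}

def precision_at_k(scores, qrels, k=20):
    ranked = sorted(range(N_DOCS), key=lambda i: -scores[i])[:k]
    return sum(qrels[i] for i in ranked) / k

def ranking(qrels):
    """Order systems by precision@20 under a given set of labels."""
    return sorted(systems, key=lambda s: -precision_at_k(systems[s], qrels))

def kendall_tau(r1, r2):
    """Kendall's tau between two rankings of the same items (no ties)."""
    pos1 = {s: i for i, s in enumerate(r1)}
    pos2 = {s: i for i, s in enumerate(r2)}
    pairs = list(combinations(r1, 2))
    score = sum(1 if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) > 0 else -1
                for a, b in pairs)
    return score / len(pairs)

clean_qrels = [crowd_label(rel, biased=False) for rel in truth]
biased_qrels = [crowd_label(rel, biased=True) for rel in truth]

print("ranking under unbiased labels:", ranking(clean_qrels))
print("ranking under biased labels:  ", ranking(biased_qrels))
print("Kendall's tau between the two rankings:",
      round(kendall_tau(ranking(clean_qrels), ranking(biased_qrels)), 2))
```

Depending on the chosen error and bias rates, the two label sets can induce different system orderings; the paper measures analogous effects on real crowd judgments collected over publicly available document collections.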


Published In

WSDM '18: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining
February 2018
821 pages
ISBN:9781450355810
DOI:10.1145/3159652
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cognitive biases
  2. crowdsourcing
  3. human computation
  4. relevance assessment

Qualifiers

  • Research-article

Conference

WSDM 2018

Acceptance Rates

WSDM '18 Paper Acceptance Rate: 81 of 514 submissions, 16%
Overall Acceptance Rate: 498 of 2,863 submissions, 17%

Cited By

  • (2025) Herd Accountability of Privacy-Preserving Algorithms: A Stackelberg Game Approach. IEEE Transactions on Information Forensics and Security, 10.1109/TIFS.2025.3540357, 20, 2237-2251. Online publication date: 2025.
  • (2024) Crowdsourcing Geospatial Data for Earth and Human Observations: A Review. Journal of Remote Sensing, 10.34133/remotesensing.0105, 4. Online publication date: 22-Jan-2024.
  • (2024) Decoy Effect in Search Interaction: Understanding User Behavior and Measuring System Vulnerability. ACM Transactions on Information Systems, 10.1145/3708884, 43(2), 1-58. Online publication date: 19-Dec-2024.
  • (2024) Evaluating Cognitive Biases in Conversational and Generative IIR: A Tutorial. Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 10.1145/3673791.3698437, 287-290. Online publication date: 8-Dec-2024.
  • (2024) AI Can Be Cognitively Biased: An Exploratory Study on Threshold Priming in LLM-Based Batch Relevance Assessment. Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 10.1145/3673791.3698420, 54-63. Online publication date: 8-Dec-2024.
  • (2024) Cognitively Biased Users Interacting with Algorithmically Biased Results in Whole-Session Search on Debated Topics. Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval, 10.1145/3664190.3672520, 227-237. Online publication date: 2-Aug-2024.
  • (2024) Search under Uncertainty: Cognitive Biases and Heuristics - Tutorial on Modeling Search Interaction using Behavioral Economics. Proceedings of the 2024 Conference on Human Information Interaction and Retrieval, 10.1145/3627508.3638297, 427-430. Online publication date: 10-Mar-2024.
  • (2024) Disentangling Web Search on Debated Topics: A User-Centered Exploration. Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization, 10.1145/3627043.3659559, 24-35. Online publication date: 22-Jun-2024.
  • (2024) Search under Uncertainty: Cognitive Biases and Heuristics: A Tutorial on Testing, Mitigating and Accounting for Cognitive Biases in Search Experiments. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 10.1145/3626772.3661382, 3013-3016. Online publication date: 10-Jul-2024.
  • (2024) Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 10.1145/3626772.3657712, 1952-1962. Online publication date: 10-Jul-2024.