
Resolvable vs. Irresolvable Disagreement: A Study on Worker Deliberation in Crowd Work

Published: 01 November 2018

Abstract

Crowdsourced classification of data typically assumes that objects can be unambiguously classified into categories. In practice, many classification tasks are ambiguous due to various forms of disagreement. Prior work shows that exchanging verbal justifications can significantly improve answer accuracy over aggregation techniques. In this work, we study how worker deliberation affects resolvability and accuracy using case studies with both an objective and a subjective task. Results show that case resolvability depends on various factors, including the level of and reasons for the initial disagreement, as well as the amount and quality of deliberation activities. Our work reinforces the finding that deliberation can increase answer accuracy and underscores the importance of verbal discussion in this process. We contribute a new public data set on worker deliberation for text classification tasks, and discuss considerations for the design of deliberation workflows for classification.
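For context on the aggregation baselines the abstract alludes to, the following is a minimal sketch of majority-vote label aggregation in Python. This is an illustration only: the paper does not prescribe this code, and majority voting stands in here for aggregation techniques generally.

```python
from collections import Counter

def majority_vote(labels):
    """Aggregate one item's crowd labels by majority vote.

    Ties are resolved arbitrarily here, which is one way
    ambiguous items can end up with unstable labels.
    """
    return Counter(labels).most_common(1)[0][0]

# Three workers disagree on a hypothetical text-classification item;
# aggregation picks the majority label without any deliberation.
print(majority_vote(["sarcastic", "sincere", "sarcastic"]))  # -> sarcastic
```

Deliberation workflows differ from this baseline by having disagreeing workers exchange justifications before (or instead of) mechanically aggregating their votes.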




Published in

Proceedings of the ACM on Human-Computer Interaction, Volume 2, Issue CSCW (November 2018), 4104 pages
EISSN: 2573-0142
DOI: 10.1145/3290265
Copyright © 2018 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


