DOI: 10.1145/3308558.3320096

Crowdsourcing Inclusivity: Dealing with Diversity of Opinions, Perspectives and Ambiguity in Annotated Data

Published: 13 May 2019

Abstract

In this tutorial, we introduce a novel crowdsourcing methodology called CrowdTruth [1, 9]. The central characteristic of CrowdTruth is harnessing the diversity in human interpretation to capture the wide range of opinions and perspectives, and thus to provide more reliable, realistic and inclusive real-world annotated data for training and evaluating machine learning components. Unlike other methods, we do not discard dissenting votes, but incorporate them into a richer and more continuous representation of truth. CrowdTruth is a widely used crowdsourcing methodology, adopted by industrial partners and public organizations such as Google, IBM, the New York Times, Cleveland Clinic, Crowdynews, the Sound and Vision archive and the Rijksmuseum, and applied in a multitude of domains such as AI, news, medicine, social media, cultural heritage, and the social sciences. As methods that deal with disagreement and diversity in crowdsourcing have become increasingly popular, the goal of this tutorial is to introduce the audience to an approach that takes advantage of the diversity of opinions and perspectives inherent to the Web. Creating this more complex notion of truth contributes directly to the larger discussion on how to make the Web more reliable, diverse and inclusive.
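To make the idea of a continuous representation of truth concrete: the CrowdTruth metrics [8] score units, annotations and workers from the full distribution of votes instead of collapsing each unit to a majority label. The sketch below is a minimal, simplified illustration of two of those ideas in Python; the function names, the toy relation labels and the omission of quality weighting are our own simplifications for this page, not the official CrowdTruth implementation (see [8] for the full, quality-weighted metric definitions).

```python
import numpy as np

# Toy input: one media unit, three workers, a closed label set.
# Both the labels and the votes are illustrative, not from the tutorial.
LABELS = ["cause", "treat", "none"]
votes = {
    "worker_1": {"cause"},
    "worker_2": {"cause", "treat"},
    "worker_3": {"treat"},
}

def unit_annotation_scores(votes, labels):
    """Continuous score per label: the fraction of workers who chose it.
    Instead of collapsing the unit to one majority label, every label
    keeps a score, so dissenting votes are retained as signal."""
    counts = np.array(
        [sum(lbl in chosen for chosen in votes.values()) for lbl in labels],
        dtype=float,
    )
    return dict(zip(labels, counts / len(votes)))

def worker_unit_agreement(votes, labels):
    """Cosine similarity between a worker's binary label vector and the
    sum of all other workers' vectors on the same unit. Low scores flag
    likely low-quality work, while principled disagreement scores
    moderately instead of being discarded outright."""
    vecs = {
        w: np.array([lbl in chosen for lbl in labels], dtype=float)
        for w, chosen in votes.items()
    }
    total = sum(vecs.values())
    scores = {}
    for w, v in vecs.items():
        rest = total - v  # everyone except this worker
        denom = np.linalg.norm(v) * np.linalg.norm(rest)
        scores[w] = float(v @ rest / denom) if denom else 0.0
    return scores

print(unit_annotation_scores(votes, LABELS))
# -> "cause" and "treat" each score ~0.67 and both survive as partial
#    truth, rather than one label winning a discrete majority vote.
print(worker_unit_agreement(votes, LABELS))
```

In the full CrowdTruth 2.0 metrics [8], worker, annotation and unit quality scores mutually weight one another, so that genuine ambiguity in a unit is not mistaken for low-quality work.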

References

[1]
Lora Aroyo and Chris Welty. 2014. The Three Sides of CrowdTruth. Journal of Human Computation 1, 1 (2014), 31–34.
[2]
Joseph Chee Chang, Saleema Amershi, and Ece Kamar. 2017. Revolt: Collaborative Crowdsourcing for Labeling Machine Learning Datasets. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (CHI '17). ACM, New York, NY, USA, 2334–2346.
[3]
Alessandro Checco, Kevin Roitero, Eddy Maddalena, Stefano Mizzaro, and Gianluca Demartini. 2017. Let's Agree to Disagree: Fixing Agreement Measures for Crowdsourcing. In Proceedings of the Fifth AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2017). AAAI, 11–20. http://eprints.whiterose.ac.uk/122865/
[4]
Victor De Boer, Johan Oomen, Oana Inel, Lora Aroyo, Elco Van Staveren, Werner Helmich, and Dennis De Beurs. 2015. DIVE in the Event-Based Browsing of Linked Historical Media. Web Semantics: Science, Services and Agents on the World Wide Web 35 (2015), 152–158. http://www.websemanticsjournal.org/index.php/ps/article/view/427
[5]
Anca Dumitrache, Lora Aroyo, and Chris Welty. 2017. Crowdsourcing Ground Truth for Medical Relation Extraction. ACM Transactions on Interactive Intelligent Systems (Special Issue on Human-Centered Machine Learning, in publication) 8, 2 (2017), 12. http://arxiv.org/abs/1701.02185
[6]
Anca Dumitrache, Lora Aroyo, and Chris Welty. 2017. False positive and cross-relation signals in distant supervision data. arXiv preprint arXiv:1711.05186 (2017).
[7]
Anca Dumitrache, Lora Aroyo, and Chris Welty. 2018. Capturing ambiguity in crowdsourcing frame disambiguation. In Sixth AAAI Conference on Human Computation and Crowdsourcing.
[8]
Anca Dumitrache, Oana Inel, Lora Aroyo, Benjamin Timmermans, and Chris Welty. 2018. CrowdTruth 2.0: Quality Metrics for Crowdsourcing with Disagreement. arXiv preprint (2018). https://arxiv.org/abs/1808.06080
[9]
Anca Dumitrache, Oana Inel, Benjamin Timmermans, Carlos Ortiz, Robert-Jan Sips, and Lora Aroyo. 2017. Empirical Methodology for Crowdsourcing Ground Truth. Semantic Web Journal, Special Issue on Human Computation and Crowdsourcing (in review) (2017). http://www.semantic-web-journal.net/content/empirical-methodology-crowdsourcing-ground-truth-0
[10]
Oana Inel and Lora Aroyo. 2017. Harnessing diversity in crowds and machines for better NER performance. In European Semantic Web Conference. Springer, 289–304.
[11]
Oana Inel, Tommaso Caselli, and Lora Aroyo. 2016. Crowdsourcing Salient Information from News and Tweets. In LREC. European Language Resources Association (ELRA), 3959–3966.
[12]
Oana Inel, Giannis Haralabopoulos, Dan Li, Christophe Van Gysel, Zoltán Szlávik, Elena Simperl, Evangelos Kanoulas, and Lora Aroyo. 2018. Studying Topical Relevance with Evidence-based Crowdsourcing. In CIKM. ACM, 1253–1262.



          Published In

          WWW '19: Companion Proceedings of The 2019 World Wide Web Conference
          May 2019
          1331 pages
          ISBN:9781450366755
          DOI:10.1145/3308560

          In-Cooperation

          • IW3C2: International World Wide Web Conference Committee

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          Published: 13 May 2019


          Author Tags

          1. Ambiguity
          2. Computational Social Sciences
          3. Crowdsourcing
          4. Digital Humanities
          5. Diversity
          6. Ground Truth
          7. Inter-annotator Disagreement
          8. Medical Text Annotation
          9. Perspectives

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Conference

WWW '19: The Web Conference
          May 13 - 17, 2019
          San Francisco, USA

          Acceptance Rates

          Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


          Cited By

• (2024) A New Perspective for Computational Social Systems: Fuzzy Modeling and Reasoning for Social Computing in CPSS. IEEE Transactions on Computational Social Systems 11, 1 (Feb 2024), 101–116. https://doi.org/10.1109/TCSS.2022.3197421
