Poster · DOI: 10.1145/3379336.3381499

The Influence of Input Data Complexity on Crowdsourcing Quality

Published: 17 March 2020

Abstract

Crowdsourcing has a major impact on data gathering for NLP tasks. However, most quality control measures rely on data aggregation methods that are only applied after the crowdsourcing process and thus cannot account for differing worker qualifications during data gathering. This is time-consuming and cost-inefficient because some data points may have to be re-labeled or discarded. Training workers and distributing work according to worker qualifications beforehand helps to overcome this limitation. We propose a setup that accounts for input data complexity and allows only workers who have successfully completed tasks of rising complexity to continue on more difficult subsets. In this way, we can train workers and at the same time exclude unqualified workers. In initial experiments, four annotations by qualified crowd workers achieve higher agreement than five annotations from random crowd workers on the same dataset.
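The abstract does not spell out which complexity measure or qualification rule is used, so the following is only a minimal sketch of such a staged setup: it assumes Flesch reading ease as a stand-in complexity score and a hypothetical accuracy threshold on gold-labeled items as the gate between complexity levels. All identifiers (flesch_reading_ease, split_by_complexity, Worker, qualify, eligible_workers) are illustrative rather than taken from the paper.

```python
# Minimal sketch (assumptions: Flesch reading ease as the complexity score and a
# gold-item accuracy threshold as the qualification gate; neither is fixed by the abstract).
from dataclasses import dataclass, field

def flesch_reading_ease(text: str) -> float:
    """Rough Flesch reading ease; a real setup would use proper syllable counting."""
    words = text.split()
    n_words = max(len(words), 1)
    n_sentences = max(text.count(".") + text.count("!") + text.count("?"), 1)
    # Crude syllable proxy: count vowel letters, at least one per word.
    n_syllables = sum(max(1, sum(ch in "aeiouy" for ch in w.lower())) for w in words)
    return 206.835 - 1.015 * (n_words / n_sentences) - 84.6 * (n_syllables / n_words)

def split_by_complexity(texts, n_levels=3):
    """Order texts from easy to hard (high to low reading ease) and cut them into n_levels subsets."""
    ranked = sorted(texts, key=flesch_reading_ease, reverse=True)
    size = max(1, len(ranked) // n_levels)
    levels = [ranked[i * size:(i + 1) * size] for i in range(n_levels - 1)]
    levels.append(ranked[(n_levels - 1) * size:])
    return levels

@dataclass
class Worker:
    worker_id: str
    passed_levels: set = field(default_factory=set)

def qualify(worker: Worker, level: int, accuracy_on_gold: float, threshold: float = 0.8) -> bool:
    """Admit the worker to the next level only if they met the threshold on this level's gold items."""
    if accuracy_on_gold >= threshold:
        worker.passed_levels.add(level)
        return True
    return False

def eligible_workers(workers, level):
    """Only workers who passed all easier levels may label the harder subset."""
    return [w for w in workers if all(l in w.passed_levels for l in range(level))]
```

Under this sketch, the easiest level doubles as the training and screening stage: every worker starts there, and only those whose gold-item accuracy clears the threshold are routed to the harder subsets, matching the stated goal of training workers while filtering out unqualified ones before re-labeling or discarding data becomes necessary.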


Cited By

  • Annotation Curricula to Implicitly Train Non-Expert Annotators. Computational Linguistics 48(2), 343-373 (9 June 2022). DOI: 10.1162/coli_a_00436
  • Aggregating Reliable Submissions in Crowdsourcing Systems. IEEE Access 9, 153058-153071 (2021). DOI: 10.1109/ACCESS.2021.3127994

Published In

IUI '20 Companion: Companion Proceedings of the 25th International Conference on Intelligent User Interfaces
March 2020
153 pages
ISBN:9781450375139
DOI:10.1145/3379336
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 March 2020


Author Tags

  1. Crowdsourcing
  2. Natural Language Processing
  3. Task distribution

Qualifiers

  • Poster
  • Research
  • Refereed limited

Conference

IUI '20

Acceptance Rates

Overall Acceptance Rate 746 of 2,811 submissions, 27%


