
A Study on the Influence of the Number of MTurkers on the Quality of the Aggregate Output

  • Conference paper

In: Multi-Agent Systems (EUMAS 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8953)


Abstract

Recent years have seen increased interest in crowdsourcing as a way of obtaining information from a large group of workers at a reduced cost. In general, there are arguments both for and against using multiple workers to perform a task. On the positive side, multiple workers bring different perspectives to the process, which may result in a more accurate aggregate output since the biases of individual judgments might offset each other. On the other hand, a larger population of workers is more likely to include poor workers, which might bring down the quality of the aggregate output.

In this paper, we empirically investigate how the number of workers on the crowdsourcing platform Amazon Mechanical Turk influences the quality of the aggregate output in a content-analysis task. We find that both the expected error in the aggregate output and the risk of a poor combination of workers decrease as the number of workers increases.

Moreover, our results show that restricting the population to at most the overall top 40% of workers is likely to produce more accurate aggregate outputs, whereas removing up to the overall worst 40% of workers can actually make the aggregate output less accurate. This result holds because top-performing workers are consistent across multiple tasks, whereas the worst-performing workers tend to be inconsistent. Our results thus contribute to a better understanding of, and provide valuable insights into, how to design more effective crowdsourcing processes.
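The intuition that averaging offsets individual biases can be illustrated with a short simulation. The sketch below is not the paper's experimental pipeline; it averages synthetic noisy scores from a mixed pool of good and poor workers, where the gold score, the 30% poor-worker share, and the noise distributions are all assumptions made purely for illustration:

```python
import random
import statistics

random.seed(0)

GOLD = 1.0        # hypothetical gold-standard score for a single text
N_TRIALS = 2000   # random worker combinations sampled per group size

def worker_score(gold: float) -> float:
    """Synthetic judgment: most workers add small unbiased noise,
    but an assumed 30% are poor and answer almost at random."""
    if random.random() < 0.3:
        return random.uniform(0.0, 2.0)        # poor worker: near-random score
    return gold + random.gauss(0.0, 0.3)       # good worker: mild noise

for n in (1, 3, 5, 10, 20):
    # Absolute error of the averaged (aggregate) score per combination.
    errors = [
        abs(statistics.fmean(worker_score(GOLD) for _ in range(n)) - GOLD)
        for _ in range(N_TRIALS)
    ]
    print(f"n={n:>2}  mean error={statistics.fmean(errors):.3f}  "
          f"max error={max(errors):.3f}")
```

Under these assumed distributions, both the mean and the maximum absolute error of the averaged score fall as n grows, mirroring the qualitative trend reported above.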


Notes

  1. http://www.mturk.com.

References

  1. Carvalho, A., Dimitrov, S., Larson, K.: The output-agreement method induces honest behavior in the presence of social projection. ACM SIGecom Exch. 13(1), 77–81 (2014)

  2. Carvalho, A., Larson, K.: A consensual linear opinion pool. In: Proceedings of the 23rd International Joint Conference on Artificial Intelligence, pp. 2518–2524. AAAI Press (2013)

  3. Clemen, R.T.: Combining forecasts: a review and annotated bibliography. Int. J. Forecast. 5(4), 559–583 (1989)

  4. Gao, X.A., Mao, A., Chen, Y.: Trick or treat: putting peer prediction to the test. In: Proceedings of the 1st Workshop on Crowdsourcing and Online Behavioral Experiments (2013)

  5. Ho, C.J., Vaughan, J.W.: Online task assignment in crowdsourcing markets. In: Proceedings of the 26th AAAI Conference on Artificial Intelligence, pp. 45–51 (2012)

  6. Ipeirotis, P.G.: Analyzing the Amazon Mechanical Turk marketplace. XRDS Crossroads: ACM Mag. Stud. 17(2), 16–21 (2010)

  7. Lin, C.H., Weld, D.S.: Dynamically switching between synergistic workflows for crowdsourcing. In: Proceedings of the 26th AAAI Conference on Artificial Intelligence, pp. 132–133 (2012)

  8. Mason, W., Suri, S.: Conducting behavioral research on Amazon's Mechanical Turk. Behav. Res. Methods 44(1), 1–23 (2012)

  9. Neruda, P.: 100 Love Sonnets. Exile, Holstein (2007)

  10. Quinn, A.J., Bederson, B.B.: Human computation: a survey and taxonomy of a growing field. In: Proceedings of the 2011 SIGCHI Conference on Human Factors in Computing Systems, pp. 1403–1412 (2011)

  11. Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? Improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008)

  12. Taylor, J., Taylor, A., Greenaway, K.: Little Ann and Other Poems. Nabu Press, Charleston (2010)

  13. Tran-Thanh, L., Stein, S., Rogers, A., Jennings, N.R.: Efficient crowdsourcing of unknown experts using multi-armed bandits. In: Proceedings of the 20th European Conference on Artificial Intelligence, pp. 768–773 (2012)

  14. Yuen, M.C., King, I., Leung, K.S.: A survey of crowdsourcing systems. In: Proceedings of the IEEE 3rd International Conference on Social Computing, pp. 766–773 (2011)

  15. Zhang, H., Horvitz, E., Parkes, D.: Automated workflow synthesis. In: Proceedings of the 27th AAAI Conference on Artificial Intelligence, pp. 1020–1026 (2013)


Acknowledgments

The authors acknowledge Craig Boutilier, Pascal Poupart, Daniel Lizotte, and Xi Alice Gao for useful discussions. The authors thank Carol Acton, Katherine Acheson, Stefan Rehm, Susan Gow, and Veronica Austen for providing gold-standard outputs for our experiment. The authors also thank the Natural Sciences and Engineering Research Council of Canada for funding this research.

Author information


Corresponding author

Correspondence to Arthur Carvalho.


Appendices

A Description of the Texts

We describe in this appendix the texts used in our experiments as well as the gold-standard scores.

1.1 Text 1

An excerpt from “Sonnet XVII” by Neruda [9]. The intentionally misspelled words are “arrown” and “shadown” (highlighted in bold in the original).

“I do not love you as if you was salt-rose, or topaz

or the arrown of carnations that spread fire:

I love you as certain dark things are loved,

secretly, between the shadown and the soul”

The gold-standard scores for the criteria grammar, clarity, and relevance are, respectively, 1, 2, and 2.

1.2 Text 2

An excerpt from “The Cow” by Taylor et al. [12]. The intentionally misspelled words are “prety” and “Plesant” (highlighted in bold in the original).

“THANK you, prety cow, that made

Plesant milk to soak my bread,

Every day and every night,

Warm, and fresh, and sweet, and white.”

The gold-standard scores for the criteria grammar, clarity, and relevance are, respectively, 1, 2, and 1.

1.3 Text 3

Words randomly generated in a semi-structured way. Each line starts with a noun followed by a verb in a deliberately wrong form. To mimic a poetic writing style, all the words in a given line start with the same letter; a generation sketch follows the gold-standard scores below.

“Baby bet binary boundaries bubbles

Carlos cease CIA conditionally curve

Daniel deny disease domino dumb

Faust fest fierce forced furbished”

The gold-standard scores for the criteria grammar, clarity, and relevance are, respectively, 0, 0, and 0.
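To make the procedure concrete, here is a minimal sketch of how such lines could be generated. The word pools (seeded from the sample lines above) and the specific agreement error are assumptions for illustration; the paper does not publish its exact generation code.

```python
import random

random.seed(7)

# Hypothetical alliterative word pools seeded from the sample lines above.
NOUNS  = {"b": "Baby",   "c": "Carlos", "d": "Daniel", "f": "Faust"}
VERBS  = {"b": "bet",    "c": "cease",  "d": "deny",   "f": "fest"}
FILLER = {"b": ["binary", "boundaries", "bubbles"],
          "c": ["CIA", "conditionally", "curve"],
          "d": ["disease", "domino", "dumb"],
          "f": ["fierce", "forced", "furbished"]}

def make_line(letter: str) -> str:
    """Singular noun + base-form verb (the deliberate agreement error),
    followed by alliterative filler words in random order."""
    words = random.sample(FILLER[letter], k=3)
    return " ".join([NOUNS[letter], VERBS[letter], *words])

print("\n".join(make_line(c) for c in "bcdf"))
```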

B Experimental Results

Table 1 shows the numerical results from all the analyses performed in this paper.

Table 1. The average error, the standard deviation of the errors, and the maximum error per text for different populations of agents. All values are rounded to three decimal places.
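As a reference point for reading Table 1, the three statistics can be computed from a list of per-combination errors as follows. This is a sketch only: the function name and the sample numbers are illustrative, not the paper's data or code.

```python
import statistics

def summarize(errors: list[float]) -> dict[str, float]:
    """Average, standard deviation, and maximum of the absolute errors
    across worker combinations, rounded to three decimal places."""
    return {
        "average error":  round(statistics.fmean(errors), 3),
        "std. deviation": round(statistics.stdev(errors), 3),
        "maximum error":  round(max(errors), 3),
    }

# Illustrative usage with made-up errors for one text:
print(summarize([0.12, 0.40, 0.05, 0.31, 0.22]))
```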


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Carvalho, A., Dimitrov, S., Larson, K. (2015). A Study on the Influence of the Number of MTurkers on the Quality of the Aggregate Output. In: Bulling, N. (ed.) Multi-Agent Systems. EUMAS 2014. Lecture Notes in Computer Science, vol. 8953. Springer, Cham. https://doi.org/10.1007/978-3-319-17130-2_19


  • DOI: https://doi.org/10.1007/978-3-319-17130-2_19


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17129-6

  • Online ISBN: 978-3-319-17130-2

  • eBook Packages: Computer Science, Computer Science (R0)
