
A Study on the Influence of the Number of MTurkers on the Quality of the Aggregate Output

  • Conference paper

In: Multi-Agent Systems (EUMAS 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8953)


Abstract

Recent years have seen increased interest in crowdsourcing as a way of obtaining information from a large group of workers at a reduced cost. In general, there are arguments both for and against using multiple workers to perform a task. On the positive side, multiple workers bring different perspectives to the process, which may result in a more accurate aggregate output since the biases of individual judgments might offset each other. On the other hand, a larger population of workers is more likely to include poor workers, which might bring down the quality of the aggregate output.

In this paper, we empirically investigate how the number of workers on the crowdsourcing platform Amazon Mechanical Turk influences the quality of the aggregate output in a content-analysis task. We find that both the expected error in the aggregate output and the risk of a poor combination of workers decrease as the number of workers increases.

Moreover, our results show that restricting the population to at most the overall top 40% of workers is likely to produce more accurate aggregate outputs, whereas removing up to the overall worst 40% of workers can actually make the aggregate output less accurate. This result holds because top-performing workers are consistent across multiple tasks, whereas the worst-performing workers tend to be inconsistent. Our results thus contribute to a better understanding of, and provide valuable insights into, how to design more effective crowdsourcing processes.
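The intuition that averaging offsets individual biases can be illustrated with a short simulation. The sketch below is not the paper's experimental pipeline; it averages synthetic noisy scores from a mixed pool of good and poor workers, where the gold score, the 30% poor-worker share, and the noise distributions are all assumptions made purely for illustration:

```python
import random
import statistics

random.seed(0)

GOLD = 1.0        # hypothetical gold-standard score for a single text
N_TRIALS = 2000   # random worker combinations sampled per group size

def worker_score(gold: float) -> float:
    """Synthetic judgment: most workers add small unbiased noise,
    but an assumed 30% are poor and answer almost at random."""
    if random.random() < 0.3:
        return random.uniform(0.0, 2.0)        # poor worker: near-random score
    return gold + random.gauss(0.0, 0.3)       # good worker: mild noise

for n in (1, 3, 5, 10, 20):
    # Absolute error of the averaged (aggregate) score per combination.
    errors = [
        abs(statistics.fmean(worker_score(GOLD) for _ in range(n)) - GOLD)
        for _ in range(N_TRIALS)
    ]
    print(f"n={n:>2}  mean error={statistics.fmean(errors):.3f}  "
          f"max error={max(errors):.3f}")
```

Under these assumed distributions, both the mean and the maximum absolute error of the averaged score fall as n grows, mirroring the qualitative trend reported above.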


Notes

  1. http://www.mturk.com.

References

  1. Carvalho, A., Dimitrov, S., Larson, K.: The output-agreement method induces honest behavior in the presence of social projection. ACM SIGecom Exch. 13(1), 77–81 (2014)

  2. Carvalho, A., Larson, K.: A consensual linear opinion pool. In: Proceedings of the 23rd International Joint Conference on Artificial Intelligence, pp. 2518–2524. AAAI Press (2013)

  3. Clemen, R.T.: Combining forecasts: a review and annotated bibliography. Int. J. Forecast. 5(4), 559–583 (1989)

  4. Gao, X.A., Mao, A., Chen, Y.: Trick or treat: putting peer prediction to the test. In: Proceedings of the 1st Workshop on Crowdsourcing and Online Behavioral Experiments (2013)

  5. Ho, C.J., Vaughan, J.W.: Online task assignment in crowdsourcing markets. In: Proceedings of the 26th AAAI Conference on Artificial Intelligence, pp. 45–51 (2012)

  6. Ipeirotis, P.G.: Analyzing the Amazon Mechanical Turk marketplace. XRDS Crossroads: ACM Mag. Stud. 17(2), 16–21 (2010)

  7. Lin, C.H., Weld, D.S.: Dynamically switching between synergistic workflows for crowdsourcing. In: Proceedings of the 26th AAAI Conference on Artificial Intelligence, pp. 132–133 (2012)

  8. Mason, W., Suri, S.: Conducting behavioral research on Amazon's Mechanical Turk. Behav. Res. Methods 44(1), 1–23 (2012)

  9. Neruda, P.: 100 Love Sonnets. Exile, Holstein (2007)

  10. Quinn, A.J., Bederson, B.B.: Human computation: a survey and taxonomy of a growing field. In: Proceedings of the 2011 SIGCHI Conference on Human Factors in Computing Systems, pp. 1403–1412 (2011)

  11. Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? Improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008)

  12. Taylor, J., Taylor, A., Greenaway, K.: Little Ann and Other Poems. Nabu Press, Charleston (2010)

  13. Tran-Thanh, L., Stein, S., Rogers, A., Jennings, N.R.: Efficient crowdsourcing of unknown experts using multi-armed bandits. In: Proceedings of the 20th European Conference on Artificial Intelligence, pp. 768–773 (2012)

  14. Yuen, M.C., King, I., Leung, K.S.: A survey of crowdsourcing systems. In: Proceedings of the IEEE 3rd International Conference on Social Computing, pp. 766–773 (2011)

  15. Zhang, H., Horvitz, E., Parkes, D.: Automated workflow synthesis. In: Proceedings of the 27th AAAI Conference on Artificial Intelligence, pp. 1020–1026 (2013)


Acknowledgments

The authors acknowledge Craig Boutilier, Pascal Poupart, Daniel Lizotte, and Xi Alice Gao for useful discussions. The authors thank Carol Acton, Katherine Acheson, Stefan Rehm, Susan Gow, and Veronica Austen for providing gold-standard outputs for our experiment. The authors also thank the Natural Sciences and Engineering Research Council of Canada for funding this research.

Author information


Corresponding author

Correspondence to Arthur Carvalho.


Appendices

A Description of the Texts

We describe in this appendix the texts used in our experiments as well as the gold-standard scores.

1.1 Text 1

An excerpt from “Sonnet XVII” by Neruda [9]. The intentionally misspelled words are “arrown” and “shadown” (highlighted in bold in the original).

“I do not love you as if you was salt-rose, or topaz

or the arrown of carnations that spread fire:

I love you as certain dark things are loved,

secretly, between the shadown and the soul”

The gold-standard scores for the criteria grammar, clarity, and relevance are, respectively, 1, 2, and 2.

1.2 Text 2

An excerpt from “The Cow” by Taylor et al. [12]. The intentionally misspelled words are “prety” and “Plesant” (highlighted in bold in the original).

“THANK you, prety cow, that made

Plesant milk to soak my bread,

Every day and every night,

Warm, and fresh, and sweet, and white.”

The gold-standard scores for the criteria grammar, clarity, and relevance are, respectively, 1, 2, and 1.

1.3 Text 3

Words randomly generated in a semi-structured way. Each line starts with a noun followed by a verb in a deliberately wrong form. To mimic a poetic writing style, all the words in a given line start with the same letter; a generation sketch follows the gold-standard scores below.

“Baby bet binary boundaries bubbles

Carlos cease CIA conditionally curve

Daniel deny disease domino dumb

Faust fest fierce forced furbished”

The gold-standard scores for the criteria grammar, clarity, and relevance are, respectively, 0, 0, and 0.
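To make the procedure concrete, here is a minimal sketch of how such lines could be generated. The word pools (seeded from the sample lines above) and the specific agreement error are assumptions for illustration; the paper does not publish its exact generation code.

```python
import random

random.seed(7)

# Hypothetical alliterative word pools seeded from the sample lines above.
NOUNS  = {"b": "Baby",   "c": "Carlos", "d": "Daniel", "f": "Faust"}
VERBS  = {"b": "bet",    "c": "cease",  "d": "deny",   "f": "fest"}
FILLER = {"b": ["binary", "boundaries", "bubbles"],
          "c": ["CIA", "conditionally", "curve"],
          "d": ["disease", "domino", "dumb"],
          "f": ["fierce", "forced", "furbished"]}

def make_line(letter: str) -> str:
    """Singular noun + base-form verb (the deliberate agreement error),
    followed by alliterative filler words in random order."""
    words = random.sample(FILLER[letter], k=3)
    return " ".join([NOUNS[letter], VERBS[letter], *words])

print("\n".join(make_line(c) for c in "bcdf"))
```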

B Experimental Results

Table 1 shows the numerical results from all the analyses performed in this paper.

Table 1. The average error, the standard deviation of the errors, and the maximum error per text for different populations of agents. All values are rounded to three decimal places.
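As a reference point for reading Table 1, the three statistics can be computed from a list of per-combination errors as follows. This is a sketch only: the function name and the sample numbers are illustrative, not the paper's data or code.

```python
import statistics

def summarize(errors: list[float]) -> dict[str, float]:
    """Average, standard deviation, and maximum of the absolute errors
    across worker combinations, rounded to three decimal places."""
    return {
        "average error":  round(statistics.fmean(errors), 3),
        "std. deviation": round(statistics.stdev(errors), 3),
        "maximum error":  round(max(errors), 3),
    }

# Illustrative usage with made-up errors for one text:
print(summarize([0.12, 0.40, 0.05, 0.31, 0.22]))
```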


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Carvalho, A., Dimitrov, S., Larson, K. (2015). A Study on the Influence of the Number of MTurkers on the Quality of the Aggregate Output. In: Bulling, N. (ed.) Multi-Agent Systems. EUMAS 2014. Lecture Notes in Computer Science, vol. 8953. Springer, Cham. https://doi.org/10.1007/978-3-319-17130-2_19


  • DOI: https://doi.org/10.1007/978-3-319-17130-2_19


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17129-6

  • Online ISBN: 978-3-319-17130-2

  • eBook Packages: Computer Science, Computer Science (R0)
