An Analysis of the Use of Qualifications on the Amazon Mechanical Turk Online Labor Market

Abstract

Several human computation systems use crowdsourcing labor markets to recruit workers. However, it remains a challenge to guarantee that the results produced by workers are of sufficiently high quality. This is particularly difficult in markets based on micro-tasks, where the quality of the results needs to be assessed automatically. Pre-selection of suitable workers is one mechanism that can improve the quality of the results. It can be based on workers' personal information, on their historical behavior in the system, or on customized qualification tasks. However, little is known about how requesters use these mechanisms in practice. This study advances present knowledge in worker pre-selection by analyzing data collected from the Amazon Mechanical Turk platform regarding the way requesters use qualifications to this end. Furthermore, the influence of customized qualification tasks on the quality of the results produced by workers is investigated. Results show that most jobs (93.6%) use some mechanism for the pre-selection of workers. While most requesters use the standard qualifications provided by the system, the few requesters that submit most of the jobs prefer to use customized ones. Regarding worker behavior, we identified a positive and significant, albeit weak, correlation between a worker's propensity to possess a particular qualification and both the number of tasks that require it and the reward offered for those tasks. To assess the impact that the use of customized qualifications has on the quality of the results produced, we executed experiments with three different types of tasks using both unqualified and qualified workers. The results show that, in general, qualified workers provide more accurate answers than unqualified ones.
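For readers unfamiliar with the mechanism the paper analyzes, the sketch below (not taken from the paper) shows how a requester can attach qualification requirements to a HIT on Amazon Mechanical Turk using the boto3 MTurk client. The two system qualification type IDs shown are the ones the platform documents for approval rate and locale; the customized qualification ID, the HIT metadata, and the thresholds are illustrative placeholders.

```python
# Illustrative only: attach qualification requirements to a HIT so that only
# pre-selected workers can accept it. The custom qualification ID is a
# placeholder for one created beforehand with create_qualification_type.
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

qualification_requirements = [
    {
        # System qualification: approval rate of at least 95%.
        "QualificationTypeId": "000000000000000000L0",
        "Comparator": "GreaterThanOrEqualTo",
        "IntegerValues": [95],
    },
    {
        # System qualification: worker located in the United States.
        "QualificationTypeId": "00000000000000000071",
        "Comparator": "EqualTo",
        "LocaleValues": [{"Country": "US"}],
    },
    {
        # Customized qualification, e.g. granted after a qualification test
        # (placeholder ID).
        "QualificationTypeId": "REPLACE_WITH_CUSTOM_QUALIFICATION_ID",
        "Comparator": "GreaterThanOrEqualTo",
        "IntegerValues": [80],
    },
]

response = mturk.create_hit(
    Title="Categorize product images",            # illustrative metadata
    Description="Assign each image to one of five categories.",
    Reward="0.05",
    MaxAssignments=3,
    AssignmentDurationInSeconds=600,
    LifetimeInSeconds=86400,
    Question=open("question.xml").read(),         # HTMLQuestion/ExternalQuestion XML
    QualificationRequirements=qualification_requirements,
)
print(response["HIT"]["HITId"])
```

A customized qualification of the kind studied in the paper would typically be created first with create_qualification_type, optionally with a qualification test that workers must pass before the qualification is granted.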

Notes

  1. As mentioned earlier, result is the term commonly used to refer to the output of the activity executed by a worker.

  2. At the time the data was collected, this information was publicly available; it no longer is.

  3. This is the output of the Ward algorithm, which is used as input to k-means; see the sketch following these notes.

  4. When the threshold is set to 1, none of the three workers who executed the tasks of the Categorize jobs can be considered qualified; therefore, we cannot compute a new mean accuracy value for this case.
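Note 3 mentions feeding the output of Ward's algorithm into k-means but gives no further detail. The sketch below is only an illustration, under the assumption that the centroids of the Ward clusters are used to initialize k-means; it relies on SciPy and scikit-learn, and the function name and random data are placeholders, not the authors' code.

```python
# Minimal sketch: run Ward's hierarchical clustering, then use the resulting
# cluster centroids as the initial centers for k-means.
import numpy as np
from scipy.cluster.hierarchy import fcluster, ward
from scipy.spatial.distance import pdist
from sklearn.cluster import KMeans

def ward_then_kmeans(X, k):
    # Ward's agglomerative clustering on the pairwise Euclidean distances.
    linkage = ward(pdist(X))
    labels = fcluster(linkage, t=k, criterion="maxclust")
    # The centroid of each Ward cluster becomes an initial k-means center.
    centers = np.vstack([X[labels == c].mean(axis=0) for c in np.unique(labels)])
    return KMeans(n_clusters=k, init=centers, n_init=1).fit(X)

# Example: cluster 200 random 2-D points into 3 groups.
model = ward_then_kmeans(np.random.rand(200, 2), k=3)
print(model.cluster_centers_)
```

Initializing k-means from a hierarchical solution is a common way to reduce its sensitivity to random starting centers.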

Acknowledgements

The authors are indebted to the anonymous reviewers for their insightful comments and recommendations.

Funding

Francisco Brasileiro is a CNPq/Brazil researcher (grant 311297/2014-5).

Author information

Corresponding author

Correspondence to Ianna Sodré.

About this article

Cite this article

Sodré, I., Brasileiro, F. An Analysis of the Use of Qualifications on the Amazon Mechanical Turk Online Labor Market. Comput Supported Coop Work 26, 837–872 (2017). https://doi.org/10.1007/s10606-017-9283-z
